Friday, July 19, 2013

Hacking AMD OpenGL drivers

Even though the logarithmic depth buffer technique works pretty nicely, it has several problems that make its use problematic in some cases. If you use just the vertex shader modification, you can get depth buffer artifacts on longer triangles that are close to the camera, since the depth values aren't interpolated in a perspective-correct way. This can be helped by finer tessellation, or by writing the correct values in the fragment shader (possibly just for the geometry that's not tessellated sufficiently). However, writing the fragment depth in the shader disables certain hardware depth buffer optimizations like the early depth test, and adds to the bandwidth. That can pose a problem in scenarios with a higher overdraw.

On Direct3D there's a technique that can provide sufficient depth buffer precision: reverse mapping of the far/near planes in a normal floating-point depth buffer. In OpenGL it can't be used directly because of a design flaw that causes a huge loss of precision in the depth value computation (not just for floating-point depth buffers, see here). See more detail in the "Maximizing depth buffer range and precision" blog post.
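To make the precision loss concrete, here is a minimal numeric sketch, not taken from the original post. It assumes the reversed mapping stores depth = near / z, and that the default OpenGL pipeline first produces an NDC z in [-1, 1] and then remaps it to the window depth as 0.5 * z_ndc + 0.5. The helper `f32` and both function names are made up for the illustration; 32-bit floats are emulated with the `struct` module.

```python
import struct

def f32(x: float) -> float:
    """Round a Python float (a double) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

NEAR = 0.01  # near plane in metres (assumed value for the illustration)

def depth_direct(z: float) -> float:
    # Reversed mapping stored directly: depth = near / z.
    return f32(NEAR / z)

def depth_gl_default(z: float) -> float:
    # The same mapping expressed in OpenGL's [-1, 1] NDC range,
    # followed by the default remap to the [0, 1] window depth.
    z_ndc = f32(f32(2.0 * NEAR / z) - 1.0)
    return f32(0.5 * z_ndc + 0.5)

a, b = 1.0e6, 2.0e6  # two points, 1000 km and 2000 km away

print("direct:    ", depth_direct(a), depth_direct(b))
print("GL default:", depth_gl_default(a), depth_gl_default(b))
```

In this sketch the directly stored values stay distinct, while the values that pass through the [-1, 1] range collapse onto the same number, because floats near -1 are far too coarse to keep the tiny offsets apart.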

There's a way to work around it on Nvidia hardware thanks to the support of an unclamped glDepthRange extension (glDepthRangedNV). However, on AMD it's not supported and there were indications that it may not even be possible. But here's what I found: with a glDepthRange(-1, 1) call that would solve the problem, the arguments are clamped to (0, 1) as per specification. But if we go into the disassembly of the call and make it skip the instruction that would cause it to clamp the lower bound:



... and the reverse FP buffer technique suddenly starts working! With precision good enough to handle the range needed to cover the whole universe. The projection matrix to use with it looks like this:

$$
M_{proj} = \begin{pmatrix}
X & 0 & 0 & 0 \\
0 & Y & 0 & 0 \\
0 & 0 & 0 & near \\
0 & 0 & 1 & 0
\end{pmatrix}
$$

There's no far term; a depth value of zero corresponds to infinite distance. The precision is very high: for near=0.01m the precision measured on the GPU is around 0.03mm at 100m, 0.003m at 10km, 0.3m at 1000km, and so on.
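As a rough cross-check, the following sketch applies the matrix to a point at distance z and estimates how far apart two adjacent 32-bit depth values are in world space. It is an idealized estimate under my own assumptions: X and Y are set to placeholder values, positive z is taken to point into the scene, and only the storage granularity of the depth value is modelled. The figures quoted above were measured on the GPU and are coarser, presumably because they also include arithmetic error from the transform. All helper names are hypothetical.

```python
import struct

def f32(x: float) -> float:
    """Round a Python float (a double) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def next_f32_up(x: float) -> float:
    """Next representable 32-bit float above a positive float32 value."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits + 1))[0]

def transform(m, v):
    """Multiply a row-major 4x4 matrix by a column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

NEAR = 0.01
X = Y = 1.0  # placeholder scale terms, irrelevant for depth

M_PROJ = [
    [X, 0, 0, 0],
    [0, Y, 0, 0],
    [0, 0, 0, NEAR],
    [0, 0, 1, 0],
]

def depth(z: float) -> float:
    """Depth after the perspective divide: near / z."""
    clip = transform(M_PROJ, [0.0, 0.0, z, 1.0])
    return clip[2] / clip[3]

def resolution(z: float) -> float:
    """World-space distance covered by one float32 step in depth at z."""
    d = f32(depth(z))
    d_next = next_f32_up(d)          # slightly larger depth means slightly closer
    return NEAR / d - NEAR / d_next

for z in (100.0, 1.0e4, 1.0e6):
    print(f"{z:>9.0f} m: depth {depth(z):.3e}, step ~{resolution(z):.2e} m")
```

The step grows roughly in proportion to the distance, which matches the pattern of the measured values: every 100x increase in distance costs about 100x in absolute precision.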

Of course, hacking the driver this way for normal use would be highly impractical. It was done just to show that nothing actually prevents AMD from supporting the unclamped depth range and getting a depth buffer technique that works with great precision without sacrificing the depth optimizations.

Hoping they will be listening.

Thursday, July 18, 2013

Logarithmic depth buffer optimizations & fixes


An updated logarithmic depth equation (vertex shader):

    //assuming gl_Position was already computed
    gl_Position.z = log2(max(1e-6, 1.0 + gl_Position.w)) * Fcoef - 1.0;


Where Fcoef is a constant or uniform value computed as Fcoef = 2.0 / log2(farplane + 1.0).
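A quick way to sanity-check the equation is to evaluate it on the CPU. The sketch below mirrors the shader line in Python; `FARPLANE` is an arbitrary value picked for the illustration and `log_depth_z` is a hypothetical helper name, not something from Outerra.

```python
import math

FARPLANE = 1.0e9  # hypothetical far plane distance in metres
FCOEF = 2.0 / math.log2(FARPLANE + 1.0)

def log_depth_z(w: float) -> float:
    """Value the shader line assigns to gl_Position.z for a given gl_Position.w."""
    return math.log2(max(1e-6, 1.0 + w)) * FCOEF - 1.0

for w in (0.0, 1.0, 1.0e3, 1.0e6, FARPLANE):
    print(f"w = {w:>12.1f} -> z = {log_depth_z(w):+.6f}")
```

With this choice of Fcoef the expression runs from -1 at the camera (w = 0) to +1 at the far plane.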


Changes (compared to the initial version):
  • using log2 instead of log: in shaders, the log function is implemented using the log2 instruction, so it's better to use log2 directly, avoiding an extra multiply
  • clipping issues: for values smaller than or equal to 0 the log function is undefined. When one vertex of a triangle lies far enough behind the camera (w ≤ -1), this causes the whole triangle to be rejected even before it is clipped.
    Clamping the value via max(1e-6, 1.0 + gl_Position.w) solves the problem of disappearing long triangles crossing the camera plane.
  • no need to compute depth in camera space: after multiplying with the modelview projection matrix, the gl_Position.w component contains the positive depth into the scene, so the above equation is the only thing that has to be added after your normal modelview projection matrix multiply
  • the previously used "C" constant that changed the precision distribution was removed, since the precision is normally much higher than necessary, and C=1 works well
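The first two points in the list can be checked numerically. This is only a CPU-side analogy: Python raises an exception for the logarithm of a non-positive number, whereas in GLSL the result is undefined. The function names and the far plane value are assumptions made for the sketch.

```python
import math

FCOEF = 2.0 / math.log2(1.0e9 + 1.0)  # assumes a far plane of 1e9 m

def log_depth_unclamped(w: float) -> float:
    return math.log2(1.0 + w) * FCOEF - 1.0

def log_depth_clamped(w: float) -> float:
    return math.log2(max(1e-6, 1.0 + w)) * FCOEF - 1.0

# log and log2 differ only by a constant factor, hence the saved multiply.
x = 123.456
assert math.isclose(math.log(x), math.log2(x) * math.log(2.0))

w_behind = -5.0  # a vertex well behind the camera

try:
    log_depth_unclamped(w_behind)
    unclamped_ok = True
except ValueError:
    unclamped_ok = False  # the logarithm of a negative number has no real value

z = log_depth_clamped(w_behind)
print("unclamped usable:", unclamped_ok)
print("clamped z:", z)
```

The clamped version returns a finite value below -1, so the vertex still ends up on the near side of the depth range instead of producing an undefined result.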

To address the issue of the depth not being interpolated in a perspective-correct way, output the following interpolant from the vertex shader:

    //out float flogz;
    flogz = 1.0 + gl_Position.w;

and then in the fragment shader add:

    gl_FragDepth = log2(flogz) * Fcoef_half;

where Fcoef_half = 0.5 * Fcoef
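The sketch below illustrates, in a simplified way, why the per-fragment version matters. It checks that the fragment formula is just the vertex formula remapped from [-1, 1] to [0, 1], and then compares "interpolate first, take the logarithm after" against "take the logarithm at the vertices, interpolate after" at the midpoint of a long edge. It ignores the details of screen-space interpolation and only shows that the two orders give different results; the far plane value and helper names are assumptions.

```python
import math

FARPLANE = 1.0e9  # hypothetical far plane distance in metres
FCOEF = 2.0 / math.log2(FARPLANE + 1.0)
FCOEF_HALF = 0.5 * FCOEF

def vertex_z(w: float) -> float:
    """Per-vertex value in [-1, 1], as in the vertex shader."""
    return math.log2(max(1e-6, 1.0 + w)) * FCOEF - 1.0

def frag_depth(flogz: float) -> float:
    """Per-fragment value in [0, 1], as in the fragment shader."""
    return math.log2(flogz) * FCOEF_HALF

# Two ends of a long triangle edge, one close to the camera.
w0, w1 = 0.1, 1000.0

# Per-fragment: interpolate flogz = 1 + w, then take the logarithm.
flogz_mid = 0.5 * ((1.0 + w0) + (1.0 + w1))
exact = frag_depth(flogz_mid)

# Per-vertex: take the logarithm at both ends, then interpolate.
approx = 0.5 * (0.5 * (vertex_z(w0) + 1.0) + 0.5 * (vertex_z(w1) + 1.0))

print(f"log of interpolated value: {exact:.4f}")
print(f"interpolated log values:   {approx:.4f}")
```

The two results differ noticeably for an edge like this, which is the kind of error that shows up as depth artifacts on long triangles close to the camera.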

Note that writing the fragment depth disables several depth buffer optimizations, which may pose problems in scenes with high overdraw. The non-perspective interpolation usually isn't a problem when the geometry is tessellated finely enough, and in Outerra we are using the fragment depth writing only for objects, since the terrain is tessellated quite well.

Wednesday, March 27, 2013

Craters

Terrain generator in Outerra contains a vector stage that can be used to overlay procedural geometry over the generated terrain. It's used, for example, to create the spline-based roads that seamlessly blend with the underlying terrain, and allows generating fine road geometry where even the road paint can have thickness (a few millimeters).

Dynamic craters are the latest addition to the vector overlay processor.



Craters are dynamically created, specifying their diameter and depth. The algorithm recognizes the type of surface and generates a different shape for asphalt/concrete and dirt. Asphalt is just bent outwards a bit, whereas the dirt is strewn around a lot more.

Craters are generally created in under half a second, which is quick enough, with some margin to spare, given that the creation will be hidden by the explosion's particle effects. The crater shape is also immediately reflected in the collision data.



The shape of the crater also depends on the specified explosion depth; deeper epicenters tend to create steeper edges.



The number of craters is practically unlimited; a single crater definition takes only 64 bits. For now the created craters are kept in a buffer indefinitely, but they are not (yet) persisted between sessions. Just like the roads, craters only affect the dynamic performance, i.e. when the observer is moving and new terrain tiles have to be generated.

The largest crater that can currently be created is around 1km in diameter. Here are also some older screens showing the evolution of the crater rendering algorithm.



Edit: a video showcasing the craters:


@cameni