Friday, July 19, 2013

Hacking AMD OpenGL drivers

Even though the logarithmic depth buffer technique works pretty nicely, it has several problems that make its use problematic in some cases. If you use just the vertex shader modification, you can get depth buffer artifacts on longer triangles that are close to the camera, since the depth values aren't correctly interpolated in perspective. It can be helped by a finer tesselation, or by writing the correct values in the fragment shader (possibly just for the geometry that's not tesselated sufficiently). However, writing the fragment depth in shader disables certain hardware depth buffer optimizations like the early depth test, and adds to the bandwidth. That can pose a problem in scenarios with a higher overdraw.

On Direct3D there's a technique that can provide sufficient depth buffer precision - reverse mapping of far/near planes in a normal floating-point depth buffer. In OpenGL it can't be used directly because of a design flaw that causes a huge loss of precision in depth value computation (not just for the floating point depth buffers, see here). See more detail in maximizing depth buffer range and precision blog post.

There's a way to work around it on Nvidia hardware thanks to the support of an unclamped glDepthRange extension (glDepthRangedNV). However, on AMD it's not supported and there were indications that it may not even be possible. But here's what I found: with a glDepthRange(-1, 1) call that would solve the problem, the arguments are clamped to (0, 1) as per specification. But if we go into the disassembly of the call and make it skip the instruction that would cause it to clamp the lower bound:

... and the reverse FP buffer technique suddenly starts working! With precision good enough to handle the range needed to cover the whole universe. Projection matrix to use with it looks like this:

Mproj = X 0 0 0 0 Y 0 0 0 0 0 near 0 0 1 0

There's no far term; the zero depth value is projected to infinity. The precision is very high - for near=0.01m the precision measured on the GPU is around 0.03mm at 100m, 0.003m at 10km, and 0.3m at 1000km and so on.

Of course, hacking the driver this way for normal use would be highly impractical, it was done just to show that actually nothing prevents AMD from supporting the unclamped depth range and getting a depth buffer technique that works with great precision without sacrificing the depth optimizations.

Hoping they will be listening.

Thursday, July 18, 2013

Logarithmic depth buffer optimizations & fixes

An updated logarithmic depth equation (vertex shader):

    //assuming gl_Position was already computed
    gl_Position.z = log2(max(1e-6, 1.0 + gl_Position.w)) * Fcoef - 1.0;

Where Fcoef is a constant or uniform value computed as Fcoef = 2.0 / log2(farplane + 1.0).

Changes (compared to the initial version):
  • using log2 instead of log: in shaders, log function is implemented using the log2 instruction, so it's better to use log2 directly, avoiding an extra multiply
  • clipping issues: for values smaller than or equal to 0 the log function is undefined. In cases when one vertex of the triangle lies further behind the camera (≤ -1), this causes a rejection of the whole triangle even before the triangle is clipped.
    Clamping the value via max(1e-6, 1.0 + gl_Position.w) solves the problem of disappearing long triangles crossing the camera plane.
  • no need to compute depth in camera space: after multiplying with the modelview projection matrix, gl_Position.w component contains the positive depth into the scene, so the above equation is the only thing that has to be added after your normal modelview projection matrix multiply
  • Previously used "C" constant changing the precision distribution was removed, since the precision is normally much higher than necessary, and C=1 works well

To address the issue of the depth not being interpolated in perspectively-correct way, output the following interpolant from the vertex shader:

    //out float flogz;
    flogz = 1.0 + gl_Position.w;

and then in the fragment shader add:

    gl_FragDepth = log2(flogz) * Fcoef_half;

where Fcoef_half = 0.5 * Fcoef

Note that writing fragment depth disables several depth buffer optimizations that may pose problems in scenes with high overdraw. The non-perspective interpolation isn't usually a problem when the geometry is tesselated finely enough, and in Outerra we are using the fragment depth writing only for objects, since the terrain is tesselated quite well.