Monday, June 18, 2012

A couple of OpenGL notes

A couple of notes mostly for my own reference, and for the unlucky souls who google through here after encountering similar problems.

A floating-point depth buffer


Some time ago I wrote about the possible use of a reversed-range floating-point (RFP) depth buffer as an alternative to the logarithmic depth buffer (see also +Thatcher Ulrich's post about logarithmic buffers). An RFP depth buffer has some advantages over the logarithmic one we are still using today. Chiefly, there is no need to write depth values in the fragment shader for objects close to the camera. With a logarithmic buffer that is necessary to suppress depth bugs caused by insufficient geometry tessellation, because in that case the depth values interpolated across a polygon diverge significantly from the required logarithmic value.

Writing depth values in a fragment shader causes a slight performance drop because of the additional bandwidth the depth values take, and also because it disables various hardware optimizations related to depth buffers. However, we have found the performance drop to be quite small in our case, so we didn't even test the floating point buffer back then.
A disadvantage of a floating-point depth buffer is that it takes 32 bits, so when you also need a stencil buffer the two no longer fit into 32 bits, and the combined buffer consumes twice as much memory as a 24-bit integer depth buffer with an 8-bit stencil. (If there were a 24-bit floating-point format, its resolution would be insufficient and inferior to the logarithmic one, in our case.)

Recently we have been doing some tests of our new object pipeline, and decided to test the reversed floating-point buffer to see if the advantages outweigh the disadvantages.

However, a new problem arose that's specific to OpenGL. Since OpenGL depth values use a normalized device coordinate (NDC) range of -1 to 1, techniques that rely on the higher precision of floating-point values close to zero cannot normally be used, because the far plane is mapped to -1 and not to 0.

Alright, I thought, I will sacrifice half of the range and simply output values in the 1..0 range from the vertex shader; the precision will still be sufficient.
However, OpenGL implicitly converts from NDC to the 0..1 range to compute the actual values written to the depth buffer. That conversion is defined as 0.5*(f-n)*z + 0.5*(f+n), where n and f are the values given by glDepthRange. With the default values of 0 and 1, the z output from the vertex shader gets remapped to 0.5*z + 0.5.
Since the reversed FP depth buffer relies on better precision near 0, adding 0.5 to values like 1e-12 discards most or all of the significant bits, rendering the technique unusable.
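
To make the precision loss concrete, here is a minimal sketch in Python that emulates the remap in 32-bit floats. The helper names (to_f32, window_depth) are mine, not part of any API, and the rounding is emulated with the struct module rather than measured on a GPU.

```python
import struct

def to_f32(x):
    """Round a Python float (a double) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def window_depth(z_ndc, n=0.0, f=1.0):
    """Depth-range remap 0.5*(f-n)*z + 0.5*(f+n), stored as a 32-bit float."""
    return to_f32(0.5 * (f - n) * z_ndc + 0.5 * (f + n))

for z in (1e-3, 1e-6, 1e-12):
    z32 = to_f32(z)
    # Default depth range (0, 1): small z values get absorbed by the 0.5 bias.
    print(f"z = {z32:.9e}  ->  depth = {window_depth(z32):.9e}")
```

With the default range, a z of 1e-12 comes back as exactly 0.5, so it can no longer be told apart from 0. With a hypothetical unclamped range of n = -1, f = 1 the same function returns z unchanged.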

Calling glDepthRange with the parameters -1, 1 would solve the problem (making the conversion 1*z + 0), but the spec explicitly says that "The parameters n and f are clamped to the range [0; 1], as are all arguments of type clampd or clampf".
On Nvidia hardware the NV_depth_buffer_float extension comes to the rescue, as it allows setting non-clamped values, but alas it's not supported on AMD hardware, and that's the end of the journey for now.
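
The effect of the clamp can be sketched by computing the scale and bias of the remap with and without it. This is only an illustration in Python; depth_range_coeffs is a made-up helper, with clamp=True standing in for plain glDepthRange and clamp=False for the unclamped variant that NV_depth_buffer_float provides (glDepthRangedNV).

```python
def clamp01(x):
    """Clamp a value to the range [0, 1], as the spec does for clampd arguments."""
    return min(max(x, 0.0), 1.0)

def depth_range_coeffs(n, f, clamp=True):
    """Return (scale, bias) of the depth remap scale*z + bias for a depth range n, f."""
    if clamp:
        n, f = clamp01(n), clamp01(f)
    return 0.5 * (f - n), 0.5 * (f + n)

print(depth_range_coeffs(-1.0, 1.0))               # clamped: n becomes 0
print(depth_range_coeffs(-1.0, 1.0, clamp=False))  # unclamped: identity remap
```

With the clamp the request for -1, 1 silently turns into 0, 1, so the 0.5 bias stays; without it the remap is the identity and small z values survive.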

Update: more complete and up-to-date information about the use of a reversed floating-point depth buffer can be found in the post Maximizing Depth Buffer Range and Precision.


Masquerading integer values as floats in outputs of fragment shaders


Occasionally (but in our case quite often) there's a need to combine floating-point values with integer values in a single texture. For example, our shader that generates the road mask (to be applied over terrain geometry in another pass) needs to output the road height as a float, together with the road type and flags as integers. Since the shader instructions uintBitsToFloat and floatBitsToUint are cheap (they are just type casts), this can be done easily.

However, there's a problem: if you use a floating-point render target for that purpose, the exact bit representation of the masqueraded integer values may not be preserved on AMD hardware. The blending unit, even though inactive, can meddle with the floating-point values, altering them in a way that barely affects their interpretation as floats but changes the bits in some cases.

In this case you need to use an integer render target and cast the floating-point values to int (or uint) with floatBitsToInt (or floatBitsToUint) instead.
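
The bit casts can be mimicked on the CPU to see what goes into such an integer target. Below is a minimal Python sketch; the three-channel layout and the names pack_road_texel / unpack_road_texel are hypothetical, chosen only for illustration, and are not our actual format.

```python
import struct

def float_bits_to_uint(x):
    """Reinterpret the bits of a 32-bit float as a uint (like GLSL floatBitsToUint)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def uint_bits_to_float(u):
    """Reinterpret a 32-bit uint as a float (like GLSL uintBitsToFloat)."""
    return struct.unpack("<f", struct.pack("<I", u))[0]

def pack_road_texel(height, road_type, flags):
    """Hypothetical layout: three uint channels written to an integer render target."""
    return (float_bits_to_uint(height), road_type & 0xFFFFFFFF, flags & 0xFFFFFFFF)

def unpack_road_texel(texel):
    """Recover (height, road_type, flags) from the three uint channels."""
    h_bits, road_type, flags = texel
    return uint_bits_to_float(h_bits), road_type, flags

texel = pack_road_texel(123.25, 5, 0b1010)
print([hex(c) for c in texel])
print(unpack_road_texel(texel))

# The opposite direction is the fragile one: a small integer such as 5,
# viewed as a float, is a tiny denormal value.
print(uint_bits_to_float(5))
```

Going through integers keeps every bit intact, since nothing interprets the channels as floats along the way. A small integer masquerading as a float is a denormal; hardware that flushes such values to zero would be one plausible way to lose the bits, though that is my guess and not something I have confirmed.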

Sunday, May 27, 2012

Procedural grass rendering


Grass distance 45m; 400k+ blades of grass

Procedural grass in Outerra is rendered in two stages. The first stage generates just a grass canopy, a height mask that produces the overall shape that the grass forms on the terrain. It generates dry grass-less areas as well as grass of varying height using fractal patterns. Output of this stage is also directly used when rendering the terrain in the distance; what you see out there is not the ground level but a procedurally textured envelope of low vegetation. This also means that objects that are hidden in the 3D grass will be hidden under the distant canopy as well.

The second stage generates grass blades dynamically using the canopy data: terrain elevation, grass height and color. The canopy data have a resolution of roughly 30cm, and the number of grass blades varies with the level of detail, which in turn depends on the distance from the camera.

Levels of detail highlighted


Original image

The algorithm generates each blade via a geometry shader as a triangle strip with 7 vertices and 5 triangles, making a blade with 3 segments. For short blades the 3 segments would be a waste, so in that case the blade is folded into a V-shape between the first and the second segment. This doubles the apparent density for the shorter grass, which is desirable since it doesn't cover the ground as well as the longer one.

Triangle strips for long and short grass blades
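
The vertex and triangle counts of the long blade can be checked with a small CPU-side sketch. This is Python rather than the geometry shader itself, the linear taper and the flat 2D layout are my assumptions for illustration, and the V-shaped fold for short blades is not modelled.

```python
def blade_strip(width, height, segments=3):
    """Vertices (x, y) of a flat blade as a triangle strip: two per segment base, plus a tip."""
    verts = []
    for i in range(segments):
        t = i / segments
        half = 0.5 * width * (1.0 - t)  # assumed linear taper towards the tip
        y = height * t
        verts.append((-half, y))
        verts.append((half, y))
    verts.append((0.0, height))  # single tip vertex closes the last segment
    return verts

def strip_triangle_count(verts):
    """A triangle strip with N vertices forms N - 2 triangles."""
    return max(len(verts) - 2, 0)

blade = blade_strip(width=0.02, height=0.3)
print(len(blade), strip_triangle_count(blade))
```

Two quad segments plus a triangular tip give the 7 vertices and 5 triangles mentioned above.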


At the most detailed level there are 4 blades generated from a single point in the canopy texture. Each subsequent detail level halves the number of blades, while also doubling the width of the remaining ones.
Grass density geometry reduction with distance, from 0, 8 and 16 meters, respectively
Original patch of grass from which the blades above were extracted.
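
The halving and doubling scheme boils down to simple arithmetic, sketched here in Python. The function name grass_lod and the relative base width of 1.0 are placeholders of mine, not values from the engine.

```python
def grass_lod(level, base_blades=4, base_width=1.0):
    """Blades per canopy point and relative blade width at a detail level (0 = finest)."""
    blades = base_blades / 2 ** level   # each level halves the blade count
    width = base_width * 2 ** level     # and doubles the width of the rest
    return blades, width

for level in range(4):
    blades, width = grass_lod(level)
    print(f"level {level}: {blades} blades per point, width x{width}")
```

The product of blade count and width stays constant, so the total covered width per canopy point does not change between levels. A count below 1 simply means one blade shared by several canopy points.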

Since the blades are generated individually, they can also be easily animated. Here's a short video where the texture normally used for ocean waves was used to animate the blades. Obviously it will need different parameters, but conceptually it works quite well.


And finally, a longer video showing the grass rendering in motion.


@cameni

Wednesday, April 4, 2012

Road interpolator

Build 3053 fixes several issues with the road system, making it much more robust. The road interpolator, described in an earlier blog post, works by generating road surfaces from relatively simple vector definitions. The algorithm puts some limits on the allowed road curvature and the width of transitional areas where the road sides are blended with terrain. However, the old implementation of the interpolator had several bugs of its own. In the following screenshots, the first shows the new, enhanced implementation and the second the old one in some problematic areas:






The precision problems first appeared on the road markings in tighter turns - the center lines started to deform and vanish. Another very common defect was the occurrence of high sharp spikes at the road sides. It could be partly suppressed by narrowing the transitional widths and loosening the turns, but the issues were still there and some types of roads could not be done at all.





In some cases the road surface folded and deformed, as in the following screenshot:



What changed in the road system is not the core algorithm itself, but rather the setup: it now omits the geometry shader (speeding it all up) and uses a finer tessellation, which makes the inner algorithm more stable as a result.
It also dynamically shrinks the transitional width on the inner side of turns, which helps to reduce conflicting overlapping areas where multiple road segments try to adjust the road sides.

The new implementation also changes the transition from road sides to rocks. However, this part will need more tweaking, as it's still possible to create roads that are blocked by large rocky outcrops. It can be helped by moving the road a bit outwards.

new
old