Thursday, November 15, 2012

OpenGL Notes #2: Texture bind performance

Some time ago we encountered a strange performance slowdown in our object renderer on NVIDIA hardware. I didn't have mesh sorting by material id implemented yet; I knew this could be an issue, but I thought it would cost 1 ms at most. The actual slowdown was way bigger. Later I found out that it was indeed caused by glBindMultiTextureEXT. So I ran a couple of tests, first using one big texture (2048x2048) for all meshes, then adding the sorting by material id. You can see the times for the final object pass in Table 1.

TABLE 1: Objects in the test scene had about 400k tris, ~250 meshes and 42 textures;
all textures had the same format, diffuse DXT1, normal 3DC/ATI2 etc.
NVIDIA DRV: 306.97, 306.94 (Nsight) and 310.33, AMD DRV: 12.10

                         NVIDIA 460 1GB (ms)    AMD 6850 (ms)
without sorting          15.0                   5.0
one texture              3.3                    4.0
sort by mesh/material    6.8                    4.3

Those numbers are pretty bad for NVIDIA. The weird thing is that we have no such issue in the terrain renderer, which uses unique textures for every terrain tile, making about 200 meshes/textures per frame, yet the performance there is great. There is no difference in the render states; the object renderer follows immediately after the terrain renderer without any changes to the render states. I tried a lot of tests (a simple shader, non-DSA functions for texture binding, rendering without terrain etc.) without luck: every time I started to change the textures per mesh I hit the wall. (I used two textures for this test and the time was almost 15 ms.)
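As a side note on the sorting by material id measured in Table 1, here is a minimal Python sketch (not the renderer's code) that counts how often the bound texture would change while walking a draw list, before and after sorting. The `Mesh` record and both function names are hypothetical, made up for this illustration.

```python
from collections import namedtuple

# Hypothetical mesh record; only the material id matters for this sketch.
Mesh = namedtuple("Mesh", ["name", "material_id"])

def count_texture_binds(meshes):
    """Count how many times the material changes when drawing in list order."""
    binds = 0
    current = None
    for mesh in meshes:
        if mesh.material_id != current:
            binds += 1  # a real renderer would rebind its textures here
            current = mesh.material_id
    return binds

def sort_by_material(meshes):
    """Stable sort so that meshes sharing a material are drawn back to back."""
    return sorted(meshes, key=lambda m: m.material_id)

meshes = [Mesh("a", 2), Mesh("b", 1), Mesh("c", 2), Mesh("d", 1), Mesh("e", 3)]
unsorted_binds = count_texture_binds(meshes)
sorted_binds = count_texture_binds(sort_by_material(meshes))
print(unsorted_binds, sorted_binds)
```

With sorting, the number of bind changes drops to the number of distinct materials in the list.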

I moved to NVIDIA Nsight to find out where exactly the slowdown is, and I found that every time glBindMultiTextureEXT is called, the following draw call (in this case glDrawRangeElements) takes much longer on both the CPU side and the GPU side. With a debug rendering context there were no performance warnings. Full of frustration, I was browsing the glCapsViewer database and comparing the texture caps between NVIDIA and AMD. One interesting number drew my attention: MAX_COMBINED_TEXTURE_IMAGE_UNITS on NVIDIA SM4 hardware ranges from 96 to 192 texture image units. On AMD it is only 32 (the spec says it should be at least 48 for GL3.3: at least 16 per stage, and there are three stages in GL3.3). This number tells how many textures you can have bound at once, that is, how many units can be selected with glActiveTexture(GL_TEXTURE0 + i); with 192 units, i goes up to 191. In one shader stage you can use only 16-32 textures from this budget, and the samplers are pointed at the chosen units by glUniform1i calls.

The idea that came to me was to bind as many textures as I can at once (in my case there were 160 units available on the NV460, more than enough for my test scene) and render all meshes without texture binding. Additionally, per mesh I call one glUniform1iv, which tells the shader which texture units are being used. This uniform call is a fast one (usually I'm using glVertexAttrib to pass mesh-specific parameters to the vertex shader; it would be nice to know which way is faster, but I think the difference is negligible). The result was more than interesting, see Table 2.

TABLE 2: The same scene, NVIDIA DRV: 306.97, 306.94 (Nsight) and 310.33, AMD DRV: 12.10

                         NVIDIA 460 1GB (ms)    AMD 6850 (ms)
one texture              3.3                    4.0
sort by mesh/material    6.8                    4.3
texture bind group       3.36                   4.1


I was curious how this would work on AMD, but the limit is 32 image units there. I created so-called "texture bind groups" with the max size set to MAX_COMBINED_TEXTURE_IMAGE_UNITS minus the number of units needed for other stuff (in my case I have 26 free bindable units on AMD) and split the meshes into these groups. The result was better than the version with sorting, so I left this implementation for both vendors.
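The grouping can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the code used in the engine: the function name `build_bind_groups`, the greedy in-order splitting, and placing the reserved units at the lowest indices are all assumptions of this example. The sample uses a budget of only 4 free units to keep it small; on the AMD card above it would be 32 units with 6 reserved, giving the 26 free ones.

```python
def build_bind_groups(meshes, max_units, reserved_units):
    """Split (mesh name, texture ids) pairs into groups whose distinct
    textures fit into the free texture units (greedy, in draw order)."""
    budget = max_units - reserved_units
    groups = []
    units = {}   # texture id -> texture unit index within the current group
    draws = []   # (mesh name, unit indices to pass as the per-mesh uniform)
    for name, textures in meshes:
        distinct = list(dict.fromkeys(textures))  # keep order, drop repeats
        if len(distinct) > budget:
            raise ValueError(f"mesh {name} needs more than {budget} units")
        missing = [t for t in distinct if t not in units]
        if len(units) + len(missing) > budget:
            # The current group is full: close it and start a new one.
            groups.append({"units": units, "draws": draws})
            units, draws = {}, []
            missing = distinct
        for t in missing:
            # Assumption: units 0..reserved_units-1 are used for other stuff.
            units[t] = reserved_units + len(units)
        draws.append((name, [units[t] for t in textures]))
    if draws:
        groups.append({"units": units, "draws": draws})
    return groups

# Hypothetical scene: each mesh uses a diffuse and a normal texture.
scene = [
    ("m0", ["d0", "n0"]),
    ("m1", ["d1", "n1"]),
    ("m2", ["d0", "n1"]),
    ("m3", ["d2", "n2"]),
]
groups = build_bind_groups(scene, max_units=6, reserved_units=2)
for group in groups:
    print(group["units"], group["draws"])
```

In a renderer, each group would correspond to one round of texture binds (one per entry in `units`), and each entry in `draws` to a draw call preceded by a single glUniform1iv with the listed unit indices.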

I would still be interested to learn what is causing such a big slowdown on the NVIDIA architecture.

Tuesday, November 13, 2012

Importing models and basic scripting in Outerra

Latest versions of Anteworld have brought support for model import and basic scripting of aircraft and vehicle physics. Outerra supports only the Collada format at the moment, but support for FBX will be coming as well.

For scripting we are using JavaScript, reusing the V8 JavaScript engine from the embedded Chromium browser. The scripting API is automatically generated by a tool directly from definitions in our sources, so the same API will be available for C++, JavaScript and any other language we add later to the generator.

Here's a video showing the import process and scripting the physics for two vehicles - an 8x8 Tatra truck and a BMW M3 model.


Forum thread for the importer is here.

A couple of models have already been imported by our fans, for example an architecture test, a Soviet flying boat, and a Mercedes Benz 500K by Denis Krupin.

See also Jeff's quest in reconstructing Stoumont in Outerra:

 


Thursday, August 2, 2012

Geotagging and Outerra


In the recent unstable build of Anteworld we have implemented a feature that was on our mind for some time already - embedding positional information into the screenshots. Anteworld (and the Outerra engine itself) primarily focuses on planet Earth, rendering real locations using the available elevation data that are then further refined by fractal algorithms. It occurred to us that we could add meta info to screenshots captured from the program, recording the current position and orientation so that these screenshots can be used in other programs that can recognize it to retrieve the position and use it with mapping and other services.

For this purpose we have switched to the JPEG format for captured screens so that we could attach EXIF headers with GPS positional info. The structure of the EXIF format is quite awkward and in my opinion uselessly complicated; instead of a simple extendable text format it uses an indirectly addressed structure with a rigid format, inherited from the TIFF structure. However, the format is deeply rooted and used everywhere.

Geo location can be recorded in a special GPS section, where it is possible to record the usual stuff - latitude, longitude, altitude and more. Unfortunately, the standard-defined orientation only accounts for the heading angle and doesn't expect that someone might need pitch and roll. After a bit of searching I found that this was already addressed in exiftool by adding custom fields for GPSPitch and GPSRoll, so I've decided to use these unofficial extensions, since there's at least one quite widely used tool that can recognize them too.
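To give an idea of what ends up in the GPS section, here is a small Python sketch that prepares the standard EXIF GPS values from a decimal position and heading. It only builds the values (EXIF stores latitude and longitude as degree/minute/second rationals plus a hemisphere letter) and does not write a JPEG. The function names, the 1/100 second precision and the sample coordinates are assumptions of this example, not taken from Outerra; the unofficial GPSPitch and GPSRoll fields are left out because their encoding isn't described here.

```python
def to_dms_rationals(decimal_degrees, second_denominator=100):
    """Convert an angle to ((deg, 1), (min, 1), (sec_ticks, denominator))."""
    ticks_per_degree = 3600 * second_denominator
    total = round(abs(decimal_degrees) * ticks_per_degree)
    degrees, rest = divmod(total, ticks_per_degree)
    minutes, second_ticks = divmod(rest, 60 * second_denominator)
    return ((degrees, 1), (minutes, 1), (second_ticks, second_denominator))

def gps_tags(lat, lon, heading_deg):
    """Build a dict of standard EXIF GPS tag values (names as in the EXIF spec)."""
    return {
        "GPSLatitudeRef": "N" if lat >= 0 else "S",
        "GPSLatitude": to_dms_rationals(lat),
        "GPSLongitudeRef": "E" if lon >= 0 else "W",
        "GPSLongitude": to_dms_rationals(lon),
        "GPSImgDirectionRef": "T",  # heading relative to true north
        # Heading as a rational in 1/100 degree, wrapped into [0, 360).
        "GPSImgDirection": (round((heading_deg % 360) * 100) % 36000, 100),
    }

# Arbitrary sample position and heading, for illustration only.
tags = gps_tags(48.5, -17.25, 255.0)
for name, value in tags.items():
    print(name, value)
```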

With the positional info embedded within the generated screenshots you can open the image with image viewers such as XnView or IrfanView, which can read it and open a mapping web application, passing in the location data.

For example, the following screenshot can be opened in XnView:


From there you can go via Edit -> Metadata -> Open GPS Location in GeoHack, which provides quick links to open the location in Google Maps, OpenStreetMap, as a dual view in Topomapper, or in lots of other services.

Google Earth in its recent versions doesn't recognize the location from jpg files (when using Add->Photo), though it supposedly worked in earlier versions.

Jumping into photos


The second, perhaps more interesting part of the geotagging functionality implemented in Outerra is the ability to read the positional info from the screenshots back when starting Outerra, and setting up the camera position and orientation accordingly. This can be used to explore interesting locations that others discover just by using their images, or to use images to store saved locations instead of having to use separate camera position files.
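Reading the position back is the reverse operation. The Python sketch below turns already-parsed EXIF GPS values into a decimal position and heading that a camera setup could use. The input dict layout, the function names and the returned camera dict are hypothetical; parsing the JPEG itself is not shown, and the heading is treated as optional since manually geotagged images may not carry one.

```python
def dms_to_decimal(dms, ref):
    """Convert ((deg, den), (min, den), (sec, den)) rationals to signed degrees."""
    (d_num, d_den), (m_num, m_den), (s_num, s_den) = dms
    value = d_num / d_den + m_num / m_den / 60 + s_num / s_den / 3600
    return -value if ref in ("S", "W") else value

def camera_from_gps(tags):
    """Build a hypothetical camera description from EXIF GPS tag values."""
    heading = None
    if "GPSImgDirection" in tags:
        num, den = tags["GPSImgDirection"]
        heading = num / den
    return {
        "lat": dms_to_decimal(tags["GPSLatitude"], tags["GPSLatitudeRef"]),
        "lon": dms_to_decimal(tags["GPSLongitude"], tags["GPSLongitudeRef"]),
        "heading": heading,  # None when the image has no direction info
    }

# Sample values as they might come out of an EXIF parser (illustrative only).
sample_tags = {
    "GPSLatitudeRef": "N",
    "GPSLatitude": ((48, 1), (30, 1), (0, 100)),
    "GPSLongitudeRef": "W",
    "GPSLongitude": ((17, 1), (15, 1), (0, 100)),
    "GPSImgDirection": (25500, 100),
}
camera = camera_from_gps(sample_tags)
print(camera)
```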

Obviously, this can also be used with images originating from other sources - digital cameras and smartphones with GPS and an optional orientation sensor, images geotagged manually etc. It's a quick way to see how a given location looks in Outerra. A couple of examples tested with Outerra:





The direction in the next one (from an iPhone) is somewhat off: the GPS info says the heading is 255 degrees, but it seems to be more to the north, judging also by Google Maps:



As usual, mountainous areas would benefit from using 30m data.

Well. Obviously still a lot of work ahead of us ;-)