Wednesday, November 25, 2009

CPU v GPU Procedural Terrain Texture Generation

It's been an interesting week. I have programmed graphics for some time, read a lot about shaders and used them briefly, so I know they are powerful tools for the graphics programmer, but I am still slightly in awe of how quick they are.

It should be noted before reading any further that this is the first shader I have ever written. I've used and modified shaders before, such as Sean O'Neil's atmospheric scattering (there is a post below somewhere) and some bump mapping, but all the code in this shader is mine and may therefore contain some rookie mistakes.

Let's first refresh on the very simple texture generation technique currently implemented. The user specifies a list of terrain regions. Each region has texture data and an optimal, minimum and maximum height associated with it. For each pixel in the texture being generated, the terrain height at that position is queried, and interpolated if the texture is higher resolution than the terrain mesh. This height is then compared against every terrain region, and the colour of the pixel is based on the strength of this height within each region. There are many examples of the results of this algorithm elsewhere in the blog if you have not already seen them.
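The per-region strength described above can be sketched on the CPU side roughly as follows. This is only an illustration, not my actual software path: the `Region` struct and the `regionStrength` name are made up for this example, and the linear fade simply mirrors what the fragment shader further down does.

```cpp
// Hypothetical CPU-side mirror of a terrain region (names are assumptions).
struct Region {
    float minHeight;
    float optimalHeight;
    float maxHeight;
};

// Returns 0 outside (minHeight, maxHeight), 1 at the optimal height,
// and a linear fade between the optimal height and either limit.
float regionStrength(const Region& r, float height) {
    if (height <= r.minHeight || height >= r.maxHeight) {
        return 0.0f;
    }
    if (height >= r.optimalHeight) {
        // Fade from 1 at the optimal height down to 0 at the maximum.
        return (r.maxHeight - height) / (r.maxHeight - r.optimalHeight);
    }
    // Fade from 0 at the minimum up to 1 at the optimal height.
    return (height - r.minHeight) / (r.optimalHeight - r.minHeight);
}
```

The pixel colour is then the sum of each region's texture colour multiplied by its strength, so a height halfway between a region's optimal and maximum contributes that region's texture at half intensity.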

Above can be seen the times taken to generate the textures in software. 2048x2048 takes almost 1 minute! My code in this area isn't by any means heavily optimised, but it is well written. It's a relatively simple algorithm that iterates through a list and compares the height value against each region. Previously, when procedurally generating a planet at run-time, the texture size of choice was 256x256. This provided average detail, but with a generation time of about 1 second the freeze in movement was very obvious.

Now on to the better news...

What a difference! These times include the full process of using the shader:
  • Binding the frame buffer so that the texture can be rendered off screen.
  • Enabling the vertex and fragment shaders and binding the textures required.
  • Rendering the texture.
  • Unbinding/disabling everything used during this sequence.
To get a better approximation of the time taken to generate a texture in hardware, the times above are an average over 5000 iterations. That applies only to the GPU times, as it would take over 3 days to generate 5000 2048x2048 textures on the CPU.
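For anyone wanting to reproduce this kind of averaged timing, a minimal sketch is shown below. It is not the code I used: `averageMilliseconds` is a hypothetical helper, and it assumes the callable passed in does not return until its work is really finished. For OpenGL that would typically mean ending the callable with `glFinish()`, since draw calls are otherwise queued and return early.

```cpp
#include <chrono>

// Runs `work` `iterations` times and returns the mean wall-clock
// time per call in milliseconds.
template <typename Work>
double averageMilliseconds(Work work, int iterations) {
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    for (int i = 0; i < iterations; ++i) {
        work();
    }
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
    return elapsed.count() / iterations;
}
```

Averaging over many iterations smooths out one-off costs such as the first shader bind, which is why a single GPU run would not be a fair number to quote.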
Here is the fragment shader, which does all the work. The vertex shader just passes the vertex down the render pipeline.

struct vOutput
{
    float4 color : COLOR;
};

struct TextureRegion
{
    float2 startTextureCoord;
    float2 endTextureCoord;

    float optimalHeight;
    float minHeight;
    float maxHeight;
};

vOutput Main(float2 texCoord : TEXCOORD0,
             uniform sampler2D heightMap : TEX0,
             uniform sampler2D terrainTexture : TEX1,
             uniform int terrainTextureRepeat,
             uniform sampler2D detailTexture : TEX2,
             uniform int detailTextureRepeat,
             uniform float blendingRatio,
             uniform TextureRegion regions[4])
{
    vOutput OUT;

    //Get the Height
    float4 bytes = tex2D(heightMap, texCoord);
    float height = ((bytes[0] * 16777216.0f) + (bytes[1] * 65536.0f) + (bytes[2] * 256.0f)) / 1000.0f;

    //Strength of this Terrain Tile at this height
    float strength = 0.0f;
    //Color for this Pixel
    OUT.color = float4(0, 0, 0, 1);

    int colorset = 0;

    //For Each Terrain Tile Defined
    for (int loop = 0; loop < 4; loop++)
    {
        //If the Current Terrain Pixel Falls within this range
        if (height > regions[loop].minHeight && regions[loop].maxHeight > height)
        {
            colorset = 1;

            //Work out the % that applies to this height
            //If Height = Optimal, then its 100% otherwise fade out relative to distance between optimal and min/max
            if (height == regions[loop].optimalHeight)
            {
                strength = 1.0f;
            }
            else if (height > regions[loop].optimalHeight)
            {
                float temp1 = regions[loop].maxHeight - regions[loop].optimalHeight;
                strength = ((temp1 - (height - regions[loop].optimalHeight)) / temp1);
            }
            else if (height < regions[loop].optimalHeight)
            {
                float temp1 = height - regions[loop].minHeight;
                float temp2 = regions[loop].optimalHeight - regions[loop].minHeight;
                strength = temp1 / temp2;
            }

            if (strength != 0.0f)
            {
                float2 tileTexCoord;

                //Tile the Texture Coordinates
                tileTexCoord[0] = fmod((texCoord[0] * terrainTextureRepeat), 1.0f);
                tileTexCoord[1] = fmod((texCoord[1] * terrainTextureRepeat), 1.0f);

                //Recalculate the Texture Coordinates so that they are within the Specified Tile
                tileTexCoord = regions[loop].startTextureCoord + ((regions[loop].endTextureCoord - regions[loop].startTextureCoord) * tileTexCoord);

                //Get the Color at this Terrain Coordinate
                OUT.color += (tex2D(terrainTexture, tileTexCoord) * strength);
            }
        }
    }

    if (0 == colorset)
    {
        //Make Pink so that its obvious on the terrain (only for debugging)
        OUT.color = float4(1, 0, 1, 1);
    }
    else
    {
        //Scale the Texture Coordinate for Repeating detail and get the Detail Map Color
        texCoord *= detailTextureRepeat;
        float4 detailColor = tex2D(detailTexture, texCoord);

        //Interpolate Between the 2 Colors to get final Color
        OUT.color = lerp(OUT.color, detailColor, blendingRatio);
    }

    return OUT;
}
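
The tiling step in the shader is probably the least obvious part, so here is the same arithmetic as a small C++ sketch. The `Vec2` struct and the `tileTexCoord` function are invented for this illustration only. The idea is to wrap the scaled coordinate back into the 0 to 1 range and then squeeze it into the sub-rectangle of the terrain texture that belongs to the region.

```cpp
#include <cmath>

// Minimal 2D vector, standing in for Cg's float2 (hypothetical helper type).
struct Vec2 {
    float x;
    float y;
};

// Mirrors the shader's two steps: wrap the repeated coordinate into [0, 1),
// then map it into the rectangle between `start` and `end`.
Vec2 tileTexCoord(Vec2 texCoord, int repeat, Vec2 start, Vec2 end) {
    const float u = std::fmod(texCoord.x * repeat, 1.0f);
    const float v = std::fmod(texCoord.y * repeat, 1.0f);
    return Vec2{start.x + (end.x - start.x) * u,
                start.y + (end.y - start.y) * v};
}
```

For example, with a repeat of 4 and a region occupying the rectangle from (0.5, 0) to (1, 0.5), the coordinate (0.375, 0.625) wraps to (0.5, 0.5) and ends up at (0.75, 0.25), the centre of that region's tile.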

This week I have been using this shader in 2 ways.

  1. As described above, to generate a texture once per terrain patch (it is regenerated in higher detail when the patch subdivides). This texture is then used when rendering.
  2. Bound every frame, which gives per-pixel texture generation. This has the obvious disadvantage that the texture data for the terrain is regenerated each frame, although only for the on-screen terrain. It has the nice advantage of not taking up any extra graphics memory, with no need for frame buffers, rendering off screen, etc. I was getting between 200 and 600 fps using this method.
I don't know how I will ultimately use this shader in the future. I will have to experiment and see which is the preferred method.

All the above results were generated on my laptop, which has the following specification.

Renderer: ATI Mobility Radeon HD 3670
Vendor: ATI Technologies Inc.
Memory: 512 MB
Version: 3.0.8599 Forward-Compatible Context
Shading language version: 1.30

Max texture size: 8192 x 8192
Max texture coordinates: 16
Max vertex texture image units: 16
Max texture image units: 16
Max geometry texture units: 0
Max anisotropic filtering value: 16
Max number of light sources: 8
Max viewport size: 8192 x 8192
Max uniform vertex components: 512
Max uniform fragment components: 512
Max geometry uniform components: 0
Max varying floats: 68
Max samples: 8
Max draw buffers: 8

As always, comments are welcome and appreciated.
