Wednesday, November 25, 2009

CPU v GPU Procedural Terrain Texture Generation

Its been an interesting week. Having programmed graphics for some time, having read allot about shaders and having used them briefly I know they are powerful tools for the graphics programmer, but I am still slightly in awe of how quick they are.

It should be noted before reading any further that this is the first shader I have ever written. I've used and modified shaders before such as Sean O'Neil's atmospheric scattering (there is a post below somewhere) and some bumpmapping, but all code in this shader is mine and therefore possibly with some rookie mistakes.

Lets first refresh on the very simple texture generation technique currently implemented. The user specifies a list of terrain regions. Each region has texture data, an optimal, min and max height associated with it. For each pixel in the texture being generated the terrain height at that position is queried, interpolated if required (if the texture is higher resolution than the terrain mesh). This height is then compared to all terrain regions and a colour of the pixel is based on the strength of this height within the regions. There are many examples of the results of this algorithm elsewhere in the blog if you have not already seen.



Above can be seen the times used to generate the textures in software. 2048x2048 taking almost 1 minute! My code in this area isn't by any means heavily optimised, but is well written. Its a relatively simple algorithm of iterating though a list and comparing the height value against the region. Previously when procedurally generating a planet at run-time the texture size of choice was 256x256. This provided average detail but with the generation time of about 1 second, a freeze in movement was very obvious.

Now on to the better news....



What a difference? These times include the full process of using a the shader
  • Binding the Frame buffer so that the texture can be rendered off screen,
  • Enabling the Vertex and Fragment shader, binding the textures required.
  • Rendering the texture
  • Unbinding/disabling everything used during this sequence. 
To get a better approximation on the time used to generate this texture in hardware the times above are also an average of 5000 iterations (that applies only to the GPU times as it would take over 3 days waiting for 5000 2048x2048 CPU textures to be generated).
Here is the fragment shader, which does all the work. The vertex shader just passes the vertex down the render pipeline.

struct vOutput
{
    float4 color : COLOR;
};

struct TextureRegion
{
    float2 startTextureCoord;
    float2 endTextureCoord;

    float optimalHeight;
    float minHeight;
    float maxHeight;
};

vOutput Main(float2 texCoord : TEXCOORD0,
             uniform sampler2D heightMap : TEX0,
             uniform sampler2D terrainTexture : TEX1,
             uniform int terrainTextureRepeat,
             uniform sampler2D detailTexture : TEX2,
             uniform int detailTextureRepeat,
             uniform float blendingRatio,
             uniform TextureRegion regions[4])
{
    vOutput OUT;

    //Get the Height
    float4 bytes = tex2D(heightMap, texCoord);
    float height = ((bytes[0] * 16777216.0f) + (bytes[1] * 65536.0f) + (bytes[2] * 256.0f)) / 1000.0f;

    //Strength of this Terrain Tile at this height
    float strength = 0.0f;
   
    //Color for this Pixel
    OUT.color = float4(0, 0, 0, 1);

    int colorset = 0;

    //For Each Terrain Tile Defined
    for (int loop = 0; loop < 4; loop++)
    {
        //If the Current Terrain Pixel Falls within this range
        if (height > regions[loop].minHeight && regions[loop].maxHeight > height)
        {
            colorset = 1;

            //Work out the % that applies to this height
            //If Height = Optimal, then its 100% otherwise fade out relative to distance between optimal and min/max
            if (height == regions[loop].optimalHeight)
            {
                strength = 1.0f;
            }
            else if (height > regions[loop].optimalHeight)
            {
                float temp1 = regions[loop].maxHeight - regions[loop].optimalHeight;
                strength = ((temp1 - (height - regions[loop].optimalHeight)) / temp1);
            }
            else if (height < regions[loop].optimalHeight)
            {
                float temp1 = height - regions[loop].minHeight;
                float temp2 = regions[loop].optimalHeight - regions[loop].minHeight;
                strength = temp1 / temp2;
            }

            if (strength != 0.0f)
            {
                float2 tileTexCoord;

                //Tile the Texture Coordinates
                tileTexCoord[0] = fmod((texCoord[0] * terrainTextureRepeat), 1.0f);
                tileTexCoord[1] = fmod((texCoord[1] * terrainTextureRepeat), 1.0f);

                //Recalculate the Texture Coordinates so that they are within the Specified Tile
                tileTexCoord = regions[loop].startTextureCoord + ((regions[loop].endTextureCoord - regions[loop].startTextureCoord) * tileTexCoord);

                //Get the Color at this Terrain Coordinate
                OUT.color += (tex2D(terrainTexture, tileTexCoord) * strength);
            }
        }
    }

    if (0.0f == colorset)
    {
        //Make Pink so that its obvious on the terrain (only for debugging)
        OUT.color = float4(1, 0, 1, 1);
    }
    else
    {
        //Scale the Texture Coordinate for Repeating detail and get the Detail Map Color
        texCoord *= detailTextureRepeat;
        float4 detailColor = tex2D(detailTexture, texCoord);

        //Interpolate Between the 2 Colors to get final Color
        OUT.color = lerp(OUT.color, detailColor, blendingRatio);
    }

    return OUT;
}

This week I have been using this shader in 2 ways.

  1. Use as described above, to generate a texture once per terrain patch (will get generated in higher detail when the patch subdivides) and this texture is then used when rendering.
  2. Use and bind every frame which gives per-pixel texture generation. This has the obvious disadvantage of requiring that the texture data for the terrain is generated each frame, but obviously does so for only the onscreen terrain. It has the nice advantage of not taking up any graphics memory, no need for frame buffers, rendering off screen, etc.... I was getting between 200 and 600 fps using this method.
I dont know how I will ultimately use this shader in the future. I will have to experiment and see which is the preferred method.

All the above results were generated on my laptop which has the following.

Renderer: ATI Mobility Radeon HD 3670
Vendor: ATI Technologies Inc.
Memory: 512 MB
Version: 3.0.8599 Forward-Compatible Context
Shading language version: 1.30


Max texture size: 8192 x 8192
Max texture coordinates: 16
Max vertex texture image units: 16
Max texture image units: 16
Max geometry texture units: 0
Max anisotropic filtering value: 16
Max number of light sources: 8
Max viewport size: 8192 x 8192
Max uniform vertex components: 512
Max uniform fragment components: 512
Max geometry uniform components: 0
Max varying floats: 68
Max samples: 8
Max draw buffers: 8

As always comments are welcome and appreciated.

Thursday, November 19, 2009

Higher Detail Heightmap Textures

When originally creating height maps, most of the tutorials store this information in grey-scale, therefore at the time this is what was implemented in decade. Limited to 256 different heights (as only 1 byte is used per value) it may be acceptable for demo's with small terrain patches, but is inadequate for anything larger with a more realistic topography.

To overcome this a terrain file format was created for decade. This allowed the saving of height data to a binary or text file using multiple bytes per value. With the introduction of shaders into Decade terrain engine, this too has become inadequate. I need to send the height information to the shader, along with the source tiles so that the procedural terrain texture can be created. The only feasible way to send this height information to the Graphics Card is in a texture, but the grey-scale implementation did not have high enough detail.

Solution? Combine the texture implementation with the multi-byte file format. To do this I split the floating point height position across the 3 color bytes using some simple bit shifting.

Using 3 bytes it is possible to represent 16777216 unique values (256x256x256). In the following example I want to maintain 3 digits after the decimal separator. This allows me to have terrain heights from 0.000 to 16777.216 which should be suitable for most procedural planets. It is of course possible to make the number of decimal digits configurable.

To convert a floating point height into 3 bytes (used in the texture rgb).

//Get the Height at the current Terrain Position
float l_fHeight = get_HeightAtPoint(l_iXPosition, l_iYPosition);

//Convert to an int. Multiply by 1000 to keep 3 decmial places.
int l_iColor = (int)(l_fHeight * 1000);

//Seperate into 3 Bytes
l_ucRed = ((l_iHeight >> 16) & 0xFFFF);
l_ucGreen = ((l_iHeight >> 8) & 0xFF);
l_ucBlue = l_iHeight & 0xFF;

and to convert 3 bytes back into a float, with 3 decimal places is as easy as

l_fHeight = ((l_ucRed << 16) | (l_ucGreen << 8) | l_ucBlue) / 1000.0f;



Above is a sample showing grey-scale height maps and their equivalent 24bit height maps. When I first saw the new maps, I thought there was a bug, and there would be rough edges in the terrain, due to the sudden color changes within the texture, however it loads correctly and terrain detail is maintained.

Tuesday, November 10, 2009

Winter is returning and I'm back again.

Yet again the summer months have been a slack time for Decade. As a solo hobby project I find it very difficult to sit at my PC after a long day in the office when the sun is shining outside. Now that the cold, wet and dark nights are back I feel the urge to return to Decade and complete some of my long standing wishlist features.

Over the past week or two I've been researching shaders. I have decided to add support for CG and CGFX to Decade. My first task with this will be to move the procedural texture generation to the GPU. This should hopefully vastly speed up this area of the engine allowing much smoother planet generation.

Having reviewed the Decade code with a fresh mind, some housekeeping is first required. Cleaning up interfaces and improving segments before building upon. Within the next week I hope to have some comparisons between generating the textures on the CPU and the GPU. Not having much experience using the GPU for this type of processing I am unsure what to expect but from reading other blogs regarding Planet and Terrain generation I am confident that it is the right approach to take.