Decade Engine
Thursday, November 22, 2012
Decade is leaving blogger.
After a number of years of random and sporadic blogging, Decade is leaving Blogger. If you would like to follow progress as Decade Engine continues its move towards mobile platforms, please check out the website of my consulting company at Raiblin.
Thursday, September 06, 2012
Record video from your iPhone/iPad without any expensive hardware.
In an earlier post I questioned how I could record from my iPad, because my Mac Mini struggles to run the iOS Simulator and Camtasia at the same time. Google is full of ideas, from building frames which hold a camera steady above your iPad to record the screen of the physical device, to hardware boxes which take an HDMI input and convert it so that it can be played on your computer screen. All of these are pretty expensive and take substantial effort. There is a simple answer: use Reflection App.
Reflection App seems to work by pretending it's an Apple TV. In the same way that you mirror your iPad display to render on the Apple TV, your Mac will now also appear in the list and your iPad display can be sent to it. I'm not sure how 'legal' this is under Apple's terms and conditions, but it works well and suits my needs. The free version allows you to record up to 10 minutes of video, which will meet the needs of most people wanting to record demos, but for $14.99 you can have the full version with unlimited recording.
On a side note, the same company have a product called Air Parrot which allows you to stream your Mac desktop to an Apple TV. Mountain Lion has this functionality, but if you have an older Mac (mid-2011 build for Mac Minis) Apple's solution won't work. Air Parrot will work and again costs $14.99.
Why pay for an online SVN Repository?
This is my first non-programming/Decade-related post in a long time, but before I start I would like to state that I am making no comment about the merits of SVN over Git and other source control systems. I simply prefer SVN and it's been my source control of choice for many years. I will also try to make this post not sound like a rant.
Until recently the Decade source has lived in a free repository on Unfuddle. Their free plan offers 512MB, 1 project, 2 collaborators and so on. Now that Decade spans a desktop version (for Windows, Mac and Linux), a mobile version for iOS and Android, and is spawning some mobile games, the limitations on space and allowed projects are stifling development. My instinct was to pay for one of the plans which these online repositories offer. The average cost of an online repository with 3GB of space, 10 projects and up to 5 collaborators is about $15 a month. Not a huge amount of money, but it got me thinking about why it costs so much.
Dropbox offers users 3GB of backed-up online storage for free. Granted, the SVN providers need to have SVN running on their servers, but this is free software, right? The overhead of administering SVN? Perhaps a little cost. The fact that providing SVN services is a niche market compared to Dropbox, which anyone can use for any media, can also add a little cost, but $180 a year versus $0?
Why limit the number of projects that I can have? If I'm paying $15 a month for 3GB of space, shouldn't I be allowed to have as many projects as I want so long as I stay under my storage limit? The only answer I can find is that it's business. They charge simply because they can.
I use Cornerstone on my Mac as an SVN client. (At $65 this is an expensive piece of software compared to the many free SVN clients out there, but since the benefit to me outweighs the price it's worth it. I hope this fact will go some way to dispel any opinion that I'm simply too mean to pay the $15 a month; I simply don't think the service provided warrants that cost compared to other generic online backup services.) In Cornerstone, with a few clicks of the mouse, I created an SVN repository on my Dropbox drive. Since the data is on Dropbox it is immediately backed up to the cloud. Since Dropbox doesn't care what I put in my account, I can have as many projects as I wish. If I used a Dropbox account which was specifically for the project and not for personal use, I could supply the details to others and have as many collaborators as needed.
The only issue I can see with this is that there is no level of indirection between me and the data. Deleting data from an online repository through a web portal requires some very deliberate steps and is therefore unlikely to happen by accident. Deleting files from what appears to be a disk on your local machine is very easy and could happen in error, but with this in mind I think any issues can be easily prevented.
As many projects as your allocated storage will allow, no limit on the number of collaborators, and for much less than $180 a year, potentially $0? Simply use Dropbox.
Thoughts?
Tuesday, June 19, 2012
Let there be light
I remember, many years ago when first learning graphics concepts, working through 'Programming Role Playing Games with DirectX'. There was a model included on the CD of a castle with a moat and I thought it looked terrible. The textures were gaudy, the resolution was low and it was quite jarring and unpleasant to look at. In a later chapter, an introduction to lights and shading, the same model was reused. The difference in results was unbelievable. With the correct shading, depth perception was easier and the scene, despite its initial poor quality, took on a much more natural and realistic feel.
Here is a comparison of the same mesh, with the same texture, rendered from the same point of view with and without shading.
The picture above, and the video below, both use a simple diffuse lighting calculation to shade the side of the earth which is facing away from the sun. This can be implemented as per-vertex lighting or per-pixel lighting. Each method has pros and cons, which I shall briefly discuss here. If I get anything wrong or leave something out, please comment.
(Instead of using the diffuse value to shade the back side of the earth, I use it as the ratio when blending samples of a day and a night texture. Since the night texture is already colored to show shade and darkness the result is the same, but I also get lights from civilization on the dark side.)
Per Vertex Lighting
- Diffuse value is calculated in the vertex shader.
- Faster, since the diffuse value is calculated only once per vertex (three times per triangle) and is then interpolated across the face of the polygon, rather than being recomputed for every pixel.
- Since the diffuse value is calculated per vertex and not per pixel, the value is not always correct and some shade popping occurs. Check out the video to see this.
Vertex Shader
attribute vec4 position;
attribute vec4 normal;
attribute vec2 uv0;
varying vec2 _uv0;
varying float _diffuse;
uniform mat4 modelViewProjectionMatrix;
uniform vec4 normalizedSunPosition;
void main()
{
//Texture coordinates are needed in fragment shader
_uv0 = uv0;
//Calculate diffuse value
vec4 nor = normalize(modelViewProjectionMatrix * normal);
_diffuse = max(dot(nor, normalizedSunPosition), 0.0);
//Translate vertex
gl_Position = modelViewProjectionMatrix * position;
}
Fragment Shader
varying lowp vec2 _uv0;
varying lowp float _diffuse;
uniform sampler2D dayTexture;
uniform sampler2D nightTexture;
void main()
{
gl_FragColor = (texture2D(nightTexture, _uv0) * (1.0 - _diffuse)) + (texture2D(dayTexture, _uv0) * _diffuse);
}
Per Pixel Shading
- Diffuse value is calculated in the fragment shader.
- Potentially slower, as there are generally a lot more pixels rendered than vertices and therefore a lot more diffuse calculations.
- More realistic, smoother results.
Vertex Shader
attribute vec4 position;
attribute vec4 normal;
attribute vec2 uv0;
varying vec2 _uv0;
varying vec4 _normal;
uniform mat4 modelViewProjectionMatrix;
void main()
{
_uv0 = uv0;
_normal = normalize(modelViewProjectionMatrix * normal);
gl_Position = modelViewProjectionMatrix * position;
}
Fragment Shader
varying lowp vec2 _uv0;
varying lowp vec4 _normal;
uniform sampler2D dayTexture;
uniform sampler2D nightTexture;
uniform lowp vec4 normalizedSunPosition;
void main()
{
lowp float _diffuse = max(dot(_normal, normalizedSunPosition), 0.0);
gl_FragColor = (texture2D(nightTexture, _uv0) * (1.0 - _diffuse)) + (texture2D(dayTexture, _uv0) * _diffuse);
}
Sunday, June 17, 2012
Let's bring some atmosphere to the party.
Another minor update. Instead of investing a more significant amount of time in adding atmospheric scattering, I decided to simply add a cloud layer to the planet. The cloud texture came as part of the texture set I am using to render the earth.
Initially I bound the cloud textures (the image data and the transparency map) to texture units 1 and 2 (the earth texture is in unit 0) and attempted to render the clouds directly onto the planet. Since the application this software will be used in will only view the planet from high orbit, flattening the clouds directly onto the earth texture wouldn't be an issue. The results were unsatisfying. If I looked online I could probably find some correct GLSL for doing alpha blending in a shader, rather than by setting the OpenGL state machine with glEnable(GL_BLEND); glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA); however I have never tried this before, so I took the quicker approach.
I've created a second sphere for the atmosphere. It is a fraction larger than the earth sphere, and the cloud texture is blended onto it. This approach is more costly to render, as I am rendering two spheres instead of one and the alpha blending needs to sample the color buffer; however, the spheres are relatively low resolution (1681 vertices and 9600 indices, rendered as an indexed triangle list to make 3200 polygons per sphere; the sphere is dynamically built at run-time, allowing this resolution to be changed).
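For illustration, here is a minimal sketch of the two-pass draw order described above. The sphere and shader objects are hypothetical stand-ins for the engine's own classes; only the OpenGL blend state calls are taken from the post.
//Pass 1: draw the opaque earth sphere.
glDisable(GL_BLEND);
earthShader.Bind();
earthSphere.Bind();
earthSphere.Render();
//Pass 2: draw the slightly larger cloud sphere over it, blending with the colors already in the framebuffer.
glEnable(GL_BLEND);
glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA);
cloudShader.Bind();
cloudSphere.Bind();
cloudSphere.Render();
glDisable(GL_BLEND);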
This method also allows the clouds to easily move independently of the planet, as real clouds do. I don't want to suggest that this wouldn't be possible if I flattened the clouds onto the earth sphere; it probably would be, by doing some texture coordinate scrolling, but it would result in a more complex shader. Slower to run? Perhaps, but definitely more difficult to understand.
There are no new shaders worth showing for this example; the shaders used for the atmosphere layer are pretty much identical to those shown previously.
The next task is to add time of day by creating a light source (the sun) and using texture blending to render the dark side of the earth. I am confident that a standard lighting algorithm will work for this, but instead of using the diffuse lighting value to darken a color (shading that pixel), the diffuse value will be used as a ratio to sample the day and night earth textures.
*Edit*
I've made a short video showing the mobile version of the engine running on a mobile platform, or at least the simulator for that platform.
Monday, June 11, 2012
A nice segue from Decade Engine to Mobile Development.
A friend is in the process of writing a nice iPad application. I shall not go into any detail regarding the app as it is his idea and not mine to share. The app needs to render the earth, allow the user to rotate the planet, zoom in and out to country level, and also allow the user to touch anywhere on the globe; if a country has been pressed, that country is highlighted and the information is made available to the app layer.
To date he has been using an open framework called WhirlyGlobe. This is a pretty impressive framework and I would recommend that you check it out, but after testing it on an iPad 2 and 'The New iPad' it seemed a little slow within the app. Vector files are used to highlight a country, raising it above the others when selected. All of this is in very high detail and looks excellent, but that detail comes at a cost: the response on the iPad is sluggish and would probably be even more so with an app sitting above it.
When looking into how we could improve the performance, I suggested that I could use the concepts that I developed when programming the original Decade Engine, along with the new features I have been learning while converting the original engine to OpenGL 3/OpenGL ES 2.0.
Here is the first rendering from Decade Mobile. Please note that this video was recorded off my Mac Mini, but the same code (with minor changes which I shall document in a later post) has been built and runs on an iPad and iPhone.
The textures used in this video have been purchased from here. Since zooming is only required to the country level, and not to the centimetre or metre level as was possible in the original Decade Engine, I thought it overkill to use the procedural sphere technique, so I instead just use a normal sphere. Some WebGL code for generating the vertices of a sphere can be found here.
_______________________________________________________________________________
Sphere Generation (Vertex and Index Buffer) Code
void Sphere::Create(const Vector3 center, GLfloat radius, GLuint precision)
{
vector<VERTEX_POSITION_UV0> vertices;
GLuint latitudeBands = precision;
GLuint longitudeBands = precision;
for (GLuint latNumber = 0; latNumber <= latitudeBands; latNumber++)
{
GLfloat theta = latNumber * M_PI / latitudeBands;
GLfloat sinTheta = sinf(theta);
GLfloat cosTheta = cosf(theta);
for (GLuint longNumber = 0; longNumber <= longitudeBands; longNumber++)
{
GLfloat phi = longNumber * 2 * M_PI / longitudeBands;
GLfloat sinPhi = sinf(phi);
GLfloat cosPhi = cosf(phi);
GLfloat x = cosPhi * sinTheta;
GLfloat y = cosTheta;
GLfloat z = sinPhi * sinTheta;
GLfloat u = 1.0f - ((GLfloat)longNumber / (GLfloat)longitudeBands);
GLfloat v = (GLfloat)latNumber / (GLfloat)latitudeBands;
VERTEX_POSITION_UV0 vertex;
vertex.Position = Point4(radius * x, radius * y, radius * z, 1.0f);
vertex.U0 = u;
vertex.V0 = 1.0f - v;
vertices.push_back(vertex);
}
}
vector<GLuint> indices;
for (GLuint latNumber = 0; latNumber < latitudeBands; latNumber++)
{
for (GLuint longNumber = 0; longNumber < longitudeBands; longNumber++)
{
GLuint first = (latNumber * (longitudeBands + 1)) + longNumber;
GLuint second = first + longitudeBands + 1;
indices.push_back(first);
indices.push_back(second);
indices.push_back(first + 1);
indices.push_back(second);
indices.push_back(second + 1);
indices.push_back(first + 1);
}
}
vertexBuffer.Create((float*)&vertices[0], VERTEX_POSITION_UV0::GetFloatsInFormat(), vertices.size(), VERTEX_POSITION_UV0::GetFormat());
indexBuffer.Create(&indices[0], indices.size());
}
void Sphere::Bind()
{
vertexBuffer.Bind();
indexBuffer.Bind();
}
void Sphere::Render()
{
vertexBuffer.Render(&indexBuffer, GL_TRIANGLES);
}
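For context, a hypothetical caller might use the class above as follows (Vector3 and the buffer classes are the engine's own types; the precision value is only an example):
//Build a unit sphere with 40 latitude/longitude bands: (40+1)*(40+1) = 1681 vertices, 40*40*6 = 9600 indices.
Sphere earth;
earth.Create(Vector3(0.0f, 0.0f, 0.0f), 1.0f, 40);
//Each frame, after binding the shader and setting its uniforms:
earth.Bind();
earth.Render();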
_________________________________________________________________________________
Vertex Shader
uniform mat4 mvp;
in vec4 position;
in vec2 uv0;
out vec2 textureCoord0;
void main (void)
{
textureCoord0 = uv0;
gl_Position = mvp * position;
}
_________________________________________________________________________________
Fragment Shader
in vec2 textureCoord0;
uniform sampler2D texture0;
out vec4 fragColor;
void main(void)
{
fragColor = texture(texture0, textureCoord0);
}
Wednesday, May 16, 2012
Creating an OpenGL 3 application on Mac
Except for a brief adventure into the world of Ubuntu, most development of Decade was completed on Windows, so to transition to Mac and iOS new projects have to be created.
Using Xcode to create an iOS (iPhone or iPad) application, the IDE pretty much does all of the setup for you when you select an OpenGL project.
Modifying
self.context = [[[EAGLContext alloc] initWithAPI:kEAGLRenderingAPIOpenGLES2] autorelease];
allows you to choose between OpenGL ES 1 and OpenGL ES 2.
Creating an OS X application does not present the same OpenGL option. This is your responsibility, and I found to my frustration that a lot of the tutorials online are insufficient. The standard approach seems to be to create your own view which inherits from NSOpenGLView,
and Cocoa handles a lot of the setup for you. Rather than repeat what is already documented on many sites, you can find an informative and easy-to-follow tutorial here. This initially worked great. Using the fixed-function pipeline I could position, rotate and render a cube. Since OpenGL ES 2 does not support the fixed-function pipeline, I needed to modify this code to remove all fixed functions and use GLSL, the programmable pipeline, instead.
The issues were obvious immediately. The shaders would not compile.
glGetShaderiv(shader, GL_COMPILE_STATUS, &status);
always returned 0, and
glGetShaderInfoLog(shader, logLength, &logLength, log);
the function used to get the errors when compiling shaders, always returned an empty string. It took quite a long time browsing forums to find out why this was happening. It turns out that Apple have decided that when NSOpenGLView is used as above, it will always use OpenGL version 1, which doesn't support shaders. Interface Builder does not allow you to change the version of OpenGL used. This makes me question why the code even compiled and ran if it was using a version of OpenGL which did not support shaders.
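As a side note (not from the original post), a quick way to confirm which context you actually received is to log the version strings once the context is current; the legacy setup and the core-profile setup shown below report different versions.
//Sanity check: print the OpenGL and GLSL versions of the context that was actually created.
const GLubyte *glVersion = glGetString(GL_VERSION);
const GLubyte *glslVersion = glGetString(GL_SHADING_LANGUAGE_VERSION);
printf("OpenGL %s, GLSL %s\n", glVersion, glslVersion);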
The Solution:
Create a controller class which inherits from NSObject and create your own NSOpenGLView in code.
My interface now looks like
@interface DecadeEngine : NSObject
@property (nonatomic, readwrite, retain) IBOutlet NSWindow *window;
@property (nonatomic, readwrite, retain) IBOutlet NSOpenGLView *view;
@end
and OpenGL can be initialised in
- (void)awakeFromNib
{
NSOpenGLPixelFormatAttribute pixelFormatAttributes[] =
{
NSOpenGLPFAOpenGLProfile, NSOpenGLProfileVersion3_2Core,
NSOpenGLPFAColorSize , 24 ,
NSOpenGLPFAAlphaSize , 8 ,
NSOpenGLPFADepthSize , 32 ,
NSOpenGLPFADoubleBuffer ,
NSOpenGLPFAAccelerated ,
NSOpenGLPFANoRecovery ,
0
};
NSOpenGLPixelFormat *pixelFormat = [[[NSOpenGLPixelFormat alloc] initWithAttributes:pixelFormatAttributes] autorelease];
[self setView:[[[NSOpenGLView alloc] initWithFrame:[[[self window] contentView] bounds] pixelFormat:pixelFormat] autorelease]];
[[[self window] contentView] addSubview:[self view]];
}
The shaders now compiled successfully and I could again see the rotating cubes on screen. :)
Thursday, May 10, 2012
Decade goes Mobile
The development of Decade in its original form (a desktop application, mainly developed in Visual Studio on Windows) is pretty much dead. Perhaps, like in the recent popularity of zombies, it will come back to life some day, but for now the source is sitting in an online repository gathering dust.
I have started to play around with graphics on mobile devices and OpenGL ES 2.0. I remember, many years ago when I first started Decade Engine, how much motivation I received from writing blog posts, especially when comments were left. Hoping to receive the same motivation this time, I am going to repeat the blogging process and can hopefully write some interesting and informative posts.
Staying true to my long-held belief that C++ is the best game development language means that I can reuse a lot of Decade Engine code, which will hopefully speed things up a little. Does this limit which mobile devices I can develop for? Not really.
- On iOS (iPhone and iPad) one can easily mix C or C++ into Objective-C. To make an Objective-C file compile as Objective-C++ (so it can contain C++), you simply rename its extension from .m to .mm.
- On Android I am going to use the exact same C++ code that I write for the iOS game. There is an Eclipse plugin called "Sequoyah" which allows you to compile and debug native code; however, the Android emulator does not support OpenGL ES 2.0, so pretty much anything I write will not run on it. Because of this, the iOS version will take priority until I purchase an Android device.
Now to the technology. My first foray into mobile development was to implement a plasma shader. I thought that a fragment shader implementation of good old Perlin noise would be perfect for a plasma effect. Not yet familiar with loading resources on a mobile device, I decided to use a self-contained noise implementation which did not require a permutation texture. An excellent and interactive example of such a shader can be found here.
The screenshot is taken from the demo running on an iPad 3. The frame rate is terrible on the simulator, so recording a video there isn't an option. I have tried to take video of my iPad, but since I use my phone camera it is very jerky and of poor quality. Is there any good way to capture video of an iPad screen?
There is nothing really special about this. It runs very poorly on an iPhone 3GS and runs OK, but not great, on an iPad 2 and an iPad 3. I am also unsure why there are obvious gradient levels instead of smooth blending between colors. The color of a given pixel is calculated as
float n = noise(vec3(position, time));
//Red and Yellow
vec3 col1 = vec3(n, 0.0, 0.0);
vec3 col2 = vec3(1.0 - n, 1.0 - n, 0.0);
gl_FragColor = vec4((col1 + col2) / 2.0, 1.0);
Any obvious bug here?
The next stage will be to add the permutation texture to the shader. I believe that this method is a little more work upfront, but it should be faster, as the permutation data (or at least an approximation of it) does not need to be calculated each time the shader runs. (The iPad 3 Retina display is 2048x1536 pixels. The fragment shader runs for each pixel which is rendered; my plasma is full screen, therefore the fragment shader runs 3,145,728 times per frame. That is a lot of potential calculation which can be avoided.)
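As a rough sketch of that next step (my own illustration, not the engine's actual loader), a small texture can be created once at start-up for the noise shader to sample; here the contents are just placeholder random bytes rather than a proper permutation/gradient table.
//Upload a 256x256 RGBA texture with nearest filtering and repeat wrapping for the shader to look up.
GLuint permTexture = 0;
unsigned char permData[256 * 256 * 4];
for (int i = 0; i < 256 * 256 * 4; i++)
    permData[i] = (unsigned char)(rand() & 0xFF); //placeholder contents; a real permutation/gradient table goes here
glGenTextures(1, &permTexture);
glBindTexture(GL_TEXTURE_2D, permTexture);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_REPEAT);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_REPEAT);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, 256, 256, 0, GL_RGBA, GL_UNSIGNED_BYTE, permData);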
That's pretty much all for now. It's good to be back, and I hope to post more than once or twice a year.
Ciaran
Tuesday, November 01, 2011
How to generate a procedural sphere.
I have received an email asking me to explain how I generate a procedural sphere, as per this post. Rather than respond to emails personally, I have decided in future to answer such questions on the blog, presenting the information to a wider audience for comment, correction, improvement and perhaps a little learning.
The base of my generated sphere is a cube. Depending on how smooth the generated sphere needs to be, I recursively split each side into 4 equally sized children. This is achieved by finding the center of the face, popping that point out so that it is the correct radius from the center of the sphere, then making 4 faces using the original points and this new point. (There is some ASCII art showing this in the code below.)
void CSphere::Initialize(float p_fRadius, int p_iMaxDepth)
{
// TLB----TRB
// /| /|
// / | / |
// TLF----TRF |
// | BLB--|BRB
// | / | /
// | / |/
// BLF----BRF
//Putting the Vertices of the initial Cube at p_fRadius is not correct as the distance of
//p_fRadius, p_fRadius, p_fRadius from the Origin is greater than p_fRadius.
CVector3 l_Vertices[8];
l_Vertices[TOP_LEFT_FRONT] = MoveToRadiusDistance(CVector3(-p_fRadius, p_fRadius, -p_fRadius), p_fRadius);
l_Vertices[TOP_RIGHT_FRONT] = MoveToRadiusDistance(CVector3( p_fRadius, p_fRadius, -p_fRadius), p_fRadius);
l_Vertices[BOTTOM_RIGHT_FRONT] = MoveToRadiusDistance(CVector3( p_fRadius, -p_fRadius, -p_fRadius), p_fRadius);
l_Vertices[BOTTOM_LEFT_FRONT] = MoveToRadiusDistance(CVector3(-p_fRadius, -p_fRadius, -p_fRadius), p_fRadius);
l_Vertices[TOP_LEFT_BACK] = MoveToRadiusDistance(CVector3(-p_fRadius, p_fRadius, p_fRadius), p_fRadius);
l_Vertices[TOP_RIGHT_BACK] = MoveToRadiusDistance(CVector3( p_fRadius, p_fRadius, p_fRadius), p_fRadius);
l_Vertices[BOTTOM_RIGHT_BACK] = MoveToRadiusDistance(CVector3( p_fRadius, -p_fRadius, p_fRadius), p_fRadius);
l_Vertices[BOTTOM_LEFT_BACK] = MoveToRadiusDistance(CVector3(-p_fRadius, -p_fRadius, p_fRadius), p_fRadius);
//Initialize the faces of the cube (The face structure just stores the vertices for the face corners and has render functionality, not applicable to this explanation, and its depth in the face tree)
m_pFaces[FRONT].Initialise(FRONT, l_Vertices[TOP_LEFT_FRONT], l_Vertices[TOP_RIGHT_FRONT], l_Vertices[BOTTOM_RIGHT_FRONT], l_Vertices[BOTTOM_LEFT_FRONT], DEPTH0);
m_pFaces[RIGHT].Initialise(RIGHT, l_Vertices[TOP_RIGHT_FRONT], l_Vertices[TOP_RIGHT_BACK], l_Vertices[BOTTOM_RIGHT_BACK], l_Vertices[BOTTOM_RIGHT_FRONT], DEPTH0);
m_pFaces[BACK].Initialise(BACK, l_Vertices[TOP_RIGHT_BACK], l_Vertices[TOP_LEFT_BACK], l_Vertices[BOTTOM_LEFT_BACK], l_Vertices[BOTTOM_RIGHT_BACK], DEPTH0);
m_pFaces[LEFT].Initialise(LEFT, l_Vertices[TOP_LEFT_BACK], l_Vertices[TOP_LEFT_FRONT], l_Vertices[BOTTOM_LEFT_FRONT], l_Vertices[BOTTOM_LEFT_BACK], DEPTH0);
m_pFaces[TOP].Initialise(TOP, l_Vertices[TOP_LEFT_BACK], l_Vertices[TOP_RIGHT_BACK], l_Vertices[TOP_RIGHT_FRONT], l_Vertices[TOP_LEFT_FRONT], DEPTH0);
m_pFaces[BOTTOM].Initialise(BOTTOM, l_Vertices[BOTTOM_LEFT_FRONT], l_Vertices[BOTTOM_RIGHT_FRONT], l_Vertices[BOTTOM_RIGHT_BACK], l_Vertices[BOTTOM_LEFT_BACK], DEPTH0);
//Subdivide each patch to the lowest resolution
m_pFaces[FRONT].SubDivide(p_fRadius, p_iMaxDepth);
m_pFaces[RIGHT].SubDivide(p_fRadius, p_iMaxDepth);
m_pFaces[BACK].SubDivide(p_fRadius, p_iMaxDepth);
m_pFaces[LEFT].SubDivide(p_fRadius, p_iMaxDepth);
m_pFaces[TOP].SubDivide(p_fRadius, p_iMaxDepth);
m_pFaces[BOTTOM].SubDivide(p_fRadius, p_iMaxDepth);
}
where
bool CSphereFace::SubDivide(float p_fRadius, int p_iMaxDepth)
{
if (m_iDepth >= p_iMaxDepth)
return false;
//Create the Additional Vertices
//
// NW---------------D-------------NE
// | | |
// | | |
// | | |
// | | |
// A--------------Center-----------C
// | | |
// | | |
// | | |
// | | |
// SW----------------B-------------SE
//
//
g_vAdditionalVertices[A] = m_vBaseVertices[eNorthWest] + ((m_vBaseVertices[eSouthWest] - m_vBaseVertices[eNorthWest]) / 2.0f);
g_vAdditionalVertices[B] = m_vBaseVertices[eSouthWest] + ((m_vBaseVertices[eSouthEast] - m_vBaseVertices[eSouthWest]) / 2.0f);
g_vAdditionalVertices[C] = m_vBaseVertices[eNorthEast] + ((m_vBaseVertices[eSouthEast] - m_vBaseVertices[eNorthEast]) / 2.0f);
g_vAdditionalVertices[D] = m_vBaseVertices[eNorthWest] + ((m_vBaseVertices[eNorthEast] - m_vBaseVertices[eNorthWest]) / 2.0f);
//Create Child Nodes
m_pChildren = new CSphereFace[4];
m_pChildren[eNorthWest].Initialise(eNorthWest, m_vBaseVertices[eNorthWest], g_vAdditionalVertices[D], m_vBaseVertices[eCentre], g_vAdditionalVertices[A], m_iDepth + 1);
m_pChildren[eNorthEast].Initialise(eNorthEast, g_vAdditionalVertices[D], m_vBaseVertices[eNorthEast], g_vAdditionalVertices[C], m_vBaseVertices[eCentre], m_iDepth + 1);
m_pChildren[eSouthWest].Initialise(eSouthWest, g_vAdditionalVertices[A], m_vBaseVertices[eCentre], g_vAdditionalVertices[B], m_vBaseVertices[eSouthWest], m_iDepth + 1);
m_pChildren[eSouthEast].Initialise(eSouthEast, m_vBaseVertices[eCentre], g_vAdditionalVertices[C], m_vBaseVertices[eSouthEast], g_vAdditionalVertices[B], m_iDepth + 1);
m_pChildren[eNorthWest].SubDivide(p_fRadius, p_iMaxDepth);
m_pChildren[eNorthEast].SubDivide(p_fRadius, p_iMaxDepth);
m_pChildren[eSouthWest].SubDivide(p_fRadius, p_iMaxDepth);
m_pChildren[eSouthEast].SubDivide(p_fRadius, p_iMaxDepth);
return true;
}
and
CVector3 MoveToRadiusDistance(CVector3 p_Vector, float p_fRadius)
{
//Get the Normalized Vector, of this vertex from the origin (center of the sphere) and pop it out to the correct radius,
return p_Vector.Normalize() * p_fRadius;
}
That is pretty much the basics of how I create the sphere by recursively subdividing a cube. If I have overlooked anything, please comment and I will correct the post to fill any gap in the information or fix any mistake.
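To give a feel for how the subdivision depth behaves, here is a hypothetical usage of the code above. Each call to SubDivide splits a face into 4 children, so a maximum depth of d gives 6 * 4^d leaf faces.
//Illustrative only: a sphere of radius 100 units subdivided 5 levels deep.
//Depth 5 gives 6 * 4^5 = 6144 leaf faces; depth 6 gives 24576, and so on.
CSphere planet;
planet.Initialize(100.0f, 5);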
Saturday, May 07, 2011
Can I claim progress even if I am behind where I used to be?
I am the first to admit annoyance at having to redo functionality which I have previously implemented; however, with the experience and lessons learned from coding procedural planets the first time, my implementation this time is smaller (in code) and more efficient than before.
I shall try to highlight an area which is:
- Easier to implement
- Less code
- More efficient
#define SQUARE(x) ((x)*(x))
float Length(CVector3* p_pvOne, CVector3* p_pvTwo)
{
return (float)(sqrt(SQUARE(p_pvTwo->X - p_pvOne->X) + SQUARE(p_pvTwo->Y - p_pvOne->Y) + SQUARE(p_pvTwo->Z - p_pvOne->Z)));
}
(The sqrt function has traditionally been considered slow and avoided where possible. I am unsure of its performance on modern hardware, but the above could be optimised by not calculating the sqrt in the length function and instead comparing the squared length to the expected value squared.)
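A minimal sketch of that optimisation, reusing the SQUARE macro above (LengthSquared and the names in the comparison are mine, not the engine's):
float LengthSquared(CVector3* p_pvOne, CVector3* p_pvTwo)
{
    //No sqrt: return the squared distance between the two points.
    return SQUARE(p_pvTwo->X - p_pvOne->X) + SQUARE(p_pvTwo->Y - p_pvOne->Y) + SQUARE(p_pvTwo->Z - p_pvOne->Z);
}
//Compare against the squared threshold instead of the actual distance:
//if (LengthSquared(&l_vPatchCentre, &l_vLODCentre) < SQUARE(l_fLODRadius)) the patch is inside the LOD area.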
A better optimisation is to remove the need to calculate the distance from each patch to the centre of LOD completely. Instead of using the distance from the centre, I now recursively step from the centre to each neighbour, then on to each of their neighbours, incrementing the 'distance' with each step. When the 'distance' reaches a predefined value, the edge of the LOD area has been reached. In the following images the 'distance limit' is set to 3.
The results above are not as desired. Some analysis showed that the east and west neighbours of the centre of LOD are within 3 recursive steps from the north neighbour (which is processed first). Because of this the east and west patches are marked as processed and will not be processed again when directly updated from the centre patch.
To overcome this, when processing a patch that is already flagged as processed, I compare its distance in steps from the centre of LOD. If the current distance is less than the stored distance, I process it again. Reading that sounds a little confusing, so I shall try to explain with some steps (only the key steps are listed):
- Move from the centre patch to the north. Its distance is stored as 1
- Move from the current patch (north) to the east. Distance is stored as 2
- Move from the current patch (north->east) to the south (this is the centre's east neighbour). Distance is stored as 3.
- Move from the centre patch to the east. This patch is flagged as processed at a distance of 3. The current distance is 1 therefore ignore previous processing and process again.
That looks better: a uniform area around the centre of LOD. However, it is still not correct. To render the next area of lower LOD I step one level up the patch tree and render outwards from there. A patch is not rendered if any of its children have been rendered, as this would cause an unacceptable overlap and possible onscreen artifacts. This results in huge holes in the terrain. A simple rule of "if one of my siblings is being rendered at a specific LOD, I must also render at that LOD even if I am not within range of the centre of LOD" fixes this problem.
The above example has resulted in code which is smaller, simpler to understand and faster to run than the previous version based on actual distance.
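Here is a minimal sketch of the neighbour-stepping idea described above; the member and function names are illustrative, not the engine's actual interface.
void CSphereFace::SpreadLOD(int p_iDistance, int p_iDistanceLimit)
{
    //Stop once the edge of the LOD area has been reached.
    if (p_iDistance > p_iDistanceLimit)
        return;
    //Only re-process a patch if it has been reached by a shorter route than before.
    if (m_bProcessed && m_iStepsFromCentre <= p_iDistance)
        return;
    m_bProcessed = true;
    m_iStepsFromCentre = p_iDistance;
    m_bRenderAtThisLOD = true; //siblings of a rendered patch must also render at this LOD
    //Step outwards to each neighbour, one step further from the centre of LOD.
    for (int i = 0; i < 4; i++)
        if (NULL != m_pNeighbours[i])
            m_pNeighbours[i]->SpreadLOD(p_iDistance + 1, p_iDistanceLimit);
}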
Planet rendered from orbit, showing LOD
Same planet rendered from the atmosphere, again showing LOD
Labels: C++, decade, game, opengl, procedural planet
Monday, March 28, 2011
Remove back facing patches before the render pipeline.
In a previous post, Procedural Planet - Subdividing Cube, I mentioned how I remove complete patches which face away from the camera before the API culls their polygons in the render pipeline.
"Using frustum culling is not enough to remove any unrendered polygons from the planet. When close to a planet it can look like a flat terrain, just like the earth does for us as we stand on it, but from height it can be seen that the planet is in fact spherical. With this knowledge it is possible to mathematically remove a lot of the planet patches which are on the opposite side of the planet. With back-face culling the API would remove these anyway; however, it would be very wasteful to pass these invisible faces down the render pipeline. By using a DotProduct with the LookAt vector of the camera and the Normal of the planet patch translated to model space, it is very simple to ignore these patches."
This code was part of what was lost from Decade with the recent SVN blooper, and therefore had to be rewritten. Despite having implemented the functionality about 18 months ago, it took me some time to grasp the idea again. I feel that a more technical post would be useful and will hopefully help anyone else implementing similar functionality.
Anyone with experience in graphics programming will be familiar with the concept of back-face culling. As each polygon is passed down the rendering pipeline, it is tested to see if it is facing towards or away from the camera. Since polygons facing away from the camera cannot be seen, there is no need to render them. This concept can be applied when rendering a planet; however, instead of testing on a polygon-by-polygon basis, I test patch by patch. This allows me to ignore a large collection of polygons with one test instead of a test per polygon.
How is this achieved?
Two pieces of information are required in order to test if a patch is facing towards or away from the camera.
1) The camera look-at vector. This is maintained by the camera object and updated as the camera moves and rotates around the world.
2) The normal vector of the patch. I calculate this when the patch is created by doing a CrossProduct of the vectors of 2 sides of the patch.
If the values of A,B,C and D in the above image are
l_vA: X=-11.547007 Y=11.547007 Z=-11.547007
l_vB: X=-6.6666670 Y=13.333334 Z=-13.333334
l_vC: X=-8.1649666 Y=8.1649666 Z=-16.329933
l_vD: X=-13.333334 Y=6.6666670 Z=-13.333334
The normal of the patch can be calculated using the following code.
m_vNormal = CalculateNormal(l_vB, l_vA, l_vC);
where
CVector3 CalculateNormal(CVector3 p_vOne, CVector3 p_vTwo, CVector3 p_vThree)
{
CVector3 l_vA = p_vTwo - p_vOne;
CVector3 l_vB = p_vThree - p_vOne;
return Normalize(CrossProduct(l_vA, l_vB));
}
CVector3 CrossProduct(CVector3 p_vVectorOne, CVector3 p_vVectorTwo)
{
CVector3 l_vResult;
l_vResult.X = p_vVectorOne.Y * p_vVectorTwo.Z - p_vVectorOne.Z * p_vVectorTwo.Y;
l_vResult.Y = p_vVectorOne.Z * p_vVectorTwo.X - p_vVectorOne.X * p_vVectorTwo.Z;
l_vResult.Z = p_vVectorOne.X * p_vVectorTwo.Y - p_vVectorOne.Y * p_vVectorTwo.X;
return l_vResult;
}
The calculated normal would be X=0.44721335 Y=-0.44721335 Z=0.77459687
With this information, as each patch is rendered, a simple DotProduct of these two vectors returns a floating-point value. If this value is less than 0, the patch is facing away from the camera, and therefore it and all of its child patches can immediately be discarded.
float l_fDotProduct = DotProduct(m_vNormal, p_pCamera->get_LookAt());
if (0.0f > l_fDotProduct)
return;
where
float DotProduct(CVector3 p_vVectorOne, CVector3 p_vVectorTwo)
{
return p_vVectorTwo.X * p_vVectorOne.X + p_vVectorTwo.Y * p_vVectorOne.Y + p_vVectorTwo.Z * p_vVectorOne.Z;
}
One more issue must be dealt with before we have a complete solution. The above works well for static objects where all vertices are relative to the origin. But what happens if the object is rotating?
The answer to that question is shown in the following image.
It may not be obvious from a static image rather than a real-time demo, so I will try to explain. The patch culling implemented above is processed on the raw sphere data. This is equivalent to removing the back-facing patches (anything on the back of the sphere, relative to the camera, is removed), then rotating the sphere on the Y axis (in the image above the sphere is rotated by 130 degrees), and then rendering with the API culling all back-facing polygons. This order is obviously incorrect.
The correct sequence would be to rotate the sphere, remove all back-facing patches, then render the remaining patches and allow the API to remove any back-facing polygons within the front-facing patches. Since rotation occurs in the render pipeline, it isn't possible for us to rotate before we remove the back-facing patches.
The solution is to multiply the camera look-at vector by the modelview matrix. This is equivalent to transforming the camera by the same values that will be applied to the sphere, resulting in the correct back-facing patches being removed, regardless of what rotation/translation/scaling is applied to the sphere.
float l_fDotProduct = DotProduct(m_vNormal, p_pCamera->get_LookAt() * p_pGraphics->get_Matrix(eModelView));
if (0.0f > l_fDotProduct)
return;
(Note: Since p_pCamera->get_LookAt() * p_pGraphics->get_Matrix(eModelView) will yield the same value for every patch, it would be better to calculate this once per frame for each planet that is being rendered. This value can then be used within the test on each patch in the planet.)
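As a rough illustration of that note, the transformed look-at can be cached once per planet per frame and reused by every patch. The CPlanet/CPatch names, BeginRender, get_CachedLookAt and m_vCachedLookAt below are assumptions for this sketch rather than Decade's actual interface; only DotProduct, get_LookAt() and get_Matrix(eModelView) come from the code above.
//Sketch only: cache the transformed look-at once per planet per frame.
void CPlanet::BeginRender(CCamera* p_pCamera, CGraphics* p_pGraphics)
{
    //One vector * matrix multiply per planet per frame.
    m_vCachedLookAt = p_pCamera->get_LookAt() * p_pGraphics->get_Matrix(eModelView);
}
void CPatch::Render(const CPlanet* p_pPlanet)
{
    //A single dot product per patch; a negative result means the patch faces away
    //from the camera, so it and all of its children can be skipped immediately.
    if (0.0f > DotProduct(m_vNormal, p_pPlanet->get_CachedLookAt()))
        return;
    //...render this patch and recurse into its children as normal.
}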
Labels:
backface culling,
matrix,
modelview,
procedural planet
Tuesday, March 22, 2011
Two steps forward, three steps back
In order not to admit what was probably a result of my own stupidity, I'm going to blame SVN. Regardless of what happened, it has now become apparent to me that the version of the Decade Engine source that I have isn't the latest. It is missing my implementation of GPU planet generation. I've checked my online repository, backup disks, etc.
This means that I have to recode those sections. A chore, but on the positive side, I know the pitfalls and issues I encountered last time, and can hopefully design around them and end up with a better solution.
As per my previous post, I have also started iPhone and Android development. I shall be working on multiple projects at the same time, and rather than mixing it all up on this blog, I have created a sister blog for Decade Mobile. Any updates which are specific to the mobile platforms shall be posted there.
Tuesday, March 08, 2011
Back online in Sydney
Hello again. It's been far too long! I am now settled and living in Sydney and have decided that the time to resume Decade is far overdue.
Development of Decade shall continue as before with procedural planetary bodies, but over the past few months I have started to program for iPhone/iPad and Android, so I think it would be fun to create a mobile Decade Engine and try to make some simple but fun phone games.
Let the adventure begin (yet again!)
Ciarán
Saturday, June 26, 2010
Moving to Australia
Decade Engine will be on hold for a short while as I emigrate to Australia. Thank you to everyone who has emailed questions and support regarding my blog and development. I shall be back online and back in development 'DownUnder'.
Ciarán
Tuesday, March 16, 2010
GPU Procedural Planet
This example does not use any textures to colour the terrain, and is therefore 100% procedurally generated. In basic terms it means that no media is loaded. Everything is 100% generated in the engine.
Let’s take some time to recap what needs to be procedurally generated in order to render the planet (shown in the video below).
Permutations Texture
The topography of the planet is created using noise algorithms. This example uses Multi Ridged Brownian Fractal Motion. At runtime this noise is created using a series of permutations which are stored in a texture so that they can be accessed in a shader. The texture data doesn't make a lot of visual sense, however here is an example of what it looks like.
1 Vertex buffer
The planet is rendered as a series of patches. It is this patch structure which allows the recursive subdivision that increases or decreases the visible level of detail. Whereas the CPU planet generates a unique vertex buffer for each patch (because the noise is calculated when the patch is created and applied to the height data in the vertex buffer), the GPU planet uses only one vertex buffer of X * X vertices, generated procedurally, which is displaced in a shader at runtime for each patch being rendered.
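To make the idea of a single shared vertex buffer a little more concrete, here is a minimal sketch of building such an X * X grid on the CPU. The SGridVertex layout and the normalised [0,1] coordinates are assumptions for illustration only; Decade's actual vertex format may differ.
#include <vector>

struct SGridVertex { float u, v; };

//Build one flat grid of vertices that every patch can share. The vertex shader
//maps these normalised coordinates onto a patch's corners and then onto the
//sphere, applying the noise-based height displacement at runtime.
std::vector<SGridVertex> BuildPatchGrid(int p_iVerticesPerSide)
{
    std::vector<SGridVertex> l_vGrid;
    l_vGrid.reserve(p_iVerticesPerSide * p_iVerticesPerSide);
    for (int y = 0; y < p_iVerticesPerSide; ++y)
    {
        for (int x = 0; x < p_iVerticesPerSide; ++x)
        {
            SGridVertex l_sVertex;
            l_sVertex.u = (float)x / (float)(p_iVerticesPerSide - 1);
            l_sVertex.v = (float)y / (float)(p_iVerticesPerSide - 1);
            l_vGrid.push_back(l_sVertex);
        }
    }
    return l_vGrid;
}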
16 Index Buffers
An index buffer is used along with a vertex buffer to render geometry. In a lot of cases one vertex buffer is used with one index buffer. As described in previous posts, a terrain patch requires 16 index buffers, generated procedurally, so that there are no terrain cracks: it must be possible for the edges of terrain patches with different levels of detail to join together seamlessly.
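One common way to organise those 16 index buffers (not necessarily how Decade indexes them) is to treat each of the four patch edges as a bit which is set when the neighbour on that edge is at a lower level of detail; the resulting 0-15 value then selects the matching pre-built index buffer. A small sketch of that idea:
//Each edge contributes one bit to the index-buffer selector.
enum EEdgeFlags
{
    eEdgeNorth = 1 << 0,
    eEdgeEast  = 1 << 1,
    eEdgeSouth = 1 << 2,
    eEdgeWest  = 1 << 3
};

//Returns a value in the range 0-15, used as an index into the array of
//16 pre-built index buffers (0 = all neighbours at the same LOD).
int SelectIndexBuffer(bool p_bNorthLower, bool p_bEastLower,
                      bool p_bSouthLower, bool p_bWestLower)
{
    int l_iMask = 0;
    if (p_bNorthLower) l_iMask |= eEdgeNorth;
    if (p_bEastLower)  l_iMask |= eEdgeEast;
    if (p_bSouthLower) l_iMask |= eEdgeSouth;
    if (p_bWestLower)  l_iMask |= eEdgeWest;
    return l_iMask;
}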
The video above shows a basic GPU planet. There is quite an obvious bug visible as the camera moves. Because all noise is generated on the GPU, the Decade Engine running on the CPU has no knowledge of how much a vertex is displaced. All distance checking from the camera to the planet is calculated from the camera position to the underlying sphere of the planet (the vertex displaced to the radius of the planet, but without the height noise applied). This is fine when the camera is high above the terrain, however as the camera moves close to the surface, especially if the ground is at a high altitude, the sphere position may still be some distance beneath, and therefore terrain subdivision does not occur properly.
I am considering two possible techniques to overcome this:
- Generate the same noise values on the CPU as is generated on the GPU. Since all (pseudo) random data is stored in the permutations texture, it should be possible.
- Render the height data to a texture instead of generating it as required each frame, then use this texture for shader vertex displacement as well as calculating the height of key vertices on the CPU.
Wednesday, February 24, 2010
Concentric LOD Areas
Not really any new functionality, just some small but important modifications. In the previous posts and video of the planet, each patch was independent and updated whenever it saw fit, regardless of the state of its neighbours. This resulted in a non-uniform patch pattern and multiple terrain cracks. A more detailed explanation of terrain cracks can be seen here.
Instead of a patchwork quilt on the planet surface, the terrain LOD (Level of Detail) now decreases in concentric circles with their origin at the camera. All patches which neighbour a patch with lower LOD render the adjoining edge downgraded to the lower LOD, preventing any terrain cracks from appearing.
In the above pictures the radius of the LOD circle is set to 6.0. This means that the LOD circle has a radius of the length of the patch at this level of detail multiplied by 6.0. This value can be changed at runtime if desired, resulting in higher or lower terrain detail.
Another change in the example, although not obvious from the pictures, is that the planet is no longer updated/rendered from the root node. Now, during the first frame, the lead patch is found. This is the patch directly below the camera at the correct LOD. Each frame, when the planet updates, the lead patch moves to one of its neighbours, one of its children or its parent, as required. This requires a little more code than simply recursing across the patch tree each frame, but it should be faster as it removes the processing of many invisible patches (those which are closer to the root of the patch tree but whose level of detail is too low to meet our needs).
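As a rough illustration of the test implied above (and not Decade's actual interface), a patch could decide to subdivide while the camera lies inside a circle whose radius is the patch length multiplied by the configurable factor (6.0 in the screenshots):
#include <cmath>

//Sketch only: subdivide while the camera is within patchLength * radiusFactor.
bool ShouldSubdivide(float p_fPatchLength, float p_fRadiusFactor,
                     float p_fCamX, float p_fCamY, float p_fCamZ,
                     float p_fPatchX, float p_fPatchY, float p_fPatchZ)
{
    float l_fDX = p_fCamX - p_fPatchX;
    float l_fDY = p_fCamY - p_fPatchY;
    float l_fDZ = p_fCamZ - p_fPatchZ;
    float l_fDistance = std::sqrt(l_fDX * l_fDX + l_fDY * l_fDY + l_fDZ * l_fDZ);
    //Inside the circle: this patch is too coarse, so its children are rendered instead.
    return l_fDistance < p_fPatchLength * p_fRadiusFactor;
}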
Friday, January 15, 2010
Procedural Planet - Subdividing Cube
At long last I have returned to what was the growing focus of Decade Engine in the latter stages of 2008. A previous post showed the most basic functionality of Decade's planet generation. I hope to give more details here.
The below video has 3 sections.
- Run-time cube subdivision and application of noise to create planet structure. Patch size is 10x10 quads. To give decent minimum resolution, the lowest possible level of subdividing in this example is 3, with the highest being 16. As the camera moves towards a patch, and gets within range (configured to radius of patch * 5 in this example) the patch subdivides into 4 children which are rendered instead of the parent when the camera is within range.
- Similar to section 1, except when the camera moves away from the patch the level of detail which is rendered does not reduce. This allows users to see the size of patches at every allowed level on screen at once, however when far away the patches at level 15 and/or 16 are smaller than 1 pixel, so are not really visible. Some very basic math tells us that if the planet in view were Earth-sized (i.e. a radius of 6378 km), the length of a patch at level 1 would be 7364.67 km. At level 16 the patch length is only 0.1123 km, and with 10 quads per patch length the maximum resolution of the planet is just above 1 m. By increasing the maximum allowed depth rendered, or the resolution of each patch, this maximum planet resolution could be increased to cm level.
- Using frustum culling is not enough to remove all of the unseen polygons from the planet. When close to a planet it can look like flat terrain, just like the Earth does for us as we stand on it, but from height it can be seen that the planet is in fact spherical. With this knowledge it is possible to mathematically remove a lot of the planet patches which are on the opposite side of the planet. With back-face culling the API would remove these anyway, however it would be very wasteful to pass these invisible faces down the render pipeline. By using a DotProduct with the LookAt vector of the camera and the Normal of the planet patch translated to model space, it is very simple to ignore these patches.
Wednesday, December 09, 2009
Terrain Editor - Version 2
Last week I received a request from a reader of DecadeBlog for access to the terrain editor. What little pride I have kicked in, and in an effort to supply something a little more usable than Version 1, I set aside some time to add a few features.
Version 2 is representative of a more traditional editor, showing multiple views of the subject. Three independent views of the terrain can be seen in the video below. Each view maintains its own render state; it is, for example, possible to show one view in wireframe while the others remain solid or are shown in point form. There is also correct "Screen to World Projection" for each view.
Version 2 also contains some erosion filters. I noticed when editing that the old issue of terrain steps was occurring. By providing the ability to erode or filter the terrain, any rough edges are smoothed out.
All application-level logic in the above example is again scripted. On startup the DecadeEngine Sandbox calls the script's init function, and each frame the HandleInput, Update, Render3d and Render2d functions are called in the script. If anyone is interested in having the script please mail me or comment on this post. It's a little long and boring to publish here.
Wednesday, November 25, 2009
CPU v GPU Procedural Terrain Texture Generation
It's been an interesting week. Having programmed graphics for some time, having read a lot about shaders and having used them briefly, I know they are powerful tools for the graphics programmer, but I am still slightly in awe of how quick they are.
It should be noted before reading any further that this is the first shader I have ever written. I've used and modified shaders before, such as Sean O'Neil's atmospheric scattering (there is a post below somewhere) and some bumpmapping, but all the code in this shader is mine and therefore possibly contains some rookie mistakes.
Let's first recap the very simple texture generation technique currently implemented. The user specifies a list of terrain regions. Each region has texture data and an optimal, minimum and maximum height associated with it. For each pixel in the texture being generated, the terrain height at that position is queried and interpolated if required (if the texture is higher resolution than the terrain mesh). This height is then compared against all terrain regions, and the colour of the pixel is based on the strength of this height within those regions. There are many examples of the results of this algorithm elsewhere on the blog if you have not already seen them.
Above can be seen the times used to generate the textures in software, with 2048x2048 taking almost 1 minute! My code in this area isn't by any means heavily optimised, but it is well written. It's a relatively simple algorithm of iterating through a list and comparing the height value against each region. Previously, when procedurally generating a planet at run-time, the texture size of choice was 256x256. This provided average detail, but with a generation time of about 1 second a freeze in movement was very obvious.
Now on to the better news....
What a difference! These times include the full process of using the shader (a rough sketch of this sequence in code follows the list):
- Binding the Frame buffer so that the texture can be rendered off screen,
- Enabling the Vertex and Fragment shader, binding the textures required.
- Rendering the texture
- Unbinding/disabling everything used during this sequence.
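For anyone unfamiliar with render-to-texture, here is a rough OpenGL sketch of that sequence, assuming GLEW (or similar) provides the GL 3.0 framebuffer object entry points. The commented-out calls are placeholders for the engine's own shader and quad-drawing code, not real Decade functions.
#include <GL/glew.h>

//Sketch only: render the procedural texture into an off-screen target.
void RenderTextureOffscreen(GLuint p_uiFramebuffer, GLuint p_uiTargetTexture, int p_iSize)
{
    //1. Bind the frame buffer so that the texture can be rendered off screen.
    glBindFramebuffer(GL_FRAMEBUFFER, p_uiFramebuffer);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                           GL_TEXTURE_2D, p_uiTargetTexture, 0);
    glViewport(0, 0, p_iSize, p_iSize);

    //2. Enable the vertex and fragment shader, bind the textures required.
    //EnableTerrainShaders();   //placeholder for the engine's Cg wrapper

    //3. Render the texture (a single screen-aligned quad is enough).
    //DrawFullScreenQuad();     //placeholder

    //4. Unbind/disable everything used during this sequence.
    //DisableTerrainShaders();  //placeholder
    glBindFramebuffer(GL_FRAMEBUFFER, 0);
}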
Here is the fragment shader, which does all the work. The vertex shader just passes the vertex down the render pipeline.
struct vOutput
{
float4 color : COLOR;
};
struct TextureRegion
{
float2 startTextureCoord;
float2 endTextureCoord;
float optimalHeight;
float minHeight;
float maxHeight;
};
vOutput Main(float2 texCoord : TEXCOORD0,
uniform sampler2D heightMap : TEX0,
uniform sampler2D terrainTexture : TEX1,
uniform int terrainTextureRepeat,
uniform sampler2D detailTexture : TEX2,
uniform int detailTextureRepeat,
uniform float blendingRatio,
uniform TextureRegion regions[4])
{
vOutput OUT;
//Get the Height
float4 bytes = tex2D(heightMap, texCoord);
float height = ((bytes[0] * 16777216.0f) + (bytes[1] * 65536.0f) + (bytes[2] * 256.0f)) / 1000.0f;
//Strength of this Terrain Tile at this height
float strength = 0.0f;
//Color for this Pixel
OUT.color = float4(0, 0, 0, 1);
int colorset = 0;
//For Each Terrain Tile Defined
for (int loop = 0; loop < 4; loop++)
{
//If the Current Terrain Pixel Falls within this range
if (height > regions[loop].minHeight && regions[loop].maxHeight > height)
{
colorset = 1;
//Work out the % that applies to this height
//If Height = Optimal, then its 100% otherwise fade out relative to distance between optimal and min/max
if (height == regions[loop].optimalHeight)
{
strength = 1.0f;
}
else if (height > regions[loop].optimalHeight)
{
float temp1 = regions[loop].maxHeight - regions[loop].optimalHeight;
strength = ((temp1 - (height - regions[loop].optimalHeight)) / temp1);
}
else if (height < regions[loop].optimalHeight)
{
float temp1 = height - regions[loop].minHeight;
float temp2 = regions[loop].optimalHeight - regions[loop].minHeight;
strength = temp1 / temp2;
}
if (strength != 0.0f)
{
float2 tileTexCoord;
//Tile the Texture Coordinates
tileTexCoord[0] = fmod((texCoord[0] * terrainTextureRepeat), 1.0f);
tileTexCoord[1] = fmod((texCoord[1] * terrainTextureRepeat), 1.0f);
//Recalculate the Texture Coordinates so that they are within the Specified Tile
tileTexCoord = regions[loop].startTextureCoord + ((regions[loop].endTextureCoord - regions[loop].startTextureCoord) * tileTexCoord);
//Get the Color at this Terrain Coordinate
OUT.color += (tex2D(terrainTexture, tileTexCoord) * strength);
}
}
}
if (0.0f == colorset)
{
//Make Pink so that its obvious on the terrain (only for debugging)
OUT.color = float4(1, 0, 1, 1);
}
else
{
//Scale the Texture Coordinate for Repeating detail and get the Detail Map Color
texCoord *= detailTextureRepeat;
float4 detailColor = tex2D(detailTexture, texCoord);
//Interpolate Between the 2 Colors to get final Color
OUT.color = lerp(OUT.color, detailColor, blendingRatio);
}
return OUT;
}
This week I have been using this shader in two ways.
- Use it as described above, to generate a texture once per terrain patch (it will get regenerated in higher detail when the patch subdivides); this texture is then used when rendering.
- Use and bind it every frame, which gives per-pixel texture generation. This has the obvious disadvantage that the texture data for the terrain must be generated each frame, but it is only generated for the on-screen terrain, and it has the nice advantage of not taking up any graphics memory: no need for frame buffers, rendering off screen, etc. I was getting between 200 and 600 fps using this method.
All the above results were generated on my laptop which has the following.
Renderer: ATI Mobility Radeon HD 3670
Vendor: ATI Technologies Inc.
Memory: 512 MB
Version: 3.0.8599 Forward-Compatible Context
Shading language version: 1.30
Max texture size: 8192 x 8192
Max texture coordinates: 16
Max vertex texture image units: 16
Max texture image units: 16
Max geometry texture units: 0
Max anisotropic filtering value: 16
Max number of light sources: 8
Max viewport size: 8192 x 8192
Max uniform vertex components: 512
Max uniform fragment components: 512
Max geometry uniform components: 0
Max varying floats: 68
Max samples: 8
Max draw buffers: 8
As always comments are welcome and appreciated.
Thursday, November 19, 2009
Higher Detail Heightmap Textures
When I originally created height maps, most of the tutorials stored this information in grey-scale, so at the time this is what was implemented in Decade. Limited to 256 different heights (as only 1 byte is used per value), it may be acceptable for demos with small terrain patches, but it is inadequate for anything larger with a more realistic topography.
To overcome this, a terrain file format was created for Decade. This allowed height data to be saved to a binary or text file using multiple bytes per value. With the introduction of shaders into the Decade terrain engine, this too has become inadequate. I need to send the height information to the shader, along with the source tiles, so that the procedural terrain texture can be created. The only feasible way to send this height information to the graphics card is in a texture, but the grey-scale implementation did not have high enough detail.
Solution? Combine the texture implementation with the multi-byte file format. To do this I split the floating-point height value across the 3 colour bytes using some simple bit shifting.
Using 3 bytes it is possible to represent 16777216 unique values (256x256x256). In the following example I want to maintain 3 digits after the decimal separator. This allows me to have terrain heights from 0.000 to 16777.216 which should be suitable for most procedural planets. It is of course possible to make the number of decimal digits configurable.
To convert a floating point height into 3 bytes (used in the texture rgb).
//Get the Height at the current Terrain Position
float l_fHeight = get_HeightAtPoint(l_iXPosition, l_iYPosition);
//Convert to an int. Multiply by 1000 to keep 3 decimal places.
int l_iHeight = (int)(l_fHeight * 1000);
//Separate into 3 Bytes
l_ucRed = ((l_iHeight >> 16) & 0xFF);
l_ucGreen = ((l_iHeight >> 8) & 0xFF);
l_ucBlue = l_iHeight & 0xFF;
and converting the 3 bytes back into a float, with 3 decimal places, is as easy as
l_fHeight = ((l_ucRed << 16) | (l_ucGreen << 8) | l_ucBlue) / 1000.0f;
Above is a sample showing grey-scale height maps and their equivalent 24-bit height maps. When I first saw the new maps I thought there was a bug and that there would be rough edges in the terrain, due to the sudden colour changes within the texture, however they load correctly and terrain detail is maintained.
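As a quick sanity check of the packing, here is a small self-contained round trip using 1234.567 as an arbitrary sample height (not a value taken from the engine):
#include <cstdio>

int main()
{
    float l_fHeight = 1234.567f;
    //Pack: 1234.567 -> 1234567 -> bytes 0x12, 0xD6, 0x87.
    int l_iHeight = (int)(l_fHeight * 1000);
    unsigned char l_ucRed   = (l_iHeight >> 16) & 0xFF;
    unsigned char l_ucGreen = (l_iHeight >> 8) & 0xFF;
    unsigned char l_ucBlue  = l_iHeight & 0xFF;
    //Unpack: (18 * 65536) + (214 * 256) + 135 = 1234567 -> 1234.567.
    float l_fDecoded = ((l_ucRed << 16) | (l_ucGreen << 8) | l_ucBlue) / 1000.0f;
    printf("%.3f -> %d,%d,%d -> %.3f\n", l_fHeight, l_ucRed, l_ucGreen, l_ucBlue, l_fDecoded);
    return 0;
}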
Tuesday, November 10, 2009
Winter is returning and I'm back again.
Yet again the summer months have been a slack time for Decade. As a solo hobby project I find it very difficult to sit at my PC after a long day in the office when the sun is shining outside. Now that the cold, wet and dark nights are back I feel the urge to return to Decade and complete some of my long standing wishlist features.
Over the past week or two I've been researching shaders. I have decided to add support for CG and CGFX to Decade. My first task with this will be to move the procedural texture generation to the GPU. This should hopefully vastly speed up this area of the engine allowing much smoother planet generation.
Having reviewed the Decade code with a fresh mind, some housekeeping is first required: cleaning up interfaces and improving sections before building on them. Within the next week I hope to have some comparisons between generating the textures on the CPU and on the GPU. Not having much experience using the GPU for this type of processing I am unsure what to expect, but from reading other blogs regarding planet and terrain generation I am confident that it is the right approach to take.
Monday, February 23, 2009
Basic Terrain Editor
This update in itself probably does not deserve a full blog post, however it's been too long since I have reported any engine progress, and some nice features have been added.
The video below shows my sandbox terrain editor. This is running in real time in the engine. By changing the size of the target area and using the mouse wheel it is possible to raise or lower segments of terrain. As the terrain changes, the terrain texture is recalculated (for the modified areas only).
Key features:
- Dynamic Vertex Buffer Object updating when the terrain is updated (a sketch of this follows the list)
- Dynamic texture updating when the terrain is updated (from a set of source textures and height values (optimal, min, max))
- Screen-to-world projection, allowing Decade to know where in the world the user is selecting with the mouse.
- High-level functionality of the demo is 100% scripted (calling engine functions bound to the script engine)
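As mentioned in the first point, here is a rough sketch of a partial vertex buffer update using the standard glBufferSubData call; the STerrainVertex layout and the way offsets are computed are illustrative rather than Decade's real structures.
#include <GL/glew.h>

struct STerrainVertex { float x, y, z, u, v; };

//Sketch only: upload just the vertices whose heights changed, instead of
//re-sending the whole terrain every time the user raises or lowers an area.
void UpdateTerrainRegion(GLuint p_uiVertexBuffer,
                         const STerrainVertex* p_pModifiedVertices,
                         int p_iFirstVertex, int p_iVertexCount)
{
    glBindBuffer(GL_ARRAY_BUFFER, p_uiVertexBuffer);
    glBufferSubData(GL_ARRAY_BUFFER,
                    p_iFirstVertex * sizeof(STerrainVertex),
                    p_iVertexCount * sizeof(STerrainVertex),
                    p_pModifiedVertices);
    glBindBuffer(GL_ARRAY_BUFFER, 0);
}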
A GUI system is also in development. This system is based on my experience with C#. It is possible to register events for each GUI component at the engine or script level. These events are fired under the specified circumstances, e.g. Mouse Enters, Mouse Leaves, MouseLeftClicked, etc.
Version 2 of the Terrain Editor should make use of this GUI system and also support features such as
- Adding areas of Water
- Adding areas of vegetation
- Erosion Filters on selected area or whole terrain
- Texture splatting for better terrain details (roads, coast line, etc.)