Wednesday, December 23, 2009

Physics programming

Hi, people!!! Merry Christmas and a Happy New Year. I'm G@rdster and I want to present some information about Storm engine development.
One of the most interesting features we are developing is megaterrain. We are trying to create vast landscapes in the game world, so we can't keep all the information about the game world in fast memory at once, and at every moment the engine has to decide what information it really needs. For this purpose we use the area-of-interest concept. It means we maintain a small area around any point of interest (near the camera or the player's character) where all data is loaded at the highest level of detail and all calculations are performed. On the border of this area there is a band of low-detail data that is still being loaded. When the point of interest moves, the area moves with it, and the engine starts loading and unloading the border data.
So what does this mean for our vast terrain? It means we need to divide the whole terrain into small pieces called "tiles".
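To make the idea concrete, here is a minimal sketch of how an area-of-interest query over a tile grid could look. The tile size, function names and types are my own illustrative assumptions, not the actual Storm engine code:

```cpp
#include <cmath>
#include <set>
#include <utility>

// World units per tile edge (an assumed value for illustration).
const float TILE_SIZE = 64.0f;

typedef std::pair<int, int> TileId;

// Returns the set of tile coordinates that should be resident for a
// point of interest at (x, z) with the given radius measured in tiles.
std::set<TileId> TilesAround(float x, float z, int radiusInTiles)
{
    int cx = (int)std::floor(x / TILE_SIZE);
    int cz = (int)std::floor(z / TILE_SIZE);
    std::set<TileId> result;
    for (int i = -radiusInTiles; i <= radiusInTiles; ++i)
        for (int j = -radiusInTiles; j <= radiusInTiles; ++j)
            result.insert(TileId(cx + i, cz + j));
    return result;
}
```

Diffing this set against the set of currently loaded tiles each time the point of interest moves gives the engine its load and unload lists.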
Here is a top-down view of the physics tiles:

Each tile corresponds to an independent height field actor. This puts some limitations on the simulation process. First of all, we need to freeze every dynamic object when the terrain under it is unloaded and unfreeze it (add it back to the simulation list) only when the terrain and the whole physics environment have been recreated.
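The freeze/unfreeze bookkeeping described above can be sketched like this. The types and method names are hypothetical stand-ins, assuming each dynamic actor is associated with the tile it stands on:

```cpp
#include <map>
#include <set>
#include <utility>

typedef int ActorId;                 // assumed handle type
typedef std::pair<int, int> Tile;    // assumed tile coordinate

struct PhysicsStreaming
{
    std::map<Tile, std::set<ActorId> > actorsOnTile;
    std::set<ActorId> frozen; // actors removed from the simulation list

    void OnTileUnloaded(const Tile& t)
    {
        // The terrain under these actors is gone: stop simulating them.
        std::set<ActorId>& actors = actorsOnTile[t];
        frozen.insert(actors.begin(), actors.end());
    }

    void OnTileLoaded(const Tile& t)
    {
        // The height field actor exists again: resume simulation.
        std::set<ActorId>& actors = actorsOnTile[t];
        for (std::set<ActorId>::iterator it = actors.begin();
             it != actors.end(); ++it)
            frozen.erase(*it);
    }
};
```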
Also, we need to load several tiles around the point of interest to build the physics area of interest.
Physics area of interest process:
1. Load all data under the point of interest:

2. Move our point of interest


Wednesday, December 16, 2009

Monthly report

Currently I have no time for a full-featured post as things are getting very hot at the university. But I have a lot to tell about new improvements in the material system and shadows, so those will be covered by the end of this year. Here's a bunch of fresh screenshots of the WIP material system & glass rendering. (I've turned off ray tracing as it currently slows everything down a lot. I'll combine all the eye candy when I get some time to maintain my code and do some optimization work.)



There are more pics in the Gallery.

Monday, November 23, 2009

Hybrids Of Steel

This post is dedicated to computation of the reflective lighting term. I'd like to talk a bit about specular reflections. A number of approaches to the problem of reflections have been developed, and all of them are based on two concepts. The first works only with the direct lighting term, considering light to shade each point directly, with no contribution from other points forming the objects of the scene. In this case we have pretty simple (and certainly fast) equations to determine the lighting term. But we have no way to produce reflections, because we don't have any information about the surrounding geometry for each reflective point of the world. The well-known approach of cubemap (or environment map) reflections is widely used in this case. I'm pretty sure that everyone is familiar with the idea, so I won't spend much time and screen space on it; here are some basic ideas:
  • Cube textures are used to store information about surrounding objects per scene object
  • Or maybe even one per group of objects
  • Allows fast and realistic reflections to be obtained
  • Allows HDR on modern hardware
  • A cubemap may be created in 1 pass on DX10+ HW ( 6+ MRTs and GS cloning are available)
This technique is straightforward, fast and used almost everywhere. Its main drawback is pretty obvious: it doesn't allow us to produce self-reflections on objects, as all the data from the cubemap is used on a per-object basis. That means we have all the surrounding objects in the cubemap except the reflective object itself, so we're not able to fetch its reflection from the cubemap. This leads to a loss of realism, since the missing details can be easily noticed, especially when the user knows where to look for them :) Such an absence of reflections is easy to spot on vehicles, as they have lots of overhanging parts such as rear-view mirrors, aerodynamic spoilers, or whatever else can be attached to a vehicle. Such failures are IMO more noticeable than, for example, missing SSAO, so it's a good place to look for a solution.
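For readers less familiar with the cubemap fetch itself: the lookup is driven purely by the direction vector, whose dominant axis selects one of six faces. A small CPU-side sketch of that selection (the hardware does this for us in a single cubemap sample):

```cpp
#include <cmath>
#include <string>

// Given a lookup direction, return which cube face would be sampled.
// Face naming follows the usual +X/-X/+Y/-Y/+Z/-Z convention.
std::string CubeFace(float x, float y, float z)
{
    float ax = std::fabs(x), ay = std::fabs(y), az = std::fabs(z);
    if (ax >= ay && ax >= az) return x >= 0 ? "+X" : "-X";
    if (ay >= az)             return y >= 0 ? "+Y" : "-Y";
    return z >= 0 ? "+Z" : "-Z";
}
```

Because only the direction matters, anything not rendered into the cubemap (including the reflective object itself) simply cannot appear in the result, which is exactly the drawback discussed above.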
And here the second concept comes into play: ray tracing. It would be natural to talk a lot here as well, but I'll discuss the points that are most valuable in my opinion.
  • As we trace a ray through the scene we can collect contribution of multiple objects to final color
  • We are actually able to balance speed/quality by setting a limit for the number of bounces
  • We have to find an efficient way to store scene data since it can require very large amount of memory
  • Since ray direction after each bounce is generally unpredictable we're gonna have some trouble with cache locality
So the idea is pretty common once again. We reflect a ray from the reflective object's surface just like in the case of cubemap reflection. But instead of just fetching a color from a cubemap using the obtained ray direction, we march along that direction looking for an intersection with any geometry. When an intersection is found, we reflect once again, and so on. After each reflection we also add the diffuse color of the surface we're reflecting from to the final color.
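The bounce step above can be sketched on the CPU for clarity (the real thing lives in a pixel shader). The `Vec3` type and the commented loop skeleton are illustrative assumptions; `FindIntersection` stands in for marching through the stored scene data:

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Mirror direction d across surface normal n (n assumed normalized):
// r = d - 2 * dot(d, n) * n
Vec3 Reflect(const Vec3& d, const Vec3& n)
{
    float k = 2.0f * (d.x * n.x + d.y * n.y + d.z * n.z);
    Vec3 r = { d.x - k * n.x, d.y - k * n.y, d.z - k * n.z };
    return r;
}

// The tracing loop then looks roughly like:
//   Vec3 color = { 0, 0, 0 };
//   for (int bounce = 0; bounce < MAX_BOUNCES; ++bounce) {
//       hit = FindIntersection(origin, dir);   // march the scene data
//       if (!hit) { /* fall back to environment fetch */ break; }
//       color += hit.albedo * weight;          // diffuse contribution
//       dir = Reflect(dir, hit.normal);
//       origin = hit.position;
//   }
```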
Okay, that seems rather easy even compared to the 'classical' cubemap approach. The only unsolved problem is: how are we actually supposed to store all those colors and normals for all the objects of the scene? Well, cubemaps seem to be a good choice. We just have to store multiple cubemaps to simulate the 'depth' of the scene relative to its center, which is, at this point, our reflective object. Yeah, my new shiny car is the center of the Universe! :) Ok, whatever. To be able to store back-facing polygons of scene objects (they also reflect our light rays!) we have to store 4 cubemaps per slice: 2 for normals (one for front-facing geometry and one for back-facing) and 2 for albedos (yep, once again one per facing sign).

But how do we determine what a 'slice' is? In the general case we need some kind of depth peeling algorithm. The problem with depth peeling is that in the general case we don't know how many iterations it will take to peel all the layers. That's critical for our ray-traced reflections idea, because each peeled layer takes 2 cubemaps, and it's easy to run out of the memory budget. But it turns out that we don't need the general case. The aim of our work is to get an object's self-reflections. These can be obtained by tracing a ray against the object's own geometry. That means we need only 2 layers for self-reflections (front faces + back faces) plus one more layer to store all the rest of the scene, just as if we were doing those 'classical' cubemap reflections. We don't even need to honestly perform the depth peeling algorithm, as everything is already determined. So, a brief summary:

  • Create 6 cubemaps
  • Render back facing polygons of reflective object to first pair of them storing color and normal
  • Render front facing polygons of reflective object to second pair of them storing color and normal
  • Render front facing polygons of the rest of the scene to the last pair of them storing color and normal
  • You now have everything you need to trace rays reflected from the reflective object to gather the reflection
  • The last point seems awkwardly worded to me, but I hope you got it right :)
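The steps above can also be written out as data, which makes the "3 passes, 2 cubemaps each" accounting explicit. Names here are illustrative, not the engine's real identifiers:

```cpp
#include <string>
#include <vector>

// One rasterization pass of the reflection-data setup. Each pass fills
// a (normal, albedo) cubemap pair, so 3 passes = 6 cubemaps total.
struct CubemapPass
{
    std::string geometry; // what we rasterize
    std::string facing;   // which polygons we keep
};

std::vector<CubemapPass> BuildReflectionPasses()
{
    std::vector<CubemapPass> passes;
    CubemapPass a = { "reflective object", "back faces"  };
    CubemapPass b = { "reflective object", "front faces" };
    CubemapPass c = { "rest of the scene", "front faces" };
    passes.push_back(a);
    passes.push_back(b);
    passes.push_back(c);
    return passes;
}
```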
Pros:
  • All the data is stored in a GPU-friendly and easy to access form
  • Using fixed maximum number of iterations and dynamic branching of modern GPUs we're able to do the tracing in real time maintaining reasonable FPS
  • It won't be too hard to integrate such a technique into an existing rendering pipeline, as it's based on popular and well-known concepts
  • We can improve performance by reducing the size of the cubemaps
Cons:
  • We still need large amount of RAM to store all our cubemaps
  • We sacrifice quality to simplify the algorithm. For correct and detailed multiple reflections we would have to implement depth peeling 
  • The cache locality problem is not solved. A new ray's direction after reflection becomes even more random after second- or third-order reflections
  • It takes a lot of time to prepare those 6 cubemaps for one reflective object, so we need to optimize this somehow. Spreading the work over several frames may cause reflections to update in fits and starts and/or increase the required amount of RAM if one fixes that with a second cubemap (for each of the 6, of course) and an lrp instruction in the shader. So this is a problem to think about. 
Well, I hope I haven't missed anything really important, though I'm pretty sure I have missed something. So if you point it out in a comment, I'll try to give an answer or add a note to the post. Anyway, the idea of using ray tracing for reflection computation should be clear now.
The diffuse term can still be obtained the 'default' way; that means no rays, no tracing, just good old Dot(N,L). Ray-traced reflection can then be added using the Fresnel term as an 'intensity' coefficient. I personally lerp Diffuse + Specular with Reflection. That seems to look rather nice.
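One common way to get that Fresnel 'intensity' coefficient is Schlick's approximation; the exact blend the author uses may differ, so treat this as a sketch. Scalar floats stand in for RGB colors:

```cpp
#include <cmath>

// Schlick's approximation: F = F0 + (1 - F0) * (1 - cosTheta)^5,
// where F0 is the reflectance at normal incidence.
float FresnelSchlick(float cosTheta, float f0)
{
    return f0 + (1.0f - f0) * std::pow(1.0f - cosTheta, 5.0f);
}

// lerp(Diffuse + Specular, Reflection, F) as described in the post.
float CombineLighting(float diffusePlusSpecular, float reflection,
                      float cosTheta, float f0)
{
    float f = FresnelSchlick(cosTheta, f0);
    return diffusePlusSpecular * (1.0f - f) + reflection * f;
}
```

At grazing angles (cosTheta near 0) the Fresnel weight approaches 1, so the reflection dominates, which matches the visual behavior of real shiny surfaces.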

As a reference, I'd like to recommend the article "Robust Multiple Specular Reflections and Refractions" by Tamás Umenhoffer, Gustavo Patow and László Szirmay-Kalos. You can find it in NVIDIA's GPU Gems 3; an online version is available here. The article explains the idea in a rigorous form and provides the necessary formulas, code snippets and optimization hints, so I consider it very useful and a good place to start. All my work on multiple reflections is based on this article, and I'm a bit disappointed that I'm not the one who came up with the idea first :)

Here are some results of my WIP implementation:





As I promised earlier, here's a link to The Gallery where you can find more screenshots of the engine and its applications.

Friday, October 9, 2009

Storm ReBorn

So a year and a half have passed since the last post here, and the blog seems pretty much dead. I've decided to resurrect this one instead of creating a new one. All the previous posts made by my friend and co-developer G@rd will be left intact as part of the history. I personally prefer to write in English 'cause it helps me a lot to improve my English. This will also allow foreign visitors to read the log. I hope this attempt to record our development process will not fail, at least not as fast as the previous one :)

Okay, enough introduction. Here's what we've done during this year of silence. First of all, we've completely destroyed our previous work :). The reason for such destruction is pretty common: we found the architecture of our engine lame and unusable. Our first approach was based on the integration concept. This means we had a heavy monolithic core with a game client attached to it. The main pros of this approach are
  • easy integration (all the parts of the engine are already put together by design)
  • fast interaction (no LoadModule, everything is static)
The main cons, however, came out as
  • extremely long build times (up to 10 minutes on a 2.0 GHz Core2Duo, as we had to recompile lots of facilities that depended on the modified one)
  • difficulties with tool programming (we had to rip chunks off the engine to attach some features to various tools, or to integrate some tools within the engine)
Maybe all the cons were caused by our lame design or bad implementation. That doesn't really matter now. Something just had to change, so we started from scratch.

First of all, the terminology. Let's consider 'the module' to be an entity that solves one problem domain (such as object rendering), encapsulates all the required features (such as shader support), and provides an interface to control its actions (such as ::Draw(IObject*);).
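A minimal sketch of what such a module boundary could look like in code, patterned on the ::Draw(IObject*) example above. The interface names and the stub implementation are illustrative assumptions:

```cpp
// A module exposes an abstract interface for its problem domain; the
// implementation lives behind the module boundary (e.g. in a DLL).
class IObject
{
public:
    virtual ~IObject() {}
};

class IRenderer
{
public:
    virtual ~IRenderer() {}
    virtual void Draw(IObject* object) = 0;
};

// A trivial stub implementation, counting draw calls for illustration.
class NullRenderer : public IRenderer
{
public:
    NullRenderer() : drawCalls(0) {}
    virtual void Draw(IObject* object) { (void)object; ++drawCalls; }
    int drawCalls;
};
```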

The new design just had to solve the issues of the previous one, preferably without adding new problems. It was natural to split the monolithic core into a set of smaller modules placed in separate binaries. Those binaries should preferably be truly separate, meaning we build them as DLLs, not LIBs linked into the EXE. That allows us to patch rather large engine parts with no need to recompile everything. We can build some DLLs and send them to each other as patches, so teammates do not have to deal with the code, SDKs and compilation of the engine parts they don't develop. The drawback of this approach is linkage. We found it reasonable to use MSVC's __declspec(dllexport) modifier to export some symbols into LIBs associated with the final DLLs. Such a modifier placed on a public function causes the compiler to create a LIB file along with the DLL (which is the primary build target). As a result, we can link to the DLL through the supplied LIB. This seems to be a lot faster than true dynamic library loading.
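The usual way to set up that __declspec-based linkage is a per-module export macro. The macro names here are hypothetical (not the engine's real ones), and the snippet is guarded so it also compiles on non-Windows toolchains:

```cpp
// Pretend we are compiling the DLL's own sources:
#define STORM_BUILDING_DLL 1

#if defined(_WIN32)
  #if defined(STORM_BUILDING_DLL)
    // Inside the DLL project: export the symbol and emit an import LIB.
    #define STORM_API __declspec(dllexport)
  #else
    // In client code: import the symbol from the DLL through the LIB.
    #define STORM_API __declspec(dllimport)
  #endif
#else
  // Non-Windows toolchains: no decoration needed for this sketch.
  #define STORM_API
#endif

// Every public function of the module is decorated with the macro; the
// compiler then produces the LIB that clients link against.
STORM_API int StormGetVersion()
{
    return 1; // illustrative version number
}
```

The same header serves both sides: the DLL project defines STORM_BUILDING_DLL, clients don't, so one declaration expands to dllexport in one build and dllimport in the other.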

The core was split up into a number of modules. The first, and the most low-level, is Foundation. Its aim is to act as a proxy between our engine and the OS. The engine should not know about OS-specific APIs, synchronization primitives, etc. This abstraction layer gives us platform independence. Furthermore, Foundation contains some basic machinery required by both the engine and the tools, such as string representation classes, a configuration manager, an optimized memory allocator, I/O, math, SIMD optimizations, etc. This approach allows us to build engine tools on the same basement as the engine itself, which seems rather convenient. Foundation is also to be used as the base for each of the higher-level engine modules.

The next layer contains the Console module, built upon Foundation. The Console provides debugging I/O and basic scripting integration via callbacks. It doesn't know about the scripting module's implementation, but it is able to send input data to it and accept results.

The Networking and Scripting modules are built at the same abstraction layer as the Console, and both use Foundation. Together, the Console, Networking and Scripting modules provide basic functionality such as
  • developer commands and stats
  • remote debugging
  • action automation via scripts
  • script debugging
This set seems to be enough for comfortable work with the engine.

The next layer consists of the Renderer, the Physics simulator and the Sound manager. The application of these modules is pretty straightforward, and their internals are not the subject of this post. The only thing that matters now is that these guys are to draw something, move/bounce/collide something, and make some noise, respectively.

Last but not least, the most high-level module is the Scene. It is supposed to provide a high-level set of functions to create, manage and destroy in-game objects. It uses all the layers below to simulate, draw, voice, and synchronize over the network all the game world objects. It allows debugging through the console and scripting, too.

The other modules to be built are the GUI module and the AI module. These are planned but not heavily developed right now, as we are working on the low-level routines to do our best on optimization and interface improvements.

Summarizing all of the above, I should say that the new architectural approach seems logical, straight and hierarchical. It allows easy building and upgrading of binaries. It provides a unified basement for both the engine and the tool set. All the modules encapsulate everything related to their problem domain and provide interfaces for interaction.

I'll try to give more details on particular implementations later as I don't want to make the 'new first post' too heavy.

Before the conclusion I'll say a few words about target platforms and hardware requirements. Our first engine targeted the PC platform with MS Windows XP and a DirectX 9.0c-compatible GPU. We set shader model 3 support as a requirement. With the new engine we target SM 4-5 GPUs installed in PCs running Windows 7. This version of Windows seems very promising from the technical side and beloved by the community. There's no doubt that it will be much more popular than Vista was, so lots of users will have it on their PCs. This means that users will have DirectX 11 (shipped with 7). DirectX 11 will also be available on Vista, so the audience that already has Vista and is not going to upgrade in the near future will also be able to run our games.
So in summary, we are now
  • using DirectX11 as primary API for graphics
  • targeting PCs with Windows Vista and 7
  • dropping XP support (at least temporarily, until the engine grows to alpha)
The new architecture allows us to continue support of Windows XP with D3D9 as the primary GAPI absolutely painlessly, but we don't have enough resources to develop this branch. This means it's not dropped completely, but frozen for a long time. We have some doubts about XP as a gaming platform by the time we finish our development, so there's no guarantee that this branch will ever be finished.

On the other hand, D3D11 as the primary GAPI gives us the ability to use the new 'feature level' hummm.... feature of the API. It removes all the nasty D3DCAPS.. flags and provides strict sets of features required for each target (9_1, 9_2, 9_3, 10_0, 10_1, 11). We are now able to use one single GAPI for all devices running Windows Vista and later, and all unsupported features will be disabled automatically. This means that technically we are able to continue supporting D3D9 SM3-level devices.

In conclusion (finally), I'd like to say that the project is not dead, and its reconstruction from scratch has allowed us to add lots of new features, make it much more comfortable to use, and speed up development. Currently our re-developed engine is in an early pre-alpha stage, but it already has some highly improved killer features of its previous generation alongside new ones, such as
  • geometry clip map terrain (up to 5 times faster implementation and up to 4 times less memory consumption)
  • deferred shading render engine ( up to 8 times faster, thanks to wise state management and insane optimizations )
  • completely new resource management system with multi threaded loading (intensively uses multicore CPUs for resource cooking and management)
  • completely new virtual file system with support of LZMA compression
  • much, much more to come...
In the near future we plan to spread our scene graph across several threads to speed up culling and animation, improve object management, and add some new eye candy to the renderer.
The video and screenshot gallery will appear a bit later, when we complete tests of our current feature set.