Wednesday, December 23, 2009

Physics programming

Hi, people!!! Merry Christmas and a Happy New Year! I'm G@rdster, and I want to present some information about Storm engine development.
One of the most interesting features we are developing is the megaterrain. We are trying to create vast landscapes in the game world, so we can't keep all the information about the game world in fast memory simultaneously, and every so often the engine has to decide what information it really needs. For this purpose we use the area-of-interest concept. It means that we have a small area around any point of interest (near the camera or near the player's character) where all the data is loaded at the highest detail level and all calculations are performed. On the border of this area there is a band of low-detail, still-loading data. When we move our point of interest, this area moves with it, and the engine begins to load and unload border data.
So what does this mean for our vast terrain? It means that we need to divide the whole terrain into small pieces called "tiles".
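To make the tile bookkeeping concrete, here is a minimal C++ sketch of selecting the tile set around a point of interest; `TileIndex`, `kTileSize` and `kRadius` are hypothetical names, not the engine's real API:

```cpp
#include <cmath>
#include <set>
#include <utility>

// One tile is addressed by its integer grid coordinates.
using TileIndex = std::pair<int, int>;

const float kTileSize = 64.0f;   // world units per tile (assumed)
const int   kRadius   = 2;       // tiles loaded around the point of interest

// Returns the set of tiles that must be resident for a point of
// interest at world position (x, z).
std::set<TileIndex> TilesAround(float x, float z)
{
    int cx = static_cast<int>(std::floor(x / kTileSize));
    int cz = static_cast<int>(std::floor(z / kTileSize));
    std::set<TileIndex> tiles;
    for (int i = -kRadius; i <= kRadius; ++i)
        for (int j = -kRadius; j <= kRadius; ++j)
            tiles.insert({cx + i, cz + j});
    return tiles;
}
```

When the point of interest moves, the set difference between the previous and the new result tells the engine which tiles to unload and which to start loading.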
Here is a bird's-eye view of the physics tiles:

Each tile corresponds to an independent height-field actor. This puts some limitations on the simulation process. First of all, we need to freeze all dynamic objects when we unload the terrain under them, and unfreeze them (add them back to the simulation list) only when the terrain and the whole physics environment have been recreated.
We also need to load several tiles around the point of interest.
Physics area of interest process:
1. Load all data under the point of interest:

2. Move our point of interest
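The freeze/unfreeze bookkeeping from the paragraphs above can be sketched like this; the `DynamicObject` structure and the tile ids are illustrative only:

```cpp
#include <vector>

// A dynamic object standing on a terrain tile.
struct DynamicObject
{
    int  tileId;     // tile the object currently stands on
    bool simulated;  // is the object in the simulation list?
};

// Freeze every dynamic object on the tile being unloaded; otherwise it
// would fall through the missing height field.
void OnTileUnloaded(int tileId, std::vector<DynamicObject>& objects)
{
    for (auto& obj : objects)
        if (obj.tileId == tileId)
            obj.simulated = false;
}

// The terrain and its physics environment are back: add the objects to
// the simulation list again.
void OnTileLoaded(int tileId, std::vector<DynamicObject>& objects)
{
    for (auto& obj : objects)
        if (obj.tileId == tileId)
            obj.simulated = true;
}
```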


Wednesday, December 16, 2009

Monthly report

Currently I have no time for a full-featured post as things are getting very hot at the university. But I have a lot to tell about new improvements in the material system and shadows, so those are going to be covered by the end of this year. Here's a bunch of fresh screenshots of the WIP material system & glass rendering. (I've turned off ray tracing as it currently slows everything down a lot. I'll combine all the eye candy together when I get some time to maintain my code and do some optimization work.)



There are more pics in the Gallery.

Monday, November 23, 2009

Hybrids Of Steel

This post is actually going to be dedicated to computing the reflective lighting term. I'd like to talk a bit about specular reflections. A number of approaches to the problem of reflections have been developed, and all of them are based on two concepts. The first works only with the direct lighting term, considering light to shade each point directly, with no contribution from the other points forming the objects of the scene. In this case we have pretty simple (and certainly fast) equations that are used to determine the lighting term. But we have no way to produce reflections, because we don't have any information about the surrounding geometry for each reflective point of the world. The well-known approach of cubemap (or environment map) reflections is widely used in this case. I'm pretty sure that everyone is familiar with the idea, so I won't spend much time and screen space on it; here are the basic ideas:
  • Cube textures are used to store information about surrounding objects per scene object
  • Or maybe even per group of objects
  • Allows fast and realistic reflections to be obtained
  • Allows HDR on modern hardware
  • A cubemap may be created in 1 pass on DX10+ HW ( 6+ MRTs and GS cloning are available)
This technique is straightforward, fast and used almost everywhere. Its main drawback is pretty obvious: it doesn't allow us to produce self-reflections on objects, as all the data from the cubemap is used on a per-object basis. That means that we have all the surrounding objects in the cubemap except the reflective object itself, and thus we're not able to fetch its reflection from the cubemap. This leads to a loss of realism, since the missing details can be easily noticed, especially when the user knows where to look for them :) Such an absence of reflections is easy to spot on vehicles, as they have lots of overhanging parts such as rear-view mirrors, aerodynamic spoilers or whatever else can be attached to a vehicle. Such failures are IMO more noticeable than, for example, SSAO, so it's a good point to look for a way of solving them.
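The core of the cubemap lookup can be emulated on the CPU for reference: reflect the view direction about the normal, then pick the face by the dominant axis, just as the hardware does when sampling a cube texture. The `Vec3` type and the D3D-style face numbering are the only assumptions here:

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Mirror direction d about unit normal n: r = d - 2(d.n)n
Vec3 Reflect(const Vec3& d, const Vec3& n)
{
    float k = 2.0f * (d.x * n.x + d.y * n.y + d.z * n.z);
    return { d.x - k * n.x, d.y - k * n.y, d.z - k * n.z };
}

// Face selection by dominant axis, D3D face order:
// 0:+X 1:-X 2:+Y 3:-Y 4:+Z 5:-Z
int CubeFace(const Vec3& r)
{
    float ax = std::fabs(r.x), ay = std::fabs(r.y), az = std::fabs(r.z);
    if (ax >= ay && ax >= az) return r.x >= 0 ? 0 : 1;
    if (ay >= az)             return r.y >= 0 ? 2 : 3;
    return r.z >= 0 ? 4 : 5;
}
```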
And here the second concept comes into play: ray tracing. It's natural to talk a lot here as well, but I'll discuss the points that are most valuable in my opinion.
  • As we trace a ray through the scene we can collect contribution of multiple objects to final color
  • We are actually able to balance speed/quality by setting a limit for the number of bounces
  • We have to find an efficient way to store scene data since it can require very large amount of memory
  • Since ray direction after each bounce is generally unpredictable we're gonna have some trouble with cache locality
So the idea is pretty common once again. We reflect a ray off the reflective object's surface just like in the case of cubemap reflection. But instead of just fetching a color from a cubemap using the obtained ray direction, we march along the ray looking for an intersection with any geometry. When an intersection is found, we reflect once again, and so on. After each reflection we also add the diffuse color of the surface we're reflecting from to the final color.
Okay, that seems rather easy even compared to the 'classical' cubemap approach. The only problem left unsolved is how we are actually supposed to store all those colors and normals for all the objects of the scene. Well, cubemaps seem to be a good choice. We just have to store multiple cubemaps to simulate the 'depth' of the scene relative to its center, which is, at this point, our reflective object. Yeah, my new shiny car is the center of the Universe! :) Ok, whatever. To be able to store back-facing polygons of scene objects (they are also reflecting our light rays!) we have to store 4 cubemaps per slice: 2 for normals (one for front-facing geometry and one for back-facing) and 2 for albedos (yep, once again one per facing sign). But how do we determine what a 'slice' is? In the general case we need some kind of depth-peeling algorithm. The problem with depth peeling is that in the general case we don't know how many iterations it will take to peel all the layers. That's critical for our ray-traced reflections idea, because each peeled layer takes 2 cubemaps, and it's easy to run out of memory budget. But it turns out that we don't need the general case. The aim of our work is to get an object's self-reflections. These can be obtained by tracing a ray against the object's geometry. That means we need only 2 layers for self-reflections (front faces + back faces) plus one more layer to store all the rest of the scene, just as if we were doing those 'classical' cubemap reflections. We don't even need to honestly perform the depth-peeling algorithm, as everything is already determined. So, a brief summary:

  • Create 6 cubemaps
  • Render back facing polygons of reflective object to first pair of them storing color and normal
  • Render front facing polygons of reflective object to second pair of them storing color and normal
  • Render front facing polygons of the rest of the scene to the last pair of them storing color and normal
  • You now have everything you need to trace rays reflected from the reflective object and gather the reflection
  • The last one seems pretty awkwardly worded to me, but I hope you got the idea right :)
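The summary above boils down to a bounded tracing loop. Here is a sketch of it; the `Hit` struct, the `trace` callback and the 0.5 per-bounce attenuation are illustrative placeholders, since in the real shader the 'scene' is the three pairs of cubemaps holding normals and albedos:

```cpp
#include <functional>

struct Vec3 { float x, y, z; };

// Result of one intersection query against the layered cubemaps.
struct Hit
{
    bool found;
    Vec3 normal;
    Vec3 albedo;
};

using TraceFn = std::function<Hit(const Vec3&, const Vec3&)>;

// Bounded-bounce reflection gather: accumulate diffuse color at each
// bounce, reflect about the stored normal, stop when the ray escapes
// or the bounce limit is reached.
Vec3 TraceReflection(Vec3 origin, Vec3 dir, int maxBounces, const TraceFn& trace)
{
    Vec3 color = {0, 0, 0};
    float weight = 1.0f;            // contribution of the next bounce
    for (int bounce = 0; bounce < maxBounces; ++bounce)
    {
        Hit hit = trace(origin, dir);
        if (!hit.found)
            break;                   // ray escaped the scene
        color.x += weight * hit.albedo.x;
        color.y += weight * hit.albedo.y;
        color.z += weight * hit.albedo.z;
        weight *= 0.5f;              // assumed attenuation per bounce
        // Mirror the direction about the stored normal and continue.
        // (A real tracer would also advance origin to the hit point.)
        float k = 2.0f * (dir.x * hit.normal.x + dir.y * hit.normal.y
                        + dir.z * hit.normal.z);
        dir = { dir.x - k * hit.normal.x,
                dir.y - k * hit.normal.y,
                dir.z - k * hit.normal.z };
    }
    return color;
}
```

The `maxBounces` limit is exactly the speed/quality knob mentioned in the bullet list above.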
Pros:
  • All the data is stored in a GPU-friendly and easy to access form
  • Using fixed maximum number of iterations and dynamic branching of modern GPUs we're able to do the tracing in real time maintaining reasonable FPS
  • It won't be too hard to integrate such a technique into an existing rendering pipeline as it's based on popular and well-known concepts
  • We can improve performance by reducing the size of the cubemaps
Cons:
  • We still need a large amount of RAM to store all our cubemaps
  • We sacrifice quality to simplify the algorithm. For correct and detailed multiple reflections we have to implement depth peeling
  • The cache locality problem is not solved. A ray's direction after reflection becomes even more random after 2nd- or 3rd-order reflections
  • It takes a lot of time to prepare those 6 cubemaps for one reflective object, so we need to optimize this somehow. Spreading the work over several frames may cause reflections to update in fits and starts, and/or increase the required amount of RAM if one is going to fix that using a second cubemap (for each of the 6, of course) and the lrp instruction in the shader. So this is a problem to think about.
Well, I hope I haven't missed anything really important, though I'm pretty sure I have missed something. So if you point it out in a comment, I'll try to give an answer or add a note to the post. Anyway, the idea of using ray tracing for reflection computation should be clear now.
The diffuse term can still be obtained the 'default' way, which means no rays, no tracers, only the good old Dot(N,L). Ray-traced reflection can then be added using the Fresnel term as an 'intensity' coefficient. I personally lerp Diffuse + Specular with Reflection. That seems to look rather nice.
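A minimal sketch of that combine step, assuming Schlick's approximation for the Fresnel term and an arbitrary dielectric F0 of 0.04 (both are my assumptions, not necessarily what the engine uses):

```cpp
#include <cmath>

// Schlick's approximation: F = F0 + (1 - F0) * (1 - cos(theta))^5
float FresnelSchlick(float cosTheta, float f0 = 0.04f)
{
    return f0 + (1.0f - f0) * std::pow(1.0f - cosTheta, 5.0f);
}

// lerp(local lighting, traced reflection, F): grazing angles get more
// reflection, head-on views keep mostly the local Diffuse + Specular.
float CombineLighting(float diffusePlusSpecular, float reflection, float cosTheta)
{
    float f = FresnelSchlick(cosTheta);
    return diffusePlusSpecular * (1.0f - f) + reflection * f;
}
```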

As a reference, I'd like to recommend an article named "Robust Multiple Specular Reflections and Refractions" by Tamás Umenhoffer, Gustavo Patow and László Szirmay-Kalos. You can find it in NVIDIA's GPU Gems 3; an online version is available here. The article explains the idea in a rigorous form and provides the necessary formulas, code snippets and optimization hints, so I consider it very useful and a good place to start. All my work on multiple reflections is based on this article, and I'm a bit disappointed that I'm not the one who came up with the idea first :)

Here are some results of my WIP implementation:





As I promised earlier, here's a link to The Gallery where you can find more screenshots of the engine and its applications.

Friday, October 9, 2009

Storm ReBorn

So a year and a half has passed since the last post here, and the blog seems pretty much dead. I've decided to resurrect it instead of creating a new one. All the previous posts made by my friend and co-developer G@rd will be left intact as part of the history. I personally prefer to write in English because it helps me a lot to improve my English. This will also allow foreign visitors to read the log. I hope that this attempt to record our development process will not fail, at least not as fast as the previous one :)

Okay, enough introduction. Here's what we've done during this year of silence. First of all, we've completely destroyed our previous work :). The reason for such a demolition is pretty common: we found the architecture of our engine lame and unusable. Our first approach was based on the integration concept. This means that we had a heavy monolithic core with a game client attached to it. The main pros of this approach are
  • easy integration (all the parts of the engine are already put together by design)
  • fast interaction (no LoadModule, everything is static)
The main cons, however, came out as
  • extremely long build times (up to 10 minutes on a 2.0 GHz Core 2 Duo, as we had to recompile lots of facilities that depended on the modified one)
  • difficulties with tool programming (we had to rip chunks off the engine to attach some features to various tools, or to integrate some tools within the engine)
Maybe all the cons were caused by our lame design or bad implementation. That doesn't really matter now. Something just had to be changed, so we started from scratch.

First of all, the terminology. Let's consider 'a module' to be an entity that solves one problem domain (such as, for example, object rendering), encapsulates the whole required feature subset (for example, shader support), and provides an interface to control its actions, for example ::Draw(IObject*);

The new design just had to solve the issues of the previous one, preferably without adding new problems. It was natural to split the monolithic core into a set of smaller modules placed in separate binaries. These binaries should preferably be separate, which means we have to build them as DLLs, not LIBs linked into the EXE. That allows us to patch rather large engine parts with no need to recompile everything. We can build some DLLs and send them to each other as patches, so the teammates don't have to deal with the code, SDKs and compilation of the parts of the engine they don't develop. The drawback of this approach is linkage. We found it reasonable to use MSVC's __declspec(dllexport) modifier to export some symbols into LIBs associated with the final DLLs. Such a modifier placed on a public function causes the compiler to create a LIB file along with the DLL (which is the primary build target). As a result, we can link to the DLL through the supplied LIB. This seems to be a lot faster than true dynamic library loading.
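The usual shape of that export setup looks roughly like this; `FOUNDATION_API`, `FOUNDATION_EXPORTS` and `FoundationVersion` are illustrative names, not the engine's real symbols:

```cpp
// The project building the DLL defines FOUNDATION_EXPORTS; clients get
// dllimport instead. Here we pretend this translation unit is the DLL
// itself, and non-Windows builds get a no-op macro so the sketch still
// compiles anywhere.
#define FOUNDATION_EXPORTS

#if defined(_WIN32)
  #if defined(FOUNDATION_EXPORTS)
    #define FOUNDATION_API __declspec(dllexport)
  #else
    #define FOUNDATION_API __declspec(dllimport)
  #endif
#else
  #define FOUNDATION_API   // no-op outside Windows
#endif

// dllexport on a public function makes the linker emit an import LIB
// alongside the DLL; clients link that LIB and call into the DLL
// directly, with no LoadLibrary/GetProcAddress boilerplate.
FOUNDATION_API int FoundationVersion()
{
    return 1;
}
```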

The core was split up into a number of modules. The first, and the lowest-level, is the Foundation. Its aim is to provide a proxy between our engine and the OS. The engine should not know about OS-specific APIs, synchronization primitives, etc.; this abstraction layer provides us with platform independence. Furthermore, the Foundation contains some basic machinery required by both the engine and the tools, such as string representation classes, the configuration manager, an optimized memory allocator, I/O, math, SIMD optimizations, etc. Such an approach allows us to build the engine tools on the same basement as the engine, which seems rather convenient. The Foundation is to be used as a base for each of the higher-level engine modules as well.

The next layer contains the Console module. It is built upon the Foundation. The Console provides debugging I/O and basic scripting integration via callbacks. It doesn't know about the scripting module's implementation, but it is able to send input data to it and accept results.

The Networking and Scripting modules are built at the same abstraction layer as the Console, and both use the Foundation. Together, the Console, the Networking module and the Scripting module provide basic functionality such as
  • developer commands and stats
  • remote debugging
  • action automation via scripts
  • script debugging
This set seems to be enough for comfortable work with the engine.

The next layer consists of the Renderer, the Physics simulator and the Sound manager. The application of these modules is pretty straightforward, and their internals are not the subject of this post. The only thing that matters now is that these guys are to draw something, move/bounce/collide something, and make some noise, respectively.

Last but not least, the highest-level module is the Scene. It is supposed to provide a high-level set of functions to create, manage and destroy in-game objects. It uses all the layers below to simulate, draw, sound, and synchronize over the network all the game world objects. It allows debugging through the console and scripting, too.

The other modules to be built are the GUI module and the AI module. These are planned, but not heavily developed right now, as we are working on the low-level routines to do our best on optimization and interface improvements.

Summarizing all the above, I should say that the new architectural approach seems logical, straightforward and hierarchical. It allows easy building and upgrading of binaries. It provides a unified basement for both the engine and the tool set. All the modules encapsulate everything related to their problem domain and provide interfaces for interaction.

I'll try to give more details on particular implementations later as I don't want to make the 'new first post' too heavy.

Before the conclusion, I'll say a few words about target platforms and hardware requirements. Our first engine targeted the PC platform with MS Windows XP and a DirectX 9.0c compatible GPU. We set shader model 3 support as a requirement. With the new engine we target SM 4-5 GPUs installed in PCs running Windows 7. This version of Windows seems very promising from the technical side and beloved by the community. There's no doubt it will be much more popular than Vista was, so lots of users will have it on their PCs. This means that users will have DirectX 11 (shipped with 7). DirectX 11 will also be available on Vista, so the audience that already has Vista and is not going to upgrade in the near future will also be able to run our games.
So in summary, we are now
  • using DirectX11 as primary API for graphics
  • targeting PCs with Windows Vista and 7
  • dropping XP support (at least temporarily, until the engine reaches alpha)
The new architecture design would allow us to continue supporting Windows XP with D3D9 as the primary GAPI absolutely painlessly, but we don't have enough resources to develop this branch. This means it's not dropped completely, but frozen for a long time. We have some doubts about XP as a gaming platform by the time we finish our development, so there's no guarantee that this branch will ever be finished.

On the other hand, D3D11 as the primary GAPI provides us with the ability to use the new 'feature level'... hummm... feature of the API. It removes all the nasty D3DCAPS flags and provides strict sets of features required for each target (9_1, 9_2, 9_3, 10_0, 10_1, 11_0). We are now able to use one single GAPI for all devices under Windows Vista and later, and all unsupported features will be disabled automatically. This means that technically we are able to continue supporting D3D9 SM3-level devices.

In conclusion (finally), I'd like to say that the project is not dead, and its reconstruction from scratch has allowed us to add lots of new features, make it much more comfortable to use, and speed up development. Currently our re-developed engine is at an early pre-alpha stage, but it already has some highly improved main killer features of its previous generation, along with new ones, such as
  • geometry clip map terrain (up to 5 times faster implementation and up to 4 times less memory consumption)
  • deferred shading render engine ( up to 8 times faster, thanks to wise state management and insane optimizations )
  • completely new resource management system with multi threaded loading (intensively uses multicore CPUs for resource cooking and management)
  • completely new virtual file system with support of LZMA compression
  • much, much more to come...
In the near future we plan to spread our scene graph across several threads to speed up culling and animation, improve object management, and add some new eye candy to the renderer.
The video and screenshot gallery will appear a bit later, when we complete tests of our current feature set.

Saturday, March 15, 2008

Latest news

Time started to flow faster with the beginning of the new semester :). Over the past few weeks the project has made another loop of development. Here are the main project news in brief:
  • The reorganization of the networking core is finished. It is being tested now, and modules extending the network functionality are being written.
  • The user interface module has been completely rewritten. All the user-interaction elements are now forms and widgets.
  • The lighting system has been improved. Now even soft shadows don't drop the FPS through the floor.


Thursday, January 24, 2008

Networking difficulties

Designing the networking core presented several interesting surprises from the very beginning. The simplest client, which sent and received data through the server, was written fairly quickly. But since it was being developed in places with no Internet access, while the graphics core was undergoing global changes that made it impossible to merge the cores, all the code was debugged only on localhost. As soon as I started merging the cores, or rather embedding my core into the beginnings of the game engine, the problems began. In this post I will describe two, in my opinion, interesting server problems that I ran into while testing the networking core of the game on different computers, and their solutions.
The first problem I ran into was the core's unwillingness to work on other computers, or rather, as it turned out later, on computers running Windows Vista. The system refused to accept some packets. It turned out that when rewriting the network stack, Microsoft dropped support for message flags, so messages that were marked with a flag were not received (at least not by the method I was using). However, to Microsoft's credit, MSDN had long carried a note that using flags was not recommended. After the flags were removed from the messages, the client started working, and I never noticed any difference between the XP and Vista network stacks again.
The second problem was the algorithm for handling clients with different connection speeds. Initially the server was a simple relay: a client sent it a message with its coordinates, and the server forwarded this message to every connected client. It turned out that client A, with a good connection, transmitted data normally, while client B did not have enough bandwidth and could not receive data at client A's rate (not to mention that there may be several clients). Data arrived with an ever-growing delay, and soon the movements of a player's replica lagged behind the original by more than a minute. After that, it became clear that the server had to scale the data streams according to each client's bandwidth. Various options were tried, but all of them had drawbacks. However, during the exam session a rather interesting solution came to me, which I implemented in the server code the other day. It is based on the idea that the server-side class that works with a connected client gets its own storage, where it keeps all the data about the state of the other player clients. I divide all the data that appears during the game into two streams: a synchronization stream and a command stream. A client's state is described by a single synchronization packet, so if such a packet from player A has not yet been sent to player B, but a new packet from player A has already arrived, then only the new packet needs to be sent to player B. We also need to maintain an equal density of synchronization packets from opposing players, regardless of their connection speeds (over some period of time an equal number of packets should arrive from all players, provided they supplied them at the required rate). Command packets are a completely different story: they must be delivered in full and as quickly as possible. So, each player client has a storage on the server describing the states of all players.
The storage also holds a command queue. The states of the other players kept in the storage of the player's representative class are sent to the player client in turn, if the data has changed since the last send. Between sending player states, the command buffer is checked, and if it contains data, it is sent out of turn. When information about a player's state arrives at the server, it is copied into the corresponding cells of the other representative classes, overwriting the information about the player's previous state. The advantages of this system are that the server stores only the most recent data and that it is quite fast compared to the analogues I know. The drawbacks are a larger memory footprint compared to those analogues, and the need for correct synchronization between the client threads.
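A rough C++ sketch of that per-client storage (all names are illustrative): each connected player's representative keeps only the latest synchronization packet per peer, with new state overwriting old, plus a command queue that is drained out of turn:

```cpp
#include <deque>
#include <string>
#include <unordered_map>

// One synchronization packet fully describes a peer's state.
struct SyncPacket { int tick; float x, y; };

// Server-side representative of one connected client.
struct ClientRep
{
    std::unordered_map<int, SyncPacket> latestState;  // peerId -> newest state
    std::deque<std::string>             commands;     // must arrive in full

    // A new sync packet simply overwrites the stale one: if the client
    // never saw the old state, it only needs the newest.
    void OnPeerState(int peerId, const SyncPacket& p)
    {
        latestState[peerId] = p;
    }

    // One send slot: commands go first (out of turn), then one peer
    // state would be serialized in round-robin order.
    std::string NextOutgoing()
    {
        if (!commands.empty())
        {
            std::string cmd = commands.front();
            commands.pop_front();
            return cmd;
        }
        return "sync";  // placeholder for one serialized peer state
    }
};
```

This is what keeps a slow client from falling ever further behind: it receives at most one (the newest) state per peer per send slot, instead of the whole backlog.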


Wednesday, January 16, 2008

About us

  So, this is our first post. In it I would like to tell you about the "Storm Reloaded" project, which we are developing and for which this blog was created. The project is a set of libraries for building your own games, a so-called game "engine". At the moment, the engine consists of graphics, physics and networking cores and has a map editor prototype. I (G@rd) develop the networking core and the main logic, Ch@ser writes the graphics, the map editor is handled by SurvivorPhantom, who joined us recently, and PhysX from Ageia (www.ageia.com) was taken as the physics core.
  The project has existed since September 2006, when the first lines of the graphics core were written. Since then, the project has grown, new members have joined our group, and a lot has been rewritten more than once. From the very beginning and to this day, the project exists for self-education, and we will not sell our releases. The main goal of our project is self-education and a deeper understanding of modern game technologies. Naturally, the three of us cannot catch up with industry leaders like Epic Games in every respect, but some features of our engine are on par with the corresponding features of cutting-edge computer games.
  As for the blog, we will publish here information about the project's progress, descriptions of technologies that caught our interest, and our opinions on various computer events and happenings. We hope that the information you get from our blog will be useful and interesting to you.
The HardCodeCrew team.