33 Milliseconds - Public With Notes

1
Digital Foundry: Space Marine is remarkable in that in our tests we saw a locked 30FPS with v-sync throughout the en>rety of the in-game ac>on [] The locked frame-rate remains no maDer which console you're playing on: both 360 and PS3 are remarkably solid throughout. Space Marine's sheer consistency is a great asset: controller response feels good and there's no lag regardless of how much is happening on-screen. Good games are not made by caring about bugs, memory, performance and so on at the last minute. Keep your framerate _always_ during the _whole_ producAon. When you nish the space, opAmize, as you go... We were lucky, we had an amazing opAmizaAon team, and we did a LOT of overAme. If you consider overAme normal, please change industry J
We didnt allow the GPU to run one frame behind of the CPU. This was tricky not only because the CPU rendering has to be fast enough and generate enough work to keep the GPU from stalling, but because we also had dependencies from the GPU to the CPU, reading back the occlusion counters and waiAng on fences to update dynamic streams and textures.
Rendering Phase 2 emits a command buer but also waits on the GPU for HW occlusion queries to be used in the Texture Combiner stage. There is no stall as typically LighAng and SSAO take enough Ame to cover the GPU latency. Timing is fundamental, not only between serial parts and spawned CPU tasks, but also between CPU and GPU.
10
GPU Z-Buer reprojecAon does not work well with moving or skinned objects, either we have to ignore these in the reprojecAon (marking them in the stencil) and leaving holes or we have to reproject everything potenAally leading to false occlusions. Crysis 2 does the laeer, using a median lter to ll small holes afer the scaeering. Coupling Z-Buer reprojecAon with simple occlusion meshes would be the best soluAon. AutomaAc generaAon of LODs is simpler than the automaAc generaAon of occluders, as the laeer have to be inscribed inside meshes, and these are ofen open or present complex degenerate cases in games. Standard vertex LODs are not so important for us also because our current engine is very heavy on material costs (GPU context switching, more CPU objects, resources in ight). LODs were needed to collapse materials and reduce the number of CPU objects in ight in a frame, more than reducing vertex counts. Two interesAng approaches for occluders are: - SimplicaAon Envelopes (guaranteed maximum deviaAon from original mesh) by Cohen-Varshney-Manocha-Turk-Weber-Agarwal-Brooks-Wright - Voxel Mesh Processing. Voxel rasterizaAon can deal with holes (by scanning using
11
12
13
Do you want hot-swappable assets? Streamable assets? Dont use reference counters with indirecAons (handles). Instead, keep a list of things that you need to patch if the asset swaps and use direct pointers. Or if you want handles, do handles to arrays of resources, not single ones
14
We use the zbuer ofen to convert to view and world space. We convert the depth to a linear R32F, it seems to be worth the cost. On PC we use MRT to render to the R32F while doing the Gbuer pass (as we cant easily read the hardware Z). We use hardware Z readback on PC only for PCF shadows (as that is way more supported across our hw range than the raw depth reads).
15
16
17
18
Fun fact: We didnt include sky in the gbuer pass, so the sky region would contain random depths on PC (depth wrieen with a MRT r32f buer, which is not cleared every frame the actual Z is). That generated a high variance in the kernel sizes for AO in the area which trashed the cache J
19
The only plamorm on which the single shadow buer does not work too well is Intel Sandy Bridge, which for now hosts only a single early-z rejecAon buer, thus switching depth buers conAnously invalidates it all the Ame. Its fundamental in a deferred renderer to preserve the early-z rejecAon of the main scene depth across the pipeline. The culling system is similar to the one we have for the main view, but the occlusion geometry was authored too loosely and would discard important casters in some situaAons. As we couldnt x that art problem at the Ame, we did employ only single occluder planes instead of relying on the sofware occlusion rasterizer.
20
The half-frame rate shadows are implemented in Crysis2. Crytek conrmed that they disable them in situaAons where the trick wont work. We experimented with splanng back the moving objects every frame. Resolve and memory costs didnt make this worth for us. NoAce that stable cascades can be seen as a window over a big orthographic projects of the whole scene. We just shif this window around frame to frame. If everything is staAc, we could see the previous frame data as a cache, we could just render the new small border that results from the intersecAon of the previous frame window with the current one. The problem there is sAll with dynamic objects and with the fact that at each frame we nd a new near-far z-ranges. The z-range problem can be solved by reprojecAng (but that would lose some resoluAons) or by wriAng an index in a buer (stencil?) which corresponds to an array that remembers which near-far was used for that pixel, thus allowing correct reprojecAon when compuAng the shadows. Note that all this also means that using the previous frame depth for occlusion culling of the next one is going to work well, because other than moving objects and the border where we dont have data, the rest will be exact, no perspecAve distorAon and no scaeering is needed.
21
The shadowmap size was choosen for performance and to t into 360 EDRAM Note: Best-cascade selecAon is possible too: draw on screen a volume that corresponds to the frustum of a given cascade minus the geometry of the subsequent one
22
23
24
Next Ame: Tiled deferred lighAng. Tile classicaAon can operate both on light volumes and on materials
25
See Rendering Tech of Space Marine (KGC 2011) for the details of the Oren Nayar approximaAon!
26
We have normal-only decals and colour-only decals too. Most are both though and get rendered in both passes.
27
28
29
30
We use the zbuer ofen to convert to view and world space. We convert the depth to a linear R32F, it seems to be worth the cost. On PC we use MRT to render to the R32F while doing the Gbuer pass (as we cant easily read the hardware Z). We use hardware Z readback on PC only for PCF shadows (as that is way more supported across our hw range than the raw depth reads).
31
32
33
34
35
36
37
38

33 Milliseconds - Public With Notes

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

33 Milliseconds - Public With Notes

Uploaded by

Copyright:

Available Formats

1

You might also like