DirectX ray tracing became available in early 2018; DX12 Ultimate brings new features that weren’t available in DXR1.0. Nvidia
Variable rate shading (VRS) allows games to focus more detail on the portions of a scene that need it and less on background (or rapidly moving) portions where the lack won’t be noticed. Nvidia
Mesh shaders (and the amplification shader) reinvent the DirectX geometry pipeline, increasing parallelism and potentially allowing fewer round-trips from GPU to CPU. Nvidia
Sampler feedback allows for increased quality without increased processing time by letting developers load textures only when they’re needed. Nvidia
Today, Microsoft is announcing a new version of its gaming and multimedia API platform, DirectX. The new version, DirectX 12 Ultimate, largely unifies Windows PCs with the upcoming Xbox Series X platform, offering the platform’s new precision rendering features to Windows gamers with supporting video cards.
Many of the new features have more to do with the software side of development than the hardware. The new DirectX 12 Ultimate API calls aren’t just enabling access to new hardware features, they’re offering deeper, lower-level, and potentially more efficient access to hardware features and resources that are already present.
For now, the new features are slated largely for Nvidia cards only, with “full support on GeForce RTX”—the presentation you’re seeing slides from actually came from Nvidia itself, not Microsoft. Meanwhile, AMD has announced that its upcoming slate of RDNA 2 GPUs will have “full support” for the DirectX 12 Ultimate API—but not any prior generations of AMD cards. (AMD takes the opportunity to remind gamers that the same RDNA 2 architecture is powering both Microsoft’s Xbox Series X and Sony’s PlayStation 5 consoles.)
Some of the new calls are reminiscent of work that AMD has done independently in Radeon drivers. For example, variable rate shading strikes us as similar to AMD’s Radeon Boost system, which dynamically lowers frame resolution during rapid panning. While these features certainly aren’t the same thing, they’re similar enough in concept that we know AMD has at least been thinking along similar lines.
This Nvidia-produced video offers visual examples of the new features added in DirectX 12 Ultimate.
DirectX ray tracing
Although this image was not created using DirectX real-time ray tracing, it’s a good example of what the technique can do. The key concept is to follow a ray of light as it bounces off or refracts through multiple objects.
DirectX Ray Tracing, aka DXR, isn’t brand new—DXR1.0 was introduced two years ago. However, DirectX 12 Ultimate introduces several new features under a DXR1.1 versioning scheme. None of DXR1.1’s features require new hardware—existing ray-tracing-capable GPUs merely need driver support in order to enable them.
For the moment, only Nvidia offers consumer PC graphics cards with hardware ray tracing. However, the Xbox Series X offers ray tracing on its custom Radeon GPU hardware, and at CES 2020 in January, AMD CEO Lisa Su said to expect discrete Radeon graphics cards with ray tracing support “as we go through 2020.”
Inline ray tracing
Inline ray tracing is an alternate API that gives developers lower-level access to the ray tracing pipeline than DXR1.0’s dynamic-shader-based ray tracing. Rather than replacing dynamic-shader ray tracing, inline ray tracing is an alternative model that lets developers make inexpensive ray tracing calls that don’t carry the full weight of a dynamic shader invocation. Examples include constrained shadow calculation, queries from shaders that don’t support dynamic-shader rays, and simple recursive rays.
There’s no simple answer as to when inline ray tracing is more appropriate than dynamic; developers will need to experiment to figure out the best balance between use of both toolsets.
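As a rough illustration (not taken from Microsoft’s samples), an inline occlusion query in HLSL might look like the sketch below; the acceleration-structure binding and the function name are assumptions for the example:

```hlsl
// Inline ray tracing (DXR1.1): a shadow query issued from an ordinary
// pixel or compute shader, with no dynamic shader invocations.
RaytracingAccelerationStructure Scene : register(t0);

float ShadowFactor(float3 origin, float3 lightDir)
{
    RayDesc ray;
    ray.Origin    = origin;
    ray.Direction = lightDir;
    ray.TMin      = 0.001;
    ray.TMax      = 10000.0;

    // Template flags are resolved at compile time; accepting the first
    // hit is enough for an occlusion test.
    RayQuery<RAY_FLAG_CULL_NON_OPAQUE |
             RAY_FLAG_ACCEPT_FIRST_HIT_AND_END_SEARCH> q;

    q.TraceRayInline(Scene, RAY_FLAG_NONE, 0xFF, ray);
    q.Proceed(); // with opaque-only geometry, one call resolves the query

    // Any committed triangle hit means the light is occluded.
    return (q.CommittedStatus() == COMMITTED_TRIANGLE_HIT) ? 0.0 : 1.0;
}
```

Because the query runs inline in the calling shader, the cost stays proportional to the traversal itself rather than to a full dynamic-shader dispatch.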
DispatchRays() via ExecuteIndirect()
Shaders running on the GPU can now generate a list of DispatchRays() calls, including their individual parameters. This can significantly reduce latency for scenarios that prepare and immediately spawn ray tracing work on the GPU, since it eliminates a round-trip to the CPU and back.
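On the host side, the plumbing might look something like the following C++ sketch; the buffer names and counts are placeholders, and error handling is omitted:

```cpp
// GPU-generated ray tracing work: a command signature whose only
// argument type is a ray dispatch, consumed via ExecuteIndirect().
D3D12_INDIRECT_ARGUMENT_DESC arg = {};
arg.Type = D3D12_INDIRECT_ARGUMENT_TYPE_DISPATCH_RAYS;

D3D12_COMMAND_SIGNATURE_DESC sigDesc = {};
sigDesc.ByteStride       = sizeof(D3D12_DISPATCH_RAYS_DESC);
sigDesc.NumArgumentDescs = 1;
sigDesc.pArgumentDescs   = &arg;

Microsoft::WRL::ComPtr<ID3D12CommandSignature> signature;
device->CreateCommandSignature(&sigDesc, nullptr, IID_PPV_ARGS(&signature));

// Earlier on the same GPU timeline, a compute shader filled argsBuffer
// with D3D12_DISPATCH_RAYS_DESC records and wrote the count to
// countBuffer. This single call then launches all of them with no CPU
// round-trip.
commandList->ExecuteIndirect(signature.Get(), maxCommands,
                             argsBuffer.Get(), 0,
                             countBuffer.Get(), 0);
```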
Growing state objects via AddToStateObject()
Under DXR1.0, if developers wanted to add a new shader to an existing ray tracing pipeline, they needed to instantiate an entirely new pipeline, copying the existing shaders over along with the new one. This required the system to parse and validate the existing shaders as well as the new one each time such a pipeline was instantiated.
AddToStateObject() eliminates this waste by doing just what it sounds like: allowing developers to expand an existing ray tracing pipeline in place, requiring parsing and validation of only the new shader. The efficiency bump here should be obvious: a 1,000-shader pipeline that needs to add a single new shader now only needs to validate one shader, rather than 1,001.
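In C++ terms, the incremental path might look like this sketch (using the d3dx12 helper types); `newShaderLib` stands in for the compiled DXIL of the single shader being added:

```cpp
// Grow an existing ray tracing pipeline in place (ID3D12Device7).
// Only the contents of the addition are parsed and validated.
CD3DX12_STATE_OBJECT_DESC addition(D3D12_STATE_OBJECT_TYPE_RAYTRACING_PIPELINE);

auto lib = addition.CreateSubobject<CD3DX12_DXIL_LIBRARY_SUBOBJECT>();
lib->SetDXILLibrary(&newShaderLib); // just the one new shader

Microsoft::WRL::ComPtr<ID3D12StateObject> grownPipeline;
device7->AddToStateObject(addition, existingPipeline.Get(),
                          IID_PPV_ARGS(&grownPipeline));
// grownPipeline now contains all of existingPipeline's shaders plus the
// new one, without revalidating the originals.
```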
GeometryIndex() in ray tracing shaders
GeometryIndex() allows shaders to distinguish geometries within bottom-level acceleration structures, without needing to change data in the shader records for each geometry. In other words, all the geometries in a bottom-level acceleration structure can now share the same shader record. When needed, shaders can use GeometryIndex() to index into the app’s own data structures.
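A hypothetical closest-hit shader using the new intrinsic might look like this; the material buffers and payload layout are invented for the example:

```hlsl
struct Material { float3 albedo; };
struct Payload  { float3 color; };

// App-side lookup tables; every geometry in the BLAS shares one shader record.
StructuredBuffer<uint>     MaterialIndices : register(t1);
StructuredBuffer<Material> Materials       : register(t2);

[shader("closesthit")]
void ClosestHit(inout Payload payload,
                BuiltInTriangleIntersectionAttributes attr)
{
    // GeometryIndex() (new in DXR1.1) identifies which geometry within the
    // bottom-level acceleration structure was hit, so per-geometry data can
    // live in app buffers instead of per-geometry shader records.
    uint geomId   = GeometryIndex();
    payload.color = Materials[MaterialIndices[geomId]].albedo;
}
```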
Skipping primitive instantiation with config tweaks
Developers can optimize ray tracing pipelines by skipping unnecessary primitives. For example, DXR1.0 offers RAY_FLAG_SKIP_CLOSEST_HIT_SHADER, RAY_FLAG_CULL_NON_OPAQUE, and RAY_FLAG_ACCEPT_FIRST_HIT_AND_END_SEARCH.
DXR1.1 adds two more options, RAY_FLAG_SKIP_TRIANGLES and RAY_FLAG_SKIP_PROCEDURAL_PRIMITIVES.
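For illustration, a shadow ray that ignores procedural primitives could be cast like this, with `Scene`, `shadowRay`, and `payload` assumed to be set up elsewhere in the shader:

```hlsl
// Skip procedural primitives entirely and stop at the first hit. The same
// skip can also be baked into the pipeline config via
// D3D12_RAYTRACING_PIPELINE_FLAG_SKIP_PROCEDURAL_PRIMITIVES.
TraceRay(Scene,
         RAY_FLAG_SKIP_PROCEDURAL_PRIMITIVES |
         RAY_FLAG_ACCEPT_FIRST_HIT_AND_END_SEARCH,
         0xFF,     // instance inclusion mask
         0, 1, 0,  // hit group offset, stride, miss shader index
         shadowRay, payload);
```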
Variable rate shading
One side of this image renders 14 percent faster due to the use of VRS to avoid unnecessary detail rendering. Can you tell which is which? Nvidia
Variable rate shading (VRS) bills itself as “a scalpel in the world of sledgehammers.” VRS allows developers to select the shading rate on portions of frames independently, focusing the majority of the detail—and the rendering workload—on the portions that actually need it and leaving background or otherwise visually unimportant elements to render more rapidly.
There are two hardware tiers for VRS support. Tier 1 hardware can implement per-draw shading rates, which would allow developers to draw large, far away, or obscured assets with lower shading detail, then draw detailed assets with higher shading detail.
If you know that a first-person shooter gamer will be paying more attention to their crosshairs than anywhere else, you can have maximum shading detail in that area, falling gradually off to lowest shading detail in the peripheral vision.
A real-time strategy or roleplaying game developer, on the other hand, might instead choose to focus extra shading detail on edge boundaries, where aliasing artifacts are more likely to be visually obnoxious.
Per-primitive VRS takes things a step further by allowing developers to specify shading rate on a per-triangle basis. One obvious use case is for games with motion blur effects—why bother rendering detailed shadows on faraway objects if you know you’re going to blur them anyway?
Screenspace and per-primitive variable rate shading can be mixed and matched within the same scene, using VRS combiners.
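In host code, per-draw rates and combiners are set on the command list. The sketch below is a guess at typical usage rather than a complete renderer; the draw helpers are hypothetical:

```cpp
// Per-draw VRS with combiners (ID3D12GraphicsCommandList5).
// The two combiners decide how the base rate merges with the
// per-primitive rate and the screen-space rate image, respectively.
D3D12_SHADING_RATE_COMBINER combiners[D3D12_RS_SET_SHADING_RATE_COMBINER_COUNT] = {
    D3D12_SHADING_RATE_COMBINER_MAX,        // take the coarser of base vs. per-primitive
    D3D12_SHADING_RATE_COMBINER_PASSTHROUGH // ignore the rate image here
};

// Background geometry: one shading result per 2x2 pixel block.
commandList5->RSSetShadingRate(D3D12_SHADING_RATE_2X2, combiners);
DrawBackground(commandList5); // hypothetical helper

// Foreground geometry: back to full rate.
commandList5->RSSetShadingRate(D3D12_SHADING_RATE_1X1, combiners);
DrawForeground(commandList5); // hypothetical helper
```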
Mesh and amplification shaders
Mesh shaders allow for greater parallelization of the shading pipeline.
Nvidia
An amplification shader is, essentially, a collection of mesh shaders with shared access to the parent object’s data.
Nvidia
Mesh shaders parallelize mesh processing using a compute programming model. Chunks of the overall mesh are separated into “meshlets,” each typically consisting of 200 or fewer vertices. The individual meshlets can then be processed simultaneously rather than sequentially.
Mesh shaders dispatch a set of threadgroups, each of which processes a different meshlet. Each threadgroup can access groupshared memory but can output vertices and primitives that don’t need to correlate with a specific thread in the group.
This greatly reduces rendering latency, particularly for geometries with linear bottlenecks. It also allows developers much more granular control over separate pieces of the overall mesh rather than needing to treat the entire geometry as a whole.
Amplification shaders are essentially collections of mesh shaders managed and instantiated as one. An amplification shader dispatches threadgroups of mesh shaders, each of which has access to the amplification shader’s data.
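A minimal mesh shader, sketched in HLSL with invented meshlet structures and fetch helpers, might look like:

```hlsl
struct Meshlet   { uint vertexCount; uint triangleCount; };
struct VertexOut { float4 pos : SV_Position; };

StructuredBuffer<Meshlet> Meshlets : register(t0);

[outputtopology("triangle")]
[numthreads(128, 1, 1)]
void MeshMain(uint gtid : SV_GroupThreadID,
              uint gid  : SV_GroupID,
              out vertices VertexOut verts[64],
              out indices  uint3     tris[126])
{
    // One threadgroup expands one meshlet; groups run in parallel.
    Meshlet m = Meshlets[gid];
    SetMeshOutputCounts(m.vertexCount, m.triangleCount);

    // Output slots need not map one-to-one onto threads in the group.
    if (gtid < m.vertexCount)
        verts[gtid] = FetchVertex(m, gtid);   // hypothetical helper
    if (gtid < m.triangleCount)
        tris[gtid]  = FetchTriangle(m, gtid); // hypothetical helper
}
```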
Sampler feedback
Separating shading from rasterization allows for more efficient, less frequent runs through lighting computation routines. Nvidia
Sampler feedback essentially makes it simpler for developers to figure out in what level of detail to render textures on the fly. With this feature, a shader can query what part of a texture would be needed to satisfy a sampling request without actually having to perform the sample operation. This allows games to render larger, more detailed textures while using less video memory.
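A sketch of the shader side, with invented resource bindings: the feedback write records which mip levels a sample would touch, without altering the sample result itself:

```hlsl
Texture2D<float4> ColorTex   : register(t0);
SamplerState      LinearSamp : register(s0);

// The feedback map accumulates the minimum mip level requested per region;
// the app reads it back later to decide which mips to stream in.
FeedbackTexture2D<SAMPLER_FEEDBACK_MIN_MIP> FeedbackMap : register(u0);

float4 SampleWithFeedback(float2 uv)
{
    FeedbackMap.WriteSamplerFeedback(ColorTex, LinearSamp, uv);
    return ColorTex.Sample(LinearSamp, uv);
}
```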
Texture Space Shading (TSS) expands on the sampler feedback technique by allowing the game to apply shading effects to a texture independent of the object the texture is wrapped around. For example, a cube with only three faces visible doesn’t need lighting effects applied to the back three faces.
Using TSS, lighting effects can be applied to only the visible portions of the texture. In our cube example, this might mean lighting, in texture space, only the portion wrapping the three visible faces. This can be done prior to and independent of rasterization, reducing aliasing and minimizing the computational expense of the lighting effects.
Listing image by Nvidia