Abstract

The goal of this project is to build a state-of-the-art GPU raycaster for rendering 3D scenes in real-time. Using advance hardware and driver support I will optimize my raycaster for minimal latency by filling pixels in a display buffer just in advance of scanline reads to a head-mounted display.

Background

Head-mounted displays for Virtual Reality require extremely low latency to prevent simulation sickness, but also require extremely high resolution due to the wide field of view and close proximity to the eye. Since displays read from a front buffer one scanline at a time, rendering an entire frame in advance is wasteful and introduces additional latency. New hardware and software will soon enable developers to anticipate scanline reads from the front buffer, opening the possibility of reducing latency by rendering only a small region at a time - a single row or two, or perhaps even mere pixels - just ahead of the current read location.

Conventional 3D rendering via perspective transform will have difficulty taking advantage of this opportunity because whole objects must be projected and rasterized; there is an initial cost incurred in submitting all objects to be transformed, which is the same for a single row as for a full frame. While it is not yet clear that rasterization cannot be adapted for per-scanline rendering, raycasting offers an obvious alternative, since individual rays can be sampled at any time and image pixels filled in any order. Given scanline info, a 'beam-chasing' raytracer could sample primary visibility (and optionally secondary rays) for pixels just ahead of the current read point, chasing (or actually leading) the 'beam' of the read point envisioned as a ray cast out into the scene.

The Challenge

Real-time raytracing is an active area of research with many performance obstacles, but recent state-of-the-art techniques are promising. The main problems inhibiting use of raytracing for interactive applications like virtual reality are 1. Efficiently constructing an optimal acceleration structure for ray-polygon intersection tests. This must be performed every frame for dynamic objects, and it is not clear how best to map this problem to parallel hardware, although recent research by Karras and Aila demonstrates a promising solution. 2. Performing millions or more ray-scene intersections using this acceleration structure, with sufficient throughput to render multiple frames at interactive rates and (for beam chasing) with minimal latency per ray or small batch of rays. Many techniques have been proposed, and there is well-accepted if slightly dated analysis of best-practices performance, but research continues in new techniques and even new hardware customized to streamline raytracing workloads. Of primary concern is the fact that rays cast into a scene are often incoherent and take divergent paths in traversal of an acceleration structure. This causes divergent execution, making raytracing particularly difficult to schedule on wide-SIMD GPU's. Fortunately primary visibility rays are typically the most coherent, which bodes well for a fast implementation on current hardware.

The first challenge of this project must therefore be to implement a sufficiently fast raycaster for interactive applications, following state-of-the-art practices and supporting at least primary rays for visible surface detection in dynamic scenes. Implementing and optimizing these algorithms on the latest hardware offers interesting space for analysis, and with sufficient performance, opens the door for a beam-chasing adaptation of a full-frame raycaster. The second challenge of this project will be performing this adaptation and tuning the resulting implementation to achieve the closest possible synchronization to a front buffer being read out to a head-mounted display.

Resources and Platform

Since tracking front-buffer reads at scanline rates is not a standard feature of current graphics systems, I anticipate support from AMD through hardware and advance drivers for this project. As a result, I will be using HLSL compute shaders to implement a parallel raycaster on an AMD GPU using the Mantle API on Windows. This is necessary to support the beam-chasing modification, and makes use of the lowest-level API currently available on the most common platform for current virtual-reality applications (a PC with a high-end GPU rendering to a head-mounted-display such as the Oculus DK2).

Goals and Deliverables

The primary goal of this project is to implement a GPU-parallel raycaster that is sufficiently fast to render simple dynamic scenes to ~~an Oculus DK2~~ a high-refresh-rate display. This is the application that I intend to demonstrate at the parallelism competition; the main deliverable will be a clean code base (maintained in this repository) for a general-purpose raycaster ~~to use with VR displays~~. As a basic performance projection, Aila and Laine measured performance in excess of 40 Mrays/s, which is sufficient to render two 512x512 images at 75 Hz with one ray per pixel; with newer hardware I hope to exceed this value and obtain higher resolutions. The second goal of the project is to modify the raycaster to a beam-chasing implementation, and measure the minimal start-to-scanout latency that can be achieved in this fashion. I will also experiment with varying the sampling rate (rays per pixel) across a rendered image, and analyze the quality and performance of reducing sampling rate with increasing distance from a focal point in the image plane. This is a highly desirable optimization for future HMD's with eye-tracking capability that is also difficult to achieve with rasterization, and can be implemented without the beam-chasing component.

The main focus of the project is on the beam-chasing optimization, which is highly dependent on support from AMD. Although we are confident this support will be forthcoming, should beam-chasing not prove feasible I will continue to implement the raycaster with non-uniform-sampling and rendering to a DK2; an analysis of state-of-the-art techniques on the newest hardware and performance measurements of their applicability to VR rendering is a desirable alternative deliverable. If all goes well and time permitting, I will design a simple framework for defining and automatically launching rendering tasks in the context of a beam-chasing raycaster as an event-driven system.

Schedule

April 3 - 9: ~~Implement basic framework and begin BVH construction.~~ Researched state-of-the-art techniques and Mantle API
April 10 - 16: ~~Optimize BVH construction, implement raycasting.~~ Implemented scene handling and basic raycasting systems for D3D11 Project Checkpoint (04/16): ~~A working raycaster; will describe current state, additions needed, and path to integrating with Oculus and beam-chasing.~~ Complete: Scene handling, full screen ray generation, unit triangle intersections; In Progress: Raycasting to fill position buffer, basic shading from directional light
April 17 - 23: ~~Test rendering simple scenes, integrate with Oculus.~~ Complete BVH-accelerated raycasting in D3D11
April 24 - 30: ~~Measure performance on Oculus, implement beam-chasing if available.~~ Experiment with possible pipelines in Mantle, identify optimal strategies for sampling blocks of rows in sequence
May 1 - 7: Optimize raycasting and beam chasing and implement non-uniform sampling.
May 8 - 11: Prepare final presentation and report (last performance measurements).

PROJECT UPDATE 04/20/2015

One fact which quickly became apparent during initial research is that the Oculus SDK does not currently support Mantle rendering. Since I probably don't have time to do the integration myself, I'm changing Oculus support to a stretch goal for this project. If I have time I will investigate Mantle-Oculus rendering further in the final week, but I can better devote that time to creating more compelling demos on a desktop monitor (3840x2160 @ 60Hz or 1920x1080 @ 144Hz maximum). As of this writing, Mantle development is not yet available to me, but should be within the next few days. This has slowed down development, as I try to read ahead in the Mantle documentation while building against D3D11. By tomorrow I should have full-screen raytracing and simple shading in DirectX. I am now aiming to complete a BVH implementation by next week, so that I can fully devote the last two weeks to exploring the fastest possible implementations in Mantle. BVH construction, and raycasting through the BVH, is the last major but relatively straightforward programming prerequisite for this project, on which I am simply behind schedule due to other commitments. At this point, my main concern is moving from DirectX to Mantle; while much of the core code is portable, the new API presents technical challenges for reworking asset and rendering pipelines (easily handled), but also raises interesting questions about memory binding and compute dispatch synchronization, which at this point I cannot resolve without experimentation. I am currently focused on finishing the general coding prerequisites for accelerated raycasting and testing them in a simplistic DirectX renderer, so I can devote plenty of time to Mantle in the last two weeks of the project.

Final Report (End-of-Semester Update 05-11-2015)

Summary

I am implementing a state-of-the-art raycaster within the Mantle API for rendering to an HMD. While the project is not yet complete, I currently have an interactive raycaster with acceleration structures in-progress.

Background

My project aims to provide tools for raycasting as an alternative to rasterization for visibility testing in the graphics pipeline. Moreover, I intend to show that this is feasible in a modern renderer, and demonstrate how low-level GPU / display synchronization can be used to perform extremely low-latency rendering by raycasting separate rows ahead of display scanline reads. The major deliverables are a set of HLSL functions for fast raycasting, and a minimal demo application in Mantle, with analysis of beam-chasing capabilities. The inputs to my application are static scenes and acceleration structures, which can be generated from scripts included with the application; support for dynamic scenes and tools for fast bounding-volume-hierarchy construction are considerations for future work. The major raycasting components are implemented in HLSL, because Mantle accepts graphics and compute shader inputs generated from HLSL source, and this makes the code extremely portable. Mantle in turn will be used to access low-level synchronization tools previously unavailable. Thus, the goal is to parallelize ray-scene intersection for primary visibility testing, by tracing packets of rays in shader code.

Approach

As noted previously, Mantle is used in my project as an interface to the GPU in a rendering context. Mantle provides certain tools, such as querying the current scanline being read to the display, and others still in development, that may make a real beam-chasing raycaster possible for the first time. Although I have not yet explored these tools in detail, that is a major focus of the project as I will be continuing over the near future.

Within the Mantle API, my current implementation generates a mesh of ray sample positions in screen space. My HLSL raycasting functions are designed to be portable to any shader stage, so a simple implementation allows me to fill an image by rendering the ray mesh (using a Point topology for rasterization), and performing the raycast inside the vertex shader. While some minor optimizations may be possible only in a compute shader, using the graphics pipeline allows me to take advantage of the GPU scheduler to distribute rays in a natural packet size for the machine, and maintain good load-balancing at fine granularity of dispatches. Given a buffer of sample points, the scheduler automatically divides these into batches that match the SIMD width of the GPU, and distributes batches across available cores according to resource requirements. The actual shader code is scalar but implicitly parallelized; a while-while loop design with postponed leaf processing in my (in-progress) BVH traversal therefore equates to a SIMD-width group of rays, traversing a tree together. Note that this is not actual packet tracing (rays are not all required to visit the same nodes at the same time) but the traversal order of any one ray is affected by the other rays in its launch group. I assume the scheduler follows the typical behavior of assigning consecutive indices to vertices within a SIMD launch, so that threads will retrieve adjacent rays from the input buffer and thus be fairly coherent. For row-over-row beam chasing, I can then take advantage of guarantees that writes from one draw will complete before writes from a subsequent draw, although the second draw may begin executing earlier, or add extra synchronization between rows to force some spacing for latency as needed.

It remains to be scene whether a compute approach might yield better results for the particular case of beam-chasing visibility tests, but the vertex shader approach demonstrates the portability of my code, which can be used to create other raycasting effects. In particular, I plan to implement a foveal rendering technique by leveraging the hardware tessellator to automatically generate additional ray samples within screen tiles.

Results

So far, I do not have results for beam-chasing, since I have not yet been able to explore techniques for doing this. I encountered significant difficulties in developing basic raycast structures in Mantle, and opted late in the project schedule to re-write my implementation in a barebones framework that enabled easier debugging. This allowed me to complete my raycast and application framework in a simple, clean C code base, but came at the cost of time needed to develop good acceleration structures. My current version performs a linear iteration over triangles, and can handle several thousand triangles at a resolution of 800x600 rays while remaining interactive. I also have code for BVH loading and traversal, but am only performing basic tests for correctness until I can develop an offline process for generating optimal BVH's.

One aspect of the project that I will not be completing (or at least, not at all soon) is GPU-accelerated fast BVH construction. This is a requirement for raycasting dynamic scenes, which has very promising solutions in the literature, but proved to be outside my time window for the project. Additionally, the known solutions rely on use of the ballot intrinsic for SIMD execution contexts, which is not supported in HLSL.

As far as performance is concerned, I am fairly satisfied with my raycast to this point, although I have not deeply analyzed the latency of per-row raycasts. In raytracing this is known to be affected significantly by the quality of acceleration structures. As a simple metric, my raycast can intersect with about 2000 triangles per ray on an AMD R9 285 while retaining high framerates, within the range needed for low-latency beam chasing. After this point, framerate begins to decay significantly. An acceleration structure should help enormously with large scenes, and I expect to achieve interactive rates without too much additional optimization. One thing to note is that my current, vertex-shader implementation performs unnecessary work in rasterizing points and launching a pixel shader which simply passes a shaded color to the output merger. If performance at per-scanline levels becomes critical, I may opt for a compute shader implementation that combines output blending with raycasting, and avoids the rasterizer stage. However, a more complex approach will be needed to support foveal rendering, one of the secondary goals of this project and another driving factor in the choice of raycasting for VR rendering.

Reflections and Next Steps

Although this project is not complete, I have gained substantial knowledge of core Mantle operation and plausible raycasting implementations in HLSL. Support for BVH-accelerated raycasting is currently in development, and I plan to continue the project until I can test real approaches to beam-chasing in Mantle. The main limiting factor in development so far was lack of time, influenced largely by a poor development choice that led to a reboot of the project following frustrating (and extremely time-consuming) debugging experiences. My current framework is significantly simpler, allowing easy modification and debugging. Unfortunately, I did not have time to implement good acceleration structures, having put this off in the hope of completing a GPU implementation (see above). In retrospect, it would have been better to prepare quality BVH's in advance, or forego acceleration structures in favor of exploring beam-chasing on simpler scenes. With the time I have now, however, I plan to implement a split-BVH builder and begin more interesting tests on larger scenes.

There is still a lot of work to do, but the current system demonstrates good techniques for raycasting in Mantle, and should be sufficient to begin beam-chasing experiments within a week or two. I will continue to work on the project, pending support from AMD, and update this page in the near future with results and publishable code. Until then, I am maintaining my code in a separate, private repository, as per an NDA agreement with AMD.

References

The major reference for this project is the Nvidia papers on GPU raycasting linked above: https://code.google.com/p/understanding-the-efficiency-of-ray-traversal-on-gpus/ The Mantle Programming Guide and API Reference is also available online: www.amd.com/Documents/Mantle-Programming-Guide-and-API-Reference.pdf

Rayvon

Fast, parallel beam-chasing raycaster for virtual reality displays.

Abstract

Background

The Challenge

Resources and Platform

Goals and Deliverables

Schedule

PROJECT UPDATE 04/20/2015

Final Report (End-of-Semester Update 05-11-2015)

Summary

Background

Approach

Results

Reflections and Next Steps

References