Hardware and Software Acceleration of Physics for Virtual World Applications

The complexity and demand for realism of future interactive entertainment applications, such as computer games, are staggering. The real-time performance that such complex applications require cannot be achieved without a synergy of hardware and software techniques. Prof. Reinman and I have been collaborating on a forward-looking project that aims to develop efficient multi-core architectures for interactive entertainment applications. Our recent work on characterizing and accelerating the computational load of physical simulation in such applications has resulted in major publications in both micro-architecture and graphics. Our long-term goal is to produce a multi-core, possibly heterogeneous, architecture and associated algorithmic techniques that significantly accelerate the computational load of interactive entertainment applications.

Selected publications and demos:

  1. "Fool Me Twice: Exploring and Exploiting Error Tolerance in Physics-Based Animation", Tom Yeh, Sanjay Patel, Petros Faloutsos, Glenn Reinman, ACM Transactions on Graphics (invited for presentation at ACM SIGGRAPH 2010), Volume 29, Issue 1, pp. 1-11, December 2009.


The error tolerance of human perception offers a range of opportunities to
trade numerical accuracy for performance in physics-based simulation. However, most previous approaches either focus exclusively on understanding the tolerance of the human visual system or burden the application developer with case-specific implementations. In this paper, based on a detailed set of perceptual metrics, we propose a methodology to identify the maximum error tolerance of physics simulation. Then, we apply this methodology in the evaluation of two techniques. The first is the hardware optimization technique of precision reduction which reduces the size of floating point units (FPUs), allowing more of them to occupy the same silicon area. The increased number of FPUs can significantly improve the performance of future physics accelerators. A key benefit of our approach is that it is transparent to the application developer. The second is the software optimization of choosing the largest time step for simulation.
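The precision-reduction technique can be sketched in software by masking off low-order significand bits of an IEEE-754 double, emulating the result of running the same computation on a narrower FPU. This is only an illustrative model (the function name and bit choices are ours, not the paper's implementation):

```python
import struct

def reduce_precision(x: float, sig_bits: int) -> float:
    """Keep only the top `sig_bits` of the 52-bit stored significand of an
    IEEE-754 double, zeroing the rest -- a software model of evaluating
    the value on a reduced-precision floating-point unit."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    mask = ~((1 << (52 - sig_bits)) - 1) & 0xFFFFFFFFFFFFFFFF
    return struct.unpack(">Q", struct.pack(">Q", bits & mask))[0] and \
           struct.unpack(">d", struct.pack(">Q", bits & mask))[0]

# Truncating to 10 significand bits perturbs a physics constant only in
# the low-order digits -- typically below perceptual thresholds:
g = 9.80665
print(reduce_precision(g, 10))
```

Values whose significand already fits in the kept bits, such as 1.0 or 1.5, pass through unchanged; for others the truncation error is bounded by one unit in the last kept bit position.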

  2. "ParallAX: An Architecture for Real-Time Physics", Tom Yeh, Petros Faloutsos, Sanjay Patel, Glenn Reinman, in 34th Annual International Symposium on Computer Architecture, pp. 232-243, June 2007.


We propose and characterize a set of forward-looking benchmarks to represent future physics loads and explore the design space of future physics processors. In response to the demands of this workload, we demonstrate an architecture with a set of powerful cores and caches that provide performance for the serial and coarse-grain parallel components of physics simulation, along with a flexible set of simple cores to exploit fine-grain parallelism. Our architecture combines intelligent, application-aware L2 management with dynamic coupling/allocation of simple cores to complex cores. Furthermore, we perform sensitivity analysis on interconnect alternatives to determine how tightly to couple these cores.
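The dynamic coupling/allocation idea can be sketched as a pool of simple cores that complex cores borrow when they uncover fine-grain parallelism and return when done. The class and policy below are a toy illustration, not the ParallAX hardware mechanism:

```python
from collections import deque

class CoreAllocator:
    """Toy model of dynamically coupling a shared pool of simple cores
    to complex cores (first-come-first-served; names are illustrative)."""

    def __init__(self, n_simple: int):
        self.free = deque(range(n_simple))      # idle simple-core ids
        self.coupled = {}                        # complex id -> simple ids

    def request(self, complex_id: int, want: int) -> int:
        """Couple up to `want` free simple cores; return how many granted."""
        granted = []
        while self.free and len(granted) < want:
            granted.append(self.free.popleft())
        self.coupled.setdefault(complex_id, []).extend(granted)
        return len(granted)

    def release(self, complex_id: int) -> None:
        """Return this complex core's simple cores to the free pool."""
        self.free.extend(self.coupled.pop(complex_id, []))

# Two complex cores competing for 8 simple cores:
alloc = CoreAllocator(8)
print(alloc.request(0, 5))   # first requester gets all 5
print(alloc.request(1, 5))   # second gets only the 3 that remain
alloc.release(0)
print(alloc.request(1, 5))   # after release, 5 more are available
```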

  3. "The Art of Deception: Adaptive Precision Reduction for Area Efficient Physics Acceleration", Tom Yeh, Petros Faloutsos, Sanjay Patel, Milos Ercegovac, and Glenn Reinman, The 40th Annual International Symposium on Microarchitecture (MICRO), pp. 394-406, December 2007.


Physics-based animation has enormous potential to improve the realism of interactive entertainment through dynamic, immersive content creation. Despite the massively parallel nature of physics simulation, fully exploiting this parallelism to reach interactive frame rates will require significant silicon area to accommodate the large number of cores needed. Fortunately, interactive entertainment requires believability rather than accuracy. Recent work shows that real-time physics has a remarkable tolerance for reduced precision of the significand in floating-point (FP) operations. In this paper, we describe an architecture with a hierarchical floating-point unit (FPU) that leverages dynamic precision reduction to enable efficient FPU sharing among multiple cores. This sharing reduces the area required by these cores, thereby allowing more cores to be packed into a given area and exploiting more parallelism.
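The area argument can be made concrete with a back-of-envelope model: FP multiplier area grows roughly quadratically with significand width (a common rule of thumb), so narrower FPUs let more units fit in a fixed silicon budget. The numbers and the quadratic assumption below are illustrative, not the paper's measured figures:

```python
def fpus_in_area(total_area_mm2: float, sig_bits: int,
                 full_bits: int = 53, full_fpu_area_mm2: float = 1.0) -> int:
    """Count how many reduced-precision FPUs fit in a fixed area budget,
    assuming FPU area scales quadratically with significand width
    (an illustrative model with hypothetical area numbers)."""
    area_per_fpu = full_fpu_area_mm2 * (sig_bits / full_bits) ** 2
    return int(total_area_mm2 // area_per_fpu)

# Halving significand width roughly quadruples FPU count in this model:
print(fpus_in_area(16.0, 53))   # full-precision baseline
print(fpus_in_area(16.0, 26))   # narrower units, same silicon budget
```

Under this model, the precision headroom identified perceptually (paper 1 above) translates directly into more FPUs, and hence more exploitable parallelism, per unit area.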