This chapter describes a very fast median filter for today's GPUs, and
explains how to port it to future GPUs and other data-parallel
processors like DSPs and CPUs with vector instructions (e.g., MMX,
SSE). The technique used in this chapter is inherently fast because
it is designed with ideal characteristics for streaming parallel
architectures:
- No branches
- Single-pass
- Data-parallel across pixels
- Data-parallel at each pixel
- High compute-to-memory ratio
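
To make the "no branches" property concrete, here is a minimal CPU-side sketch in Python of a median of nine values computed only with min/max compare-exchange steps. It is an illustration under stated assumptions, not the shader given in this chapter: the names compare_exchange and median9 are hypothetical, and the network used is a plain odd-even transposition sort, which is simpler but does more comparisons than an optimized selection network would.

```python
def compare_exchange(v, i, j):
    """Place the smaller of v[i], v[j] at index i and the larger at index j.

    Only min and max are used, so the code contains no data-dependent
    if/else. On a GPU these map to single branch-free instructions.
    """
    lo = min(v[i], v[j])
    hi = max(v[i], v[j])
    v[i], v[j] = lo, hi


def median9(window):
    """Return the median of nine values (one 3x3 neighborhood).

    Assumption: an odd-even transposition network is used for clarity.
    The loop bounds are fixed and independent of the data, so the same
    sequence of operations runs for every pixel and can be fully unrolled.
    """
    v = list(window)
    n = len(v)
    if n != 9:
        raise ValueError("median9 expects exactly nine values")
    for round_index in range(n):
        # Even rounds compare pairs (0,1), (2,3), ...; odd rounds (1,2), (3,4), ...
        start = round_index % 2
        for i in range(start, n - 1, 2):
            compare_exchange(v, i, i + 1)
    # After n rounds the values are sorted; the middle one is the median.
    return v[n // 2]


print(median9([7, 1, 9, 3, 5, 8, 2, 6, 4]))  # prints 5
```

Because every pixel executes the identical fixed sequence of min and max operations, the work is uniform across pixels, and the comparisons within one round are independent of each other, which is the per-pixel parallelism the list refers to.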
On a GeForce 8800 or comparable GPU, this optimized filter can process
multiple 4096x4096 video sequences at over 100 fps, which is important
for real-time video processing. We give shaders for the 3x3 and 5x5
kernels, the sizes for which our filter is appropriate, and for a sample
higher-order real-time non-photorealistic filter built on several
applications of the median.