Dataflow supercomputers outperform conventional multi-core supercomputers based on CPUs/GPUs by orders of magnitude, in both computing performance and power efficiency, in compute-intensive exascale High Performance Computing (HPC) applications [1]. The best performance has been reported for application-specific heterogeneous dataflow supercomputers built on commercial FPGAs, with a speedup of over 200× compared to a single-core computer [2]. 3D graphics for massively complex models is an HPC application that urgently needs both high computing performance and low power consumption. This paper describes an innovative chipset-on-card design methodology for 3D supercomputing applications, based on the K-dimensional binary space partitioning (BSP) out-of-core ray-tracing algorithm [3], with the aim of achieving higher performance than the reported dataflow supercomputers. The algorithm is reformulated as a set of parallel pipelines with minimal data exchange and partitioned into separate data flows. The entire data flow diagram is then mapped onto a reconfigurable high-performance computing chipset-on-card.