TY - GEN
T1 - Modular high-throughput and low-latency sorting units for FPGAs in the Large Hadron Collider
AU - Farmahini-Farahani, Amin
AU - Gregerson, Anthony
AU - Schulte, Michael
AU - Compton, Katherine
PY - 2011
Y1 - 2011
N2 - This paper presents efficient techniques for designing high-throughput, low-latency sorting units for FPGA implementation. Our sorting units use modular design techniques that hierarchically construct large sorting units from smaller building blocks. They are optimized for situations in which only the M largest numbers from N inputs are needed; this situation commonly occurs in high-energy physics experiments and other forms of digital signal processing. Based on these techniques, we design parameterized, pipelined sorting units. A detailed analysis indicates that their resource requirements scale linearly with the number of inputs, latencies scale logarithmically with the number of inputs, and frequencies remain fairly constant. Synthesis results indicate that a single pipelined 256-to-4 sorting unit with 19 stages can perform 200 million sorts per second with a latency of about 95 ns per sort on a Virtex-5 FPGA.
AB - This paper presents efficient techniques for designing high-throughput, low-latency sorting units for FPGA implementation. Our sorting units use modular design techniques that hierarchically construct large sorting units from smaller building blocks. They are optimized for situations in which only the M largest numbers from N inputs are needed; this situation commonly occurs in high-energy physics experiments and other forms of digital signal processing. Based on these techniques, we design parameterized, pipelined sorting units. A detailed analysis indicates that their resource requirements scale linearly with the number of inputs, latencies scale logarithmically with the number of inputs, and frequencies remain fairly constant. Synthesis results indicate that a single pipelined 256-to-4 sorting unit with 19 stages can perform 200 million sorts per second with a latency of about 95 ns per sort on a Virtex-5 FPGA.
UR - http://www.scopus.com/inward/record.url?scp=79961195741&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79961195741&partnerID=8YFLogxK
U2 - 10.1109/SASP.2011.5941075
DO - 10.1109/SASP.2011.5941075
M3 - Conference contribution
AN - SCOPUS:79961195741
SN - 9781457712111
T3 - Proceedings of the 2011 IEEE 9th Symposium on Application Specific Processors, SASP 2011
SP - 38
EP - 45
BT - Proceedings of the 2011 IEEE 9th Symposium on Application Specific Processors, SASP 2011
T2 - 2011 IEEE 9th Symposium on Application Specific Processors, SASP 2011
Y2 - 5 June 2011 through 6 June 2011
ER -