Design-Space Exploration of Application-specific Instruction-set Processor Design

Authors

  • M. H. Sargolzaei

DOI:

https://doi.org/10.47839/ijc.20.4.2439

Keywords:

computer architecture, high-performance computing, energy efficient processor design, coarse-grained reconfigurable array, application-specific instruction-set processor

Abstract

Application-specific instruction-set processors (ASIPs) have established their processing power in embedded systems. Because energy efficiency is one of the most important challenges in this area, coarse-grained reconfigurable arrays (CGRAs) have been used in many different domains. The unique program execution model of CGRAs is the key to their energy efficiency, but it comes at a significant cost. The context-switching network (CSN) is responsible for handling this execution model and is also one of the most energy-hungry parts of a CGRA. In this paper, we propose a new method to predict important architectural parameters of the CSN of a CGRA, such as the size of the processing elements (PEs), the topology of the CSN, and the number of configuration registers in each PE. The proposed method is based on the high-level code of the input application, and it is used to prune the design space and increase the energy efficiency of the CGRA. Based on our results, the size of the CSN design space is reduced to 10% of its original size, and performance and energy efficiency are increased by about 13% and 73%, respectively. The architecture predicted by the proposed method is over 97% close to the best architecture found by an exhaustive search of the design space.
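
As a rough illustration of what pruning the CSN design space means, the Python sketch below enumerates a toy design space over the three parameters named in the abstract (PE size, CSN topology, configuration registers per PE) and then narrows it using features of the input application. Everything concrete here is an assumption made for illustration: the candidate values, the topology names, the feature names, and the `predict_ranges` heuristic are hypothetical and are not taken from the paper, whose actual prediction method is not described on this page.

```python
from itertools import product

# Hypothetical candidate values for the three CSN parameters named in the
# abstract. None of these values come from the paper.
PE_SIZES = [8, 16, 32, 64]                                   # assumed PE sizes
TOPOLOGIES = ["mesh", "torus", "bus", "crossbar", "ring"]    # assumed topologies
CONFIG_REGS = [1, 2, 4, 8, 16]                               # assumed registers per PE


def full_design_space():
    """Enumerate every (pe_size, topology, config_regs) combination."""
    return list(product(PE_SIZES, TOPOLOGIES, CONFIG_REGS))


def predict_ranges(app_features):
    """Hypothetical stand-in for a predictor driven by high-level code.

    It maps coarse application features to a narrowed set of candidate
    values per parameter. The rules below are invented for this sketch.
    """
    width = app_features["max_operand_bits"]
    kernels = app_features["distinct_loop_kernels"]
    # Keep the two smallest PE sizes that can hold the widest operand.
    pe_sizes = [s for s in PE_SIZES if s >= width][:2]
    # Keep the smallest register count that can hold one context per kernel.
    regs = [r for r in CONFIG_REGS if r >= kernels][:1]
    # Pick a single topology from a coarse communication-pattern flag.
    topologies = ["mesh"] if app_features["neighbor_heavy"] else ["crossbar"]
    return pe_sizes, topologies, regs


def pruned_design_space(app_features):
    """Enumerate only the combinations inside the predicted ranges."""
    return list(product(*predict_ranges(app_features)))


# Hypothetical application features, as might be extracted from source code.
features = {
    "max_operand_bits": 16,
    "distinct_loop_kernels": 3,
    "neighbor_heavy": True,
}

full = full_design_space()
pruned = pruned_design_space(features)
print(f"{len(pruned)} of {len(full)} design points kept for evaluation")
```

In a flow of this kind, only the points in `pruned` would be passed on to detailed performance and energy evaluation, which is where the savings over an exhaustive search would come from.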

Published

2021-12-31

How to Cite

Sargolzaei, M. H. (2021). Design-Space Exploration of Application-specific Instruction-set Processor Design. International Journal of Computing, 20(4), 519-527. https://doi.org/10.47839/ijc.20.4.2439
