SoC and Hardware Software Co-design for Resource Constrained Embedded Systems

Graduate Students

  • Natt Thepayasuwan, Ph.D. candidate
  • Sankalp Kallakuri, Ph.D. candidate
  • Yulei Weng, Ph.D. candidate
  • Vaishali Damle, M.S. (graduated in 2003)
  • Rohit Pai, M.S. (graduated in 2003)

Research Goal and Objectives

    Motivation

    Many embedded systems must meet stringent cost, timing, and energy consumption constraints. In addition, embedded architectures use hardware resources sparingly: they include general-purpose processors running at low to medium frequencies (such as ARM, 80C188EB, and Philips 80C552), have little memory (as low as 128k of RAM and 256k of flash memory), and incorporate customized co-processors and I/O peripherals (including RF and analog circuits). Typical examples include embedded systems for telecommunication and multimedia, such as cell phones, digital cameras, and personal communicators. Systems-on-Chip (SoC) are single-chip implementations of embedded systems. Compared to printed circuit board designs, SoCs offer higher performance and reliability at lower cost. Advances in device manufacturing technology, including present deep submicron technologies and future nanotechnologies, are expected to keep reducing the minimum feature size and thus to increase the functional complexity of SoCs, while clock frequencies are expected to be in the range of 10-15 GHz.


    Figure 1: Impact of layout on data communication speed and system design

    For SoCs realized in deep submicron (DSM) technologies, physical-level attributes such as interconnect parasitics, substrate coupling, and substrate noise significantly influence system performance, e.g. data communication speed, system latency, power consumption, and signal integrity. Figure 1 illustrates the impact of layout parasitics on data communication speed and system design. Each task is labeled with its execution time on a PowerPC processor core. Without layout information, the co-design step would allocate a single 266 MHz system bus for all core communications. This would meet the timing constraints while keeping the system architecture simple. However, given the physical distances between cores, shown in Figure 1(b), it is difficult to implement a bus at the required speed. The same latency can be obtained with three buses of lower speed, like those in Figure 1(b), because the system concurrency improves. The bus speeds of 133 MHz, 133 MHz, and 33 MHz were found based on the physical locations of the cores and the RLC parasitics of the routed buses. This example argues that the communication sub-system of an SoC must be designed with layout-related criteria in mind. In general, it is difficult to postulate a single bus architecture as optimal for various applications and performance requirements. Instead, bus architectures need to be customized depending on the application specifics and design needs.
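
    The concurrency argument above can be illustrated with a small back-of-the-envelope model. The sketch below is only an illustration under loud assumptions: the transfer sizes are hypothetical (they are not taken from Figure 1), one word is assumed to move per bus cycle, and task dependencies are ignored. Transfers sharing a bus are serialized, while transfers on different buses proceed in parallel.

```python
def transfer_time_us(words, bus_mhz):
    # Assumption: one word per bus cycle, so words / MHz gives microseconds.
    return words / bus_mhz


def communication_latency_us(transfers, bus_speeds_mhz):
    """transfers: list of (bus_index, words).

    Transfers on the same bus are serialized; different buses run
    concurrently, so the latency is the busiest bus's total time.
    """
    busy = [0.0] * len(bus_speeds_mhz)
    for bus, words in transfers:
        busy[bus] += transfer_time_us(words, bus_speeds_mhz[bus])
    return max(busy)


# Hypothetical transfer sizes in words (illustrative only).
sizes = [1330, 1330, 330]

# One shared 266 MHz bus: all three transfers are serialized on bus 0.
single_bus = communication_latency_us([(0, w) for w in sizes], [266])

# Three slower buses (133, 133, 33 MHz): one transfer per bus, in parallel.
three_buses = communication_latency_us(list(enumerate(sizes)), [133, 133, 33])

print(f"single 266 MHz bus: {single_bus:.2f} us")
print(f"three slower buses: {three_buses:.2f} us")
```

    With these made-up sizes the three slower buses are no slower than the single fast bus, because no transfer has to wait for another. Real designs also have to account for task dependencies and arbitration, which this model leaves out.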

    Related Publications

  • N. Thepayasuwan, A. Doboli, "Layout Conscious Approach and Bus Architecture Synthesis for Hardware-Software Co-Design of Systems on Chip Optimized for Speed", accepted for publication, IEEE Transactions on VLSI Systems, 2004.
  • N. Thepayasuwan, A. Doboli, "Pruning-based Synthesis of Flat and Hierarchical Bus Architectures for SoC in Deep Submicron Technologies", International Journal on Embedded Computing, Vol. 1, 2004.
  • S. Kallakuri, A. Doboli, S. Doboli, "Applying Stochastic Modeling to Bus Arbitration for Network-on-Chip Systems", submitted, Integration, the VLSI Journal, special issue on VLSI System-On-Chip, May 2004.
  • N. Thepayasuwan, A. Doboli, "Layout Conscious Bus Architecture Synthesis for Deep Submicron Systems on Chip", Design, Automation and Test in Europe Conference (DATE) 2004, Paris.
  • S. Kallakuri, A. Doboli, S. Doboli, "Stochastic Modeling Based Environment for Synthesis and Comparison of Bus Arbitration Policies", International Symposium on VLSI (ISVLSI), 2004.
  • N. Thepayasuwan, A. Doboli, "Hardware-Software Co-Design of Resource Constrained Systems on a Chip in Deep Submicron Technology", International Workshop on Embedded Computing Systems (ECS-04), Tokyo, 2004.
  • N. Thepayasuwan, A. Doboli, "OSIRIS: Automated Synthesis of Flat and Hierarchical Bus Architectures for Deep Submicron Systems-on-Chip", International Symposium on VLSI (ISVLSI), 2004.
  • N. Thepayasuwan, V. Damle, A. Doboli, "Bus Architecture Synthesis for Hardware-Software Co-Design for Deep Submicron Systems on Chip", International Conference on Computer Design (ICCD) 2003, San Jose, CA.
  • N. Thepayasuwan, A. Doboli, "An Exploration Based Binding and Scheduling Technique for Synthesis of Digital Blocks for Mixed-Signal Applications", Proc. ISCAS 2003, Bangkok.
  • S. Kallakuri, A. Doboli, S. Doboli, "Applying Stochastic Modeling to Bus Arbitration for Network-on-Chip Systems", Proc. of the 2003 International Conference on VLSI, Las Vegas, 2003.
  • V. Damle, A. Doboli, "Pattern-Based Pin-to-Pin Routing for High Speed Digital Circuits in Deep Submicron Technologies", accepted for the Southwest Symposium on Mixed Signal Design (SSMSD), Las Vegas, 2003.
  • N. Thepayasuwan, A. Doboli, "A Methodology for Core Placement and Bus Synthesis under Time, Area and Energy Consumption Constraints", International Workshop on Logic and Synthesis, New Orleans, 2002.
  • A. Doboli, R. Vemuri, "Integrated High-Level Synthesis and Power-Net Routing for Digital Design under Switching Noise Constraints", Proceedings of the Design Automation Conference 2001, Las Vegas.
  • A. Doboli, "Integrated Hardware-Software Co-Synthesis and High-Level Synthesis for Design of Embedded Systems under Power and Latency Constraints", Proceedings of the Design, Automation and Test in Europe Conference, 2001, Munich.
  • P. Eles, A. Doboli, P. Pop, Z. Peng, "Scheduling with Bus Access Optimization for Distributed Embedded Systems", IEEE Transactions on VLSI Systems, Vol. 8, No. 5, pp. 472-491, October 2000.

OSIRIS: Layout Conscious Approach and Bus Architecture Synthesis for Systems on Chip Optimized for Speed

    Approach

    This research focuses on a hardware-software co-design method for developing SoC implementations with minimum latency. The novelty lies in a systematic, layout-conscious approach to the SoC communication sub-system, including an original bus architecture synthesis algorithm. System-level design attempts to minimize latency and to maximize the feasibility of the constraints imposed on the bus architecture. Applications are task graphs with data dependencies and a reduced number of control dependencies. The set of available hardware resources and the SoC area are known. The co-design method consists of three successive steps: (1) combined partitioning and static non-preemptive scheduling, (2) bus architecture synthesis, and (3) re-scheduling for the best bus architecture.

    The first step is an exploration process based on a simulated annealing algorithm. The cost function expresses the minimization of system latency and the maximization of the feasibility of bus architecture constraints, such as required speed, number of links, and amount of resulting connectivity between cores. We propose Performance Models (PM), a graph-based description that symbolically captures the relationships between performance, graph characteristics, and design decisions. PM are general and flexible, and can easily be extended to new design activities without requiring cumbersome validation.

    The second step synthesizes and routes the bus architecture for an SoC. IP cores are placed using a hierarchical cluster growth algorithm. Using the proposed PBS bitwise generation algorithm, bus architecture synthesis first identifies a set of possible building blocks and then assembles them such that bus length, bus topology, communication conflicts, and unnecessary core connectivity are minimized. We propose a special table structure (named the bus architecture synthesis table) and a select-eliminate method to prune poor solutions, such as buses with complex and redundant connectivity. The algorithm was successfully used to automatically synthesize bus architectures for realistic SoCs, including a network processor and a JPEG SoC.
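
    The generate-then-prune idea behind bus architecture synthesis can be sketched in a few lines. The code below is not the PBS algorithm or the bus architecture synthesis table, whose details are not given on this page. It is a minimal illustration under stated assumptions: candidate buses are encoded as bitmasks over cores, a candidate is eliminated if it attaches a core that has no communication partner on that bus, and the remaining building blocks are assembled into the smallest set of buses that covers every communicating pair. The function names, the `max_cores_per_bus` limit (a crude stand-in for bus length and loading), and the four-core example are all hypothetical.

```python
from itertools import combinations


def candidate_buses(n_cores, comm_pairs):
    """Enumerate buses as bitmasks and eliminate those in which some
    attached core has no communication partner on the same bus."""
    partners = [0] * n_cores  # partners[i]: bitmask of cores talking to i
    for i, j in comm_pairs:
        partners[i] |= 1 << j
        partners[j] |= 1 << i
    kept = []
    for mask in range(1, 1 << n_cores):
        cores = [i for i in range(n_cores) if mask >> i & 1]
        if len(cores) < 2:
            continue  # a bus needs at least two cores
        if all(partners[i] & mask for i in cores):
            kept.append(mask)
    return kept


def covers(mask, pair):
    """True if both cores of the pair are attached to the bus."""
    i, j = pair
    return bool(mask >> i & 1) and bool(mask >> j & 1)


def assemble(candidates, comm_pairs, max_cores_per_bus):
    """Pick the fewest candidate buses that cover every pair, using only
    buses with at most max_cores_per_bus cores (hypothetical limit)."""
    usable = sorted(m for m in candidates
                    if bin(m).count("1") <= max_cores_per_bus)
    for k in range(1, len(usable) + 1):
        for combo in combinations(usable, k):
            if all(any(covers(m, p) for m in combo) for p in comm_pairs):
                return list(combo)
    return None  # no feasible architecture under the limit


# Hypothetical example: four cores communicating in a chain 0-1-2-3.
pairs = [(0, 1), (1, 2), (2, 3)]
blocks = candidate_buses(4, pairs)
for limit in (2, 3, 4):
    buses = assemble(blocks, pairs, limit)
    print(limit, [format(b, "04b") for b in buses])
```

    Exhaustive enumeration like this only scales to a handful of cores. It is meant to show why pruning candidates with unnecessary connectivity early shrinks the search space, not to suggest how the actual tool explores it.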