Handbook of algorithms for physical design automation part 31.pdf (Sổ tay thuật toán)

Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C014 282 Finals Page 282 24-9-2008 #7 Handbook of Algorithms for Physical Design Automation the design unroutable. A strategy of simple uniform spreading of the design may work well for dense designs, but can unnecessarily hurt timing for sparse designs. Thus, white space management is absolutely required in modern placement algorithms to achieve better timing and routability. 3. Mixed-size placement: For the past few decades, standard cell placement is considered as the norm. A standard cell based design consists of circuit elements called standard cells whose heights remain the same, and the placement problem is to place these circuit elements within regular circuit row boundaries (Figure 14.5a). In today’s design methodology, designers are encouraged to take hierarchical or semihierarchical design approaches with reusable internal/third-party IPs to reduce design turnaround time. As a result, a wider distribution of circuit element sizes is observed during placement. In some sense, mixed-size placement (Figure 14.5b), as opposed to uniform-height placement of standard cells, is a more complicated problem because these large macro blocks can cause a serious challenge during placement legalization. Also, these chunky blocks play an important role in determining final timing performance. Hence, the early placement of large macros (during floorplanning or flat placement) is an important problem in a modern timing closure flow. To provide further flexibility in floorplanning and placement, sometimes all the standard cells as well as large macros are considered as movable objects and are placed simultaneously. This new problem is called floorplacement [8]. In general, today’s placement algorithms must be able to handle a wide range of object sizes, because the trend indicates that more IP blocks are included in a design. 4. Region constraints (movebounds): In a hierarchical design methodology, the functionality of a design is logically partitioned first and floorplanning is executed on the set of logically partitioned blocks to determine the approximate locations of those logical blocks. Circuit elements belonging to the same logical partition are grouped together and need to be placed in the vicinity of the layout region. This is in contrast to the top-down flat physical design process where logical partitioning is flattened and circuit elements are freely placed and routed at the leaf level of the logical hierarchy. In flat physical synthesis, physical partitions do not necessarily correlate with logical partitions. Although flat physical synthesis usually produces a better quality of result, the tight turnaround time requirement makes hierarchical synthesis a more viable solution in today’s environment. In a hierarchical synthesis flow, (a) (b) FIGURE 14.5 Standard cell placement versus mixed-size placement. (a) Standard cell placement and (b) mixed-size placement. Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C014 Placement: Introduction/Problem Formulation Finals Page 283 24-9-2008 #8 283 a region constraint (also known as a movebound) is usually employed to force circuit elements belonging to the same logic partition to be placed within a predetermined layout area. Essentially, a region constraint is a predetermined boundary where a set of circuit elements has to be placed and routed. Multiple region constraints may exist in a design, but not all the circuit elements are constrained by a region constraint either. Some circuit elements can be placed anywhere in layout area while others must be placed within the corresponding boundary defined by region constraints. Another motivation for region constraints is multiple clock/voltage domains in a design. In modern circuit design, multiple clock domains or voltage domains are frequently used due to performance and power dissipation trade-offs. For example, computationally unimportant parts in a chip can be slowed down to lower clock frequency while critical path computations have to be executed at the highest clock frequency. The lowered clock frequency results in the saving of unnecessary power consumption. The clock network typically consumes a significant portion of the overall chip power [9] and the size of the clock network can serve as a first-order approximation of the clock network power consumption. Hence, a smaller clock domain area is always preferred, if possible, to reduce clock network power dissipation. These clock domains are similar to logical partitions and region constraints can facilitate to define and reduce the size of clock domain. However, region constraints can make the placement algorithm complicated and modern placement algorithms must be capable of handling these unforeseen constraints without affecting the quality of results. 5. Clock-aware placement: Clock nets typically have much more sink pins to drive than normal data signal nets. This is because any sequential circuit elements (latches or flip-flops) require global clock signals to synchronize with each other. Sometimes, there are multiple clock domains in a design due to performance and power consumption trade-offs. Because of the high fan-out nature of clock nets, in a typical placement process, clock nets are ignored during the optimization as they tend to degrade the quality of placement solution. Moreover, the higher frequency design constraint forces a design to include more sequential circuit elements resulting in larger clock domain and higher clock power consumption. Recent studies show that the power budget of clock nets and networks amounts to more than 40 percent of overall chip power [9]. Even worse, each sink of a clock net needs to have the same signal propagation delay from the clock source. In reality, each sink has different clock signal propagation delay and the maximum delay difference of pairs of two sink pins is called clock skew. One of the objectives of clock network construction is to minimize the clock skew. Because clock nets are sensitive to technology variations due to their large network size and high frequency constraint, the placement of sequential circuit elements affects the clock network performance significantly. Postplacement clock network construction tends to fail more frequently due to the tighter skew, latency, and power constraints. Recent research [10] also shows that by considering these clock network constraints during quadratic placement, higher clock network performance (less skew, clock latency, and power consumption) can be achieved without almost any loss of data signal performance. Essentially, they tried to reduce clock network wirelengths by navigating potential locations of sequential circuit element during placement via register contraction techniques. More advanced technology and corresponding design paradigms will require higher clock frequency and more robustness to technology variations. Thus, the combined placement and clock network construction optimization has great promises for better layout solutions. 6. Scalability: As the complexity of a design grows exponentially, the corresponding number of gates in a design is expected to grow at a steep rate. Hierarchical design and the design reuse paradigm can help to manage the size of a design. Yet, a multimillion gate count is considered a norm in modern IC design [11,12]. Owing to large design sizes, placement runtime tends to be the bottleneck of the overall timing closure flow and the reduction of placement runtime, i.e., scalable placement algorithms, arises as a critical problem. Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C014 284 Finals Page 284 24-9-2008 #9 Handbook of Algorithms for Physical Design Automation Recently, the multilevel paradigm shows great promise to make placement algorithms more scalable with design size [13]. The multilevel method consists of three processes: coarsened abstract structure generation (coarsening or aggregation), optimization on the coarsened structure (relaxation), and transforming optimized solutions to an uncoarsened structure (interpolation). The simplest method of applying multilevel optimization in placement is a so-called “V-cycle” [13] approach where consecutive coarsenings are executed to obtain the most reduced design, then iterative relaxation, interpolation, and uncoarsening are applied to convert it to an optimized, flattened level. Combined with netlist clustering algorithms, multilevel optimization was demonstrated to provide a significant runtime reduction without almost any degradation of (sometimes even improved) quality of solutions [12]. Chapter 19 addresses multilevel techniques and their application in placement. 7. Stability: To achieve timing closure and design convergence, typically several instances of placement have to be run. Placement can help to identify needed changes in the logic, required buffering, gate sizing, routing congestion, etc. Once these problems are fixed, placement may have to be run again. Ideally, after each subsequent placement run, the problems that were fixed during the last iteration stay fixed and new problems do not crop up. However, if a placement algorithm returns a dramatically different solution from the last solution, entirely new problems could emerge. In other words, a placement algorithm needs to produce similar solutions when almost the same input instance (albeit with a slightly different netlist or constraints) is given. This stability is a particularly important issue in a timing closure flow where multiple invocations of placement are necessary. The quantification of the degree of stability of placement algorithms is also an important topic for further research [14,15]. 8. Macroblock (random logic module) placement versus ASIC placement: In microprocessor design, blocks with tight timing constraints such as data-paths, floating point units, etc. are still custom designed and layouts are produced by a human being. However, significant portion of macroblocks, for example, control logic modules, consist of random logic. These random logic modules are typically designed in HDL (high-level description language), synthesized and laid out via design automation tools. The characteristics of these random logic modules are quite different from those of ASIC designs. Owing to the high-performance nature of microprocessor design, the target timing constraint is extremely tight compared with that of ASIC design. The latches and leaf level clock distribution buffers are much larger than standard cells and the locations of these objects tend to affect final timing performance considerably. The number of circuit elements to be placed is order of magnitude smaller than ASIC placement problems. Thus, scalability is not an issue in macroblock placement. Rather more accurate modeling of timing with enhanced optimization techniques (at the cost of runtime, of course) is more important in this placement problem domain. 9. Three-dimensional placement: Traditionally, placement is formulated as a two-dimensional problem that places circuit elements in a 2D plane. Recent advances in package technology, however, enable chips to be piled up and interconnected together so that more functional blocks and logics can be inserted to chip designs [16]. Including more logics is not only a predominant advantage of 3D chip integration but also introduces new technical problems. The primary concern of 3D integration is a thermal issue. Because multiple circuit planes are stacked up tightly, it is more difficult to dissipate heat, particularly in the middle planes. Hot spots of integrated circuits in modern technology have adverse impacts on circuit performance because temperature directly affects the subthreshold voltage of transistors, resulting in slower responses and signal propagations. Heat dissipation can be addressed via thermal gradient consideration during floorplanning/placement[17] or inserting thermal vias to facilitate heat flow [18]. Another principal concern of 3D integration is the signal propagation among different circuit planes. The signal delay from one plane to another is an order-of-magnitude higher than that of the same plane. Because the circuit performance, Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C014 Placement: Introduction/Problem Formulation Finals Page 285 24-9-2008 #10 285 i.e., maximum frequency of a design, is limited by the maximum delay of timing critical paths in a design, reducing plane-to-plane signal delay is an important issue in 3D placement. Therefore, partitioning and placement of logics for planes have consequential effects on the final performance of design and plane-to-plane interconnect needs to be considered during placement. These unforeseen concerns of 3D integration require a new formulation of 3D placement and open new opportunities for placement research. In addition to these issues, more and more technology constraints are being considered during placement, such as power/thermal constraints, power/ground network, IR drop constraints, etc. [17, 19–21]. This is because a placement solution can directly affect the final quality of solution of design closure. This trend will carry on as long as technology scaling continues, and it confirms that placement remains the most important problem in design closure. 14.4 GENERAL APPROACHES TO PLACEMENT Placement algorithms are typically based on a simulated annealing, top-down cut-based partitioning, or analytical paradigm (or some combination thereof). Simulated annealing is an iterative optimization method that mimics the physical metal cooling process. With the given objective function, the process tries to achieve a better solution via a set of predefined moves. If a move improves the objective function, it is always accepted. If a move produces a worse solution, it is accepted based on some probability function. At early stages (with high temperature), a bad move has higher chance to get accepted while at later stages of placement (with lower temperature), the probability goes down exponentially. These worse-yet-accepted moves are essential for a simulated annealing placement algorithm to overcome a local optimum solution in which a placement might be stuck. A greedy move-based placement tool cannot escape from this local optimum once it steps into one. The typical set of moves in a simulated annealing placement algorithm are (1) relocation of a circuit element into new position, (2) exchanges of two circuit elements’ locations or (3) mirroring/rotation of a circuit element at the same location, etc. As mentioned earlier, the typical objective function is total wirelength. Recently, other factors such as routability, power, area, and even signal integrity metrics are directly modeled in the objective function of simulated annealing placement tools, because an iterative optimization approach is very flexible to model these nonconventional multidimensional objective functions. However, simulated annealing placement is regarded as a rather slow method compared with other placement algorithms as the design size grows. Chapter 16 provides more detailed discussion of simulated annealing placement algorithms. The advent of flat/multilevel partitioning as a fast and effective algorithm for min-cut partitioning has helped to spawn off a new generation of top-down cut-based placement tools. A placer in this class partitions circuit elements into either two (bisection) or four (quadrisection) regions of the chip, then recursively (following breadth-first search order) partitions each region until a good coarse placement solution is achieved [22]. When each region is partitioned, circuit elements outside the region are assumed to be fixed at the current locations and pseudopins are created around the region under consideration. This is called a terminal propagation [23]. Because the basic algorithm is based on partitioning, the typical objective function is the number of netcuts between subregions. Finding a good partitioning indicates that good logical clustering of circuit elements are found with less communications among them that can lead to a better total wirelength. To speed up the algorithm, multilevel clustering can be combined with a partitioning-based placement. In general, cut-based multilevel partitioning placement can be performed quite well particularly when designs are dense. Also, partitioning-based placement is a relatively fast placement algorithm. Partitioningbased placement algorithms are discussed in Chapter 15, and the fundamental partitioning concept and algorithm itself can be found in Chapter 7. Analytical placement algorithms typically solve a relaxed placement formulation (e.g., minimum total squared wirelength) optimally, allowing cells to temporarily overlap. Legalization is achieved Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C014 286 Finals Page 286 24-9-2008 #11 Handbook of Algorithms for Physical Design Automation by removing overlaps via either partitioning or by introducing additional forces or constraints to generate a new optimization problem. The formulation of these methods models the mechanical spring network. Each net represents a spring that attracts circuit elements connected to the net. The optimum solution represents the equilibrium state of the given spring network. Analytic placers can perform poorly when the data is naturally degenerate (which occurs when no fixed object exists) because it becomes difficult to legalize a placement where thousands of circuit elements are placed virtually at the same location. Also, analytical methods may have difficulties in dense designs where legalization is forced to significantly alter the analytic solution. The new breed of analytic placement algorithm, dubbed as forced-based placement showed great promise recently. Force-based placement adds additional forces to the formulation that pull circuit elements from high-density regions to low ones. The key point is to achieve a better distributed placement solution by integrating these spreading forces into a formulation, instead of relying on explicit partitioning or other techniques. There are a variety of techniques for cell spreading in force-based analytic placement techniques. During placement, some form of density analysis is performed to calculate spreading forces. Once the spreading forces are determined, these forces can be applied to each circuit element via constant forces, or explicit fixed-point methods. Sometimes, a density (overlapping) penalty function is included in the placement objective function explicitly so that nonlinear optimization can minimize total wirelengths and overlapping simultaneously. In this nonlinear optimization framework, linear wirelength approximation can also be included to minimize half-perimeter bounding box wirelength directly. To speed up the convergence of optimization, circuit clustering techniques can be combined with these analytic placement algorithms. These clustering techniques can not only reduce the runtime of placement but also improve the quality of placement solutions. The general analytical placement algorithm is presented in Chapter 17 and the new force-directed methods are discussed in Chapter 18. Recently, the ISPD (International Symposium on Physical Design) Conference hosted two placement contests in 2005 and 2006 for the academic placement research community. The contests provided a common platform where various placement algorithms can be evaluated on the same set of realistic large-scale ASIC designs. Particularly, the new placement benchmark circuits that were released during the contests have set a new bar for requirements of modern placement capability. By providing a common basis for quantitative measurements of contemporary placement algorithms, researchers were able to publicize their placement tools and results and discuss the pros and cons of different breeds of placement algorithms. For more serious placement researchers, ISPD placement contests can serve as a good starting point for further in-depth discussions of placement algorithms and implementations [11,24]. REFERENCES 1. P. G. Villarrubia, Important placement considerations for modern VLSI chips, invited talk at International Symposium on Physical Design, San Diego, CA, 2000. 2. S. M. Sait and H. Youssef, VLSI Physical Design Automation: Theory and Practice, World Scientific, River Edge, NJ, 1999. 3. M. Sarrafzadeh and C. K. Wong, An Introduction to VLSI Physical Design, McGraw-Hill, NY, 1996. 4. N. A. Sherwani, Algorithms for VLSI Physical Design Automation, Kluwer Academic Publishers, Norwell, MA, 1999. 5. M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman and Company, NY, 1979. 6. A. B. Kahng, Classical floorplanning harmful? in Proceedings of International Symposium on Physical Design, 2000, San Diego, CA, pp. 207–213. 7. C. Alpert, G. -J. Nam, and P. G. Villarrubia, Effective free space management for cut-based placement via analytical constraint generation, IEEE Transactions on Computer-Aided Design, 22(10): 1343–1353, 2003 (ICCAD 2002). 8. S. N. Adya and I. L. Markov, Combinatorial techniques for mixed-size placement, ACM Transactions on Design Automation of Electronic Systems, 10(5): 2005. Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C014 Placement: Introduction/Problem Formulation Finals Page 287 24-9-2008 #12 287 9. D. E. Duate, N. Vijaykrishnan, and M. J. Irwin, A clock power model to evaluate impact of architectural and technology optimization, IEEE Transactions on VLSI, 10(6): 844–855, December 2002. 10. Y. Lu, C. -N. Sze, X. Hong, Q. Zhou, Y. Cai, L. Huang, and J. Hu, Navigating registers in placement for clock network minimization, in Proceedings of Design Automation Conference, Anaheim, CA, 2005, pp. 176–181. 11. G. -J. Nam and J. Cong (Eds.), Modern Circuit Placement: Best Practices and Results, Springer Verlag, NY, 2007. 12. G.-J. Nam, S. Reda, C. J. Alpert, P. G. Villarrubia, and A. B. Kahng, A fast hierarchical quadratic placement algorithm, IEEE Transactions on Computer-Aided Design of Circuits and Systems, 25(4): 678–691, April, 2006 (ISPD 2005). 13. J. Cong and J. Shinnerl (Eds.), Multilevel Optimization in VLSI CAD, Kluwer Academic Publishers, AA Dordrecht, the Netherlands, 2003. 14. S. N. Adya, I. L. Markov, and P. G. Villarrubia, On whitespace and stability in physical synthesis, Integration: The VLSI Journal, 39(4): 340–362, 2006 (ICCAD 2003). 15. C. Alpert, G. -J. Nam, P. G. Villarrubia, and M. Yildiz, Placement stability metrics, in Proceedings of Asia South Pacific Design Automation Conference, Shanghai, China, 2005, pp. 1144–1147. 16. R. Montoye, The four degrees of 3D, invited talk at International Symposium on Physical Design, Phoenix, AZ, 2004. 17. B. Goplen and S. Sapatnekar, Efficient thermal placement of standard cells in 3D ICs using a force directed approach, in Proceedings of International Conference on Computer-Aided Design, 2003, pp. 86–90. 18. J. Cong and Y. Zhang, Thermal via planning for 3-D ICs, in Proceedings of International Conference on Computer-Aided Design, San Jose, CA, 2005, pp. 745–752. 19. Y. Cheon, P. -H. Ho, A. B. Kahng, S. Reda, and Q. Wang, Power-aware placement, in Proceedings of Design Automation Conference, Anaheim, CA, 2005, pp. 795–800. 20. A. B. Kahng, B. Liu, and Q. Wang, Supply voltage degradation aware analytical placement, in Proceedings of International Conference on Computer Design, San Jose, CA, 2005, pp. 437–443. 21. J. Lou and W. Chen, Crosstalk-aware placement, IEEE Design and Test of Computers, 21(1): 24–32, 2004. 22. A. E. Caldwell, A. B. Kahng, and I. L. Markov, Can recursive bisection alone produce routable placement? in Proceedings of Design Automation Conference, Los Angeles, CA, 2000, pp. 477–482. 23. A. E. Dunlop and B. W. Kernighan, A procedure for placement of standard cell VLSI circuits, IEEE Transactions on Computer-Aided Design of Integrated Circuits, 4(1): 92–98, 1985. 24. Available at http://www.ispd.cc/contests. Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C014 Finals Page 288 24-9-2008 #13 Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C015 Finals Page 289 29-9-2008 #2 15 Partitioning-Based Methods Jarrod A. Roy and Igor L. Markov CONTENTS 15.1 Top-Down Partitioning-Based Placement Framework . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 15.1.1 Terminal Propagation and Inessential Nets . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 15.1.2 Bipartitioning versus Multiway Partitioning .. . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 15.1.3 Cutline Selection and Shifting .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 15.1.4 Whitespace Allocation .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 15.1.4.1 Free Cell Addition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 15.2 Enhancements to the Mincut Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 15.2.1 Better Results through Additional Partitioning .. . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 15.2.2 Fractional Cut . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 15.2.3 Analytical Constraint Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 15.2.4 Better Modeling of HPWL by Partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 15.3 Mixed-Size Placement . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 15.3.1 Floorplacement .. . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 15.3.2 PATOMA and PolarBear .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 15.3.3 Fractional Cut for Mixed-Size Placement. . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 15.3.4 Mixed-Size Placement in Dragon2006 .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 15.4 Advantages of Mincut Placement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 15.4.1 Flexible Whitespace Allocation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 15.4.2 Solving Difficult Instances of Floorplacement . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 15.4.2.1 Selective Floorplanning with Macro Clustering . . . . . . . . . . .. . . . . . . . . . . . . . 15.4.2.2 Ad Hoc Look-Ahead Floorplanning . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 15.4.3 Optimizing Steiner Wirelength .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 15.4.3.1 Pointsets with Multiplicities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 15.4.3.2 Performance .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 15.4.4 Incremental Placement .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 15.4.4.1 General Framework .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 15.4.4.2 Handling Macros and Obstacles . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 15.5 State-of-the-Art Mincut Placers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 15.5.1 Dragon .. . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 15.5.2 FengShui.. . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 15.5.3 NTUPlace2 . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 15.5.4 Capo . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 290 291 291 291 292 292 293 293 294 294 295 296 296 298 298 299 299 299 300 301 303 303 304 305 305 305 307 307 307 307 307 308 308 Over the years, partitioning-based placement has seen many revisions and enhancements, but the underlying framework illustrated in Figures 15.1 and 15.2 remains much the same. Top-down partitioning-based placement algorithms seek to decompose a given placement instance into smaller instances by subdividing the placement region, assigning modules to subregions, and cutting the 289 Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C015 290 Finals Page 290 29-9-2008 #3 Handbook of Algorithms for Physical Design Automation Variables: queue of placement bins Initialize queue with top-level placement bin 1 While (queue not empty) 2 Dequeue a bin 3 If (bin small enough) 4 Process bin with end-case placer 5 Else 6 Choose a cut-line for the bin 7 Build partitioning hypergraph from netlist and cells contained in the bin 8 Partition the bin into smaller bins (generally via min-cut bisection or quadrisection) 9 Enqueue each child bin FIGURE 15.1 Top-down partitioning-based placement. netlist hypergraph [7,19]. The top-down placement process can be viewed as a sequence of passes where each pass examines all bins and divides some of them into smaller bins. The division step is commonly accomplished with balanced mincut partitioning, which minimizes the number of signal nets connecting modules in multiple regions [7]. These techniques leverage well-understood and scalable algorithms for hypergraph partitioning and typically lead to routable placements [9]. Recent work offers extensions to block placement, large-scale mixed-size placement [15,18,31] and robust incremental placement [33]. 15.1 TOP-DOWN PARTITIONING-BASED PLACEMENT FRAMEWORK Using mincut partitioning in placement was presented by Breuer in 1977 [7]. The underlying framework remains mostly the same and is illustrated in Figures 15.1 and 15.2. The core area is comprised of a series of placement bins which represent (1) a placement region with allowed module locations (sites), (2) a collection of circuit modules to be placed in this region, (3) all signal nets incident to the modules in the region, and (4) fixed cells and pins outside the region that are connected to modules in the region (terminals). 1 2 Placement bin End-case placement etc. 3 4 FIGURE 15.2 The overall process of top-down placement is shown on the left. The placement area and netlist are successively divided into placement bins until the bins are small enough for end-case placement. One important enhancement to top-down placement is terminal propagation, shown on the right. The net in question has five fixed terminals: four above and one below the cutline. It also has movable cells, which are represented by the cell with a dashed outline. The four fixed terminals above the cutline are propagated to the black circle at the top of the bin while the one fixed terminal below the cutline is propagated to the black circle below the cutline. The movable cells remain unpropagated. Note that the net is inessential because terminals are propagated to both sides of the cutline. (From Roy, J. A. and Markov, I. L., IEEE Trans. CAD, 26, 632, 2007.) Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C015 Partitioning-Based Methods Finals Page 291 29-9-2008 #4 291 Mincut partitioning-based placers proceed by dividing the netlist and placement area into successively smaller pieces until the pieces are small enough to be handled efficiently by optimal end-case placers [11]. State-of-the-art placers generally use a wide range of hypergraph partitioning techniques to best fit partitioning problem size—optimal (branch-and-bound [11]), middle-range (Fiduccia– Mattheyses [20]), and large-scale (multilevel Fiduccia–Mattheyses [10,26]). Mincut placement is highly scalable (due in large part to algorithmic advances in mincut partitioning [10,20,26]) and typically produces routable placements. In this section, we introduce topics relevant to top-down partitioning-based placement that must be addressed by all modern mincut placers. Specifically these include terminal propagation, bipartitioning versus multiway partitioning, cutline selection, and whitespace (or free space) allocation. 15.1.1 TERMINAL PROPAGATION AND INESSENTIAL NETS Proper handling of terminals is essential to the success of top-down placement approaches [11,19, 21,37]. When a placement bin is split into multiple subregions, some of the cells inside may be tightly connected to cells outside of the bin. Ignoring such connections can adversely affect the quality of a placement because they can account for significant amounts of wirelength. On the other hand, these terminals are irrelevant to the classic partitioning formulation as they cannot be freely assigned to partitions. A compromise is possible by using an extended formulation of partitioning with fixed terminals, where the terminals are considered to be fixed in (propagated to) one or more partitions, and assigned zero areas (original areas are ignored). Nets propagated to both partitions in bipartitioning are considered inessential because they will always be cut and can be safely removed from the partitioning instance to improve runtime [11]. Terminal propagation is typically driven by geometric proximity of terminals to subregions/partitions. Figure 15.2 (right) depicts terminal propagation for a net with several fixed terminals. This particular net is inessential for bipartitioning as it has terminals propagated to both sides of the cutline. 15.1.2 BIPARTITIONING VERSUS MULTIWAY PARTITIONING In his seminal work on mincut placement, Breuer introduced two forms of recursive mincut placement: slice/bisection and quadrature [7]. The style of mincut placement most commonly used today has grown from the quadrature technique, which advocated the use of horizontal and vertical cuts; the slice/bisection technique used only horizontal cuts and exhibited worse performance than quadrature [7]. Since that time, horizontal and vertical cutlines have been standard in all placement techniques, but there has been debate as to whether there should be an ordering to the cuts (i.e., horizontally bisect a bin then vertically bisect its children as in quadrature [7]) or both cuts should be done simultaneously as in quadrisection [37]. Quadrisection has been shown to allow for the optimization of techniques other than mincut (such as minimal spanning tree length [21]), but terminal propagation is more complex when splitting a bin into four child bins instead of two. Also, bisection can simulate quadrisection with added flexibility in cutline selection and shifting (see Section 15.1.3) [31]. There are currently no known implementations that use greater than four-way partitioning and the majority of partitioning-based placement techniques involve mincut bipartitioning. 15.1.3 CUTLINE SELECTION AND SHIFTING Breuer studied two types of cutline direction selection techniques and found that alternating cutline directions from layer to layer produced better half-perimeter wirelength (HPWL) than using only horizontal cuts [7]. The authors of Ref. [40] studied this phenomenon further by testing 64 cutline direction sequences. Their experiments did not find that the two cut-sequences that alternate at each layer were the best, but did find that long sequences of cuts in the same direction during placement

Handbook of algorithms for physical design automation part 31

Nội dung