Handbook of algorithms for physical design automation part 33.pdf (Sổ tay thuật toán)

Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C015 302 Finals Page 302 29-9-2008 #15 Handbook of Algorithms for Physical Design Automation among the obvious facilitators)—instead, certain existing steps in floorplacement are skipped. This improvement is based on two observations: (1) blocks that are much smaller than their bin can be treated like standard cells and (2) the number of blocks that are large relative to the bin size is necessarily limited. For example, there cannot be more than nine blocks with area in excess of 10 percent of a bin’s area [30]. In selective floorplanning, each block is marked as small or large based on a size threshold. Standard cells and small blocks can be clustered, except that clusters containing hard blocks have additional restrictions on their aspect ratios. After successful annealing, only the large blocks are placed, fixed, and considered obstacles. Normal top-down partitioning resumes, and each remaining block will qualify as large at some later point. This way, specific locations are determined when the right level of detail is considered (Figure 15.10). If floorplanning fails during hierarchical placement, the failed bin is merged with its sibling and the merged bin is floorplanned (Figure 15.9). The blocks marked as large in the merged bin include those that exceed the size threshold and also those marked as large in the failed bin (because the failure suggests that those blocks were difficult to pack). After the largest macros are placed, the flow resumes [30]. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 Variables: queue of placement partitions Initialize queue with top-level partition While (queue not empty) Dequeue a partition If (partition is not marked as merged ) Perform look-ahead floorplanning on partition If look-ahead floorplanning fails Undo one partition decision Merge partition with sibling Mark new partition as merged and enqueue Else if (partition has large macros or is marked as merged ) Mark large macros for placement after floorplanning Cluster remaining macros into soft macros Cluster std-cells into soft macros Use fixed-outline floorplanner to pack all macros (soft+hard) If fixed-outline floorplanning succeeds Fix large macros and remove sites beneath Else Undo one partition decision Merge partition with sibling Mark new partition as merged and enqueue Else if (partition is small enough and mostly comprised of macros) Process floorplanning on all macros Else if (partition small enough) Process end case std cell placement Else Bipartition netlist of the partition Divide the partition by placing a cut-line Enqueue each child partition FIGURE 15.9 Modified mincut floorplacement flow. Boldfaced lines are new. (From Ng, A. N., Markov, I. L., Aggarwal, R., and Ramachandran, V., ISPD, 2006.) Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C015 Finals Page 303 29-9-2008 #16 303 Partitioning-Based Methods Size Size Macros Std cells Bin size/time Time Time FIGURE 15.10 The plot on the left illustrates traditional floorplacement. Whenever a floorplanning threshold is reached, all macros in the bin are designated for floorplanning. Then, the floorplacement flow continues down until detailed placement, where the standard cells will be placed. The plot on the right illustrates the SCAMPI flow. Macros are selectively placed at the appropriate levels of hierarchy. (From Ng, A. N., Markov, I. L., Aggarwal, R., and Ramachandran, V., ISPD, 2006.) The proposed technique limits the size of floorplanning instances given to the annealer by a constant and does not require much extra work. However, it introduces an unexpected complexity. The floorplacement framework does not handle fixed obstacles in the core region, and none of the public benchmarks have them. When Capo fixes blocks in a particular bin, it fixes all of them and never needs to floorplan around obstacles. Another complication due to newly introduced fixed obstacles is in cutline selection. Reliable obstacle-evasion and intelligent cutline selection may be required by practical designs, even without selective floorplanning (e.g., to handle prediffused memories, built-in multipliers in FPGAs, etc.). Therefore, they are viewed as independent but synergistic techniques [30]. When satisfying area constraints is difficult, it is very important to increase the priority of area optimization so as to achieve legality [14]. Because of this, the authors of Ref. [30] select the B∗ -tree [13] floorplan representation for its amenability to packed configurations and add obstacle evasion into B∗ -tree evaluation. 15.4.2.2 Ad Hoc Look-Ahead Floorplanning The sum of block areas may significantly underestimate the area required for large blocks. Better estimates are required to improve the robustness of floorplacement and look-ahead area-driven floorplanning appears as a viable approach [30]. SCAMPI performs look-ahead floorplanning to validate solutions produced by the hypergraph partitioner, and check that a resulting partition is packable, within a certain tolerance for failure. Lookahead floorplanning must be fast, so that the amortized runtime overhead of the look-ahead calls is less than the total time saved from discovering bad partitioning solutions. Therefore, look-ahead floorplanning is performed with blocks whose area is larger than 10 percent of the total module area in the bin, and soft blocks containing remaining modules. For speed, the floorplanner is configured to perform area-only packing, and the placer is configured to only perform look-ahead floorplanning on bins with large blocks. Dealing with only the largest blocks is sufficient because floorplanning failures are most often caused by such blocks [30]. 15.4.3 OPTIMIZING STEINER WIRELENGTH Weighted terminal propagation as described in Ref. [15], and summarized in Section 15.2.4, is sufficiently general to account for objectives other than HPWL such as Steiner wirelength (StWL) [32]. StWL is known to correlate with final routed wirelength (rWL) more accurately than HPWL and the authors of Ref. [32] hypothesize that if StWL could be directly optimized during global placement, one may be able to enhance routability and reduce routed wirelength. The points required to calculate w1 for a given net are the terminals on the net plus the center of partition 1. Similarly, the points required to calculate w2 are the terminals plus the center of Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C015 304 Finals Page 304 29-9-2008 #17 Handbook of Algorithms for Physical Design Automation FIGURE 15.11 Calculating the three costs for weighted terminal propagation with StWL: w1 (left), w2 (middle), and w12 (right). The net has five fixed terminals: four above and one below the proposed cutline. For the traditional HPWL objective, this net would be considered inessential. Note that the structure of the three Steiner trees may be entirely different, which is why w1 , w2 , and w12 are evaluated independently. (From Roy, J. A. and Markov, I. L., IEEE Trans. CAD, 26, 632, 2007.) partition 2. Lastly, the points to calculate w12 are the terminals on the net plus the centers of both partitions. See Figure 15.11 for an example of calculating these three costs. Clearly, the HPWL of the set of points necessary to calculate w12 is at least as large as that of w1 and w2 because it contains an additional point. By the same logic, StWL also satisfies this relationship because RSMT length can only increase with additional points. Because StWL is a valid cost function for these weighted partitioning problems, this is a framework whereby it can be minimized [32]. The simplicity of this framework for minimizing StWL is deceiving. In particular, the propagation of terminal locations to the current placement bin and the removal of inessential nets [11]—standard techniques for HPWL minimization—cannot be used when minimizing StWL. Moving terminal locations drastically changes Steiner-tree construction and can make StWL estimates extremely inaccurate. Nets that are considered inessential in HPWL minimization (where the x- or y-span of terminals, if the cut is vertical or horizontal respectively, contains the x- or y-span of the centers of child bins) are not necessarily inessential when considering StWL because there are many Steiner trees of different lengths that have the same bounding box. Figure 15.11 illustrates a net that is inessential for HPWL minimization but essential for StWL minimization. Not only computing Steiner trees but also traversing all relevant nets to collect all relevant point locations can be very time consuming. Therefore, the main challenge in supporting StWL minimization is to develop efficient data structures and limit additional runtime during placement [32]. 15.4.3.1 Pointsets with Multiplicities Building Steiner trees for each net during partitioning is a computationally expensive task. To keep runtime reasonable when building Steiner trees for partitioning, the authors of Ref. [32] introduce a simple yet highly effective data structure—pointsets with multiplicities. For each net in the hypergraph, two lists are maintained. The first list contains all the unique pin locations on the net that are fixed. A fixed pin can come from sources such as terminals or fixed objects in the core area. The second list contains all the unique pin locations on the net that are movable, that is, all other pins that are not on the fixed list. All points on each list are unique so that redundant points are not given to Steiner evaluators. To do so efficiently, the lists are kept in sorted order. For both lists, in addition to the location of the pin, the number of pins that correspond to a given point is also saved [32]. Maintaining the number of actual pins that correspond to a point in a pointset (the multiplicity of that point) is necessary for efficient update of pin locations during placement. If a pin changes position during placement, the pointsets for the net connected to the pin must be updated. First, the original position of the pin must be removed from the movable point set. As multiple pins can have the same position, especially early in placement, the entire net would need to be traversed to see if any other pins share the same position as the pin that is moving. Multiplicities allow to know Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C015 Partitioning-Based Methods Finals Page 305 29-9-2008 #18 305 this information in constant time. To remove the pin, one performs a binary search on the pointset and decreases the multiplicity of the pin’s position by 1. If this results in the position having a multiplicity of 0, the position can be removed entirely. Insertion of the pin’s new position is similar: first, a binary search is performed on the pointset. If the pin’s position is already present in the pointset, the multiplicity is increased by 1. Otherwise, the position is added in sorted order with a multiplicity of 1. Empirically, building and maintaining the pointset data structures takes less than 1 percent of the runtime of global placement [32]. 15.4.3.2 Performance The authors of Ref. [32] compared three Steiner evaluators in terms of runtime impact and solution quality. They chose the FastSteiner [24] evaluator for global placement based on its reasonable runtime and consistent performance on large nets. Empirical results show the use of FastSteiner leads to a reduction of StWL by 3 percent on average on the IBMv2 benchmarks [42] (with a reduction of routed wirelength up to 7 percent) while using less than 30 percent additional runtime [32]. 15.4.4 INCREMENTAL PLACEMENT To develop a strong incremental placement tool, ECO-system, the authors of Ref. [33] build upon an existing global placement framework and must choose between analytical and top-down. The main considerations include robustness, the handling of movable macros, and fixed obstacles, as well as consistent routability of placements and the handling of density constraints. On the basis of recent empirical evidence [30,32,35], the top-down framework appears a somewhat better choice. However, analytical algorithms can also be integrated into ECO-system when particularly extensive changes are required. ECO-system favorably compares to recent detail placers in runtime and solution quality and fares well in high-level and physical synthesis. 15.4.4.1 General Framework ECO-system can be likened to reverse engineering the mincut placement process. The goal is to reconstruct the internal state of a mincut placer that could have produced the given initial placement. Given this state, one can choose to accept or reject its previous decisions based on their own criteria and build a new placement for the design. If many of the decisions of the placer were good, one can achieve a considerable runtime savings as compared to placement from scratch. If many of the decisions are determined to be bad, one can do no worse in terms of solution quality than placement from scratch. The overall algorithm in the framework of mincut placement is shown in Figure 15.12. An overview of the application of ECO-system to an illegal placement is depicted in Figure 15.13. To rebuild the state of a mincut placer, one must reconstruct a series of cutlines and partitioning solutions efficiently. One must also determine criteria for the acceptability of the derived partitioning and cutline. To extract a cutline and partitioning solution from a given placement bin, all possible cutlines of the bin as well as the partitions they induce must be examined. Starting at one edge of the placement bin (left edge for a vertical cut and bottom edge for a horizontal cut) and moving toward the opposite edge, for each potential cutline encountered, one maintains the cell area on either side of the cutline, the partition induced by the cutline and its netcut. Once a cutline and partitioning have been chosen, they must be evaluated. To evaluate the partitioning, the authors of Ref. [33] use it as input to a Fiduccia–Mattheyses partitioner and see how much it can be improved by a single pass (if the bin is large enough, a multilevel Fiduccia–Mattheyses partitioner can be used). The intuition is that if the constructed partitioning is not worthy of reuse, a single Fiduccia–Mattheyses pass could improve its cut nontrivially. If the Fiduccia–Mattheyses pass improves the cut beyond a certain threshold, the solution is discarded and the entire bin is bisected from scratch. If a partition is accepted by this criterion, one performs a legality test: if the partitioning overfills a child bin, the cutline is discarded and the bin is bisected from scratch. Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C015 306 Finals Page 306 29-9-2008 #19 Handbook of Algorithms for Physical Design Automation Variables: queue of placement bins Initialize queue with top-level placement bin 1 While (queue not empty) 2 Dequeue a bin 3 If (bin not marked to place from scratch) 4 If(bin overfull) 5 Mark bin to place from scratch, break 6 Quickly choose the cut-line which has the smallest net-cut considering cell area balance constraints 7 If(cut-line causes overfull child bin) 8 Mark bin to place from scratch, break 9 Induce partitioning of bin’s cells from cut-line 10 Improve net-cut of partitioning with single pass of Fiduccia-Mattheyses 11 If(% of improvement > threshold) 12 Mark bin to place from scratch, break 13 Create child bins using cut-line and partitioning 14 Enqueue each child bin 15 If(bin marked to place from scratch) 16 If (bin small enough) 17 Process end case 18 Else 19 Bipartition the bin into child bins 20 Mark child bins to place from scratch 21 Enqueue each child bin FIGURE 15.12 Incremental mincut placement. Boldfaced lines 3–15 and 20 are different from traditional mincut placement. (From Roy, J. A. and Markov, I. L., IEEE Trans. CAD, 20, 2173, 2007.) 1 2 3 4 5 6 Original placement Replaced from scratch Overlap Untouched by legalizer FIGURE 15.13 Legalization during mincut placement. Placement bins are subdivided until (i) a bin contains no overlap and is ignored for the remainder of the legalization process or, (ii) the placement contained in the bin is considered too poor to be kept (too many overlaps or does not meet the solution quality requirements) and is replaced from scratch using mincut or analytical techniques. (From Roy, I. A. and Markov, I. L., IEEE Trans. CAD, 20, 2173, 2007.) Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C015 Partitioning-Based Methods Finals Page 307 29-9-2008 #20 307 Empirically, the runtime of the cutline selection procedure (which includes a single pass of a Fiduccia–Mattheyses partitioner) is much smaller than partitioning from scratch. On large benchmarks, the cutline selection process requires 5 percent of ECO-system runtime time whereas mincut partitioners generally require 50 percent or more of ECO-system runtime. ECO-system as a whole requires approximately 15 percent of original placement runtime. 15.4.4.2 Handling Macros and Obstacles With the addition of macros, the flow of top-down placement usually becomes more complex. The authors of Ref. [33] adopt the style of floorplacement from Refs. [30,31] (see Sections 15.3.1 and 15.4.2). For legalization with macros, a new criterion for floorplanning is added: if a placement bin has nonoverlapping positions for macros (i.e., no macros in the placement bin overlap each other) the macros are placed in exactly their initial positions; if some of the macros overlap, other floorplanning criteria are used to decide. If any of the macros are moved, the placement of all cells and macros in the bin must be discarded and placement and proceeds as described in Ref. [31]. 15.5 STATE-OF-THE-ART MINCUT PLACERS In this section, we present partitioning-based placement techniques that are used in cutting-edge placers. For each placer, we describe its overall flow, how this differs from the generic mincut flow, and how it handles challenges in placement such as fixed obstacles and mixed-size instances. In particular, we describe the techniques used by the placers Dragon [38,39,42], FengShui [4,27], NTUPlace2 [22], and Capo [30–35]. 15.5.1 DRAGON The most recent version of Dragon, Dragon2006 [39], combines mincut bisection with simulated annealing for placement. In its most basic flow, Dragon2006 utilizes recursive bisection with the hMETIS partitioner [26]. Each bin is partitioned multiple times with a feedback mechanism to allow for more accurate terminal propagation (see Section 15.2.1 for more details on placement feedback). Partitioning is followed by simulated annealing on the placement bins where whole bins are swapped with one another to improve HPWL [38,39]. After a number of layers of interleaved partitioning and simulated annealing, each bin contains only a few cells and the partitioning phase terminates. Next, bins are aligned to row structures and cell-based simulated annealing is performed wherein cells are swapped between bins to improve HPWL [38,39]. Lastly, cell overlaps are removed and local detail placement improvements are made. 15.5.2 FENGSHUI FengShui [4,27] is a recursive bisection mincut placer that uses the hMETIS partitioner [26]. FengShui implements the fractional cut technique (see Section 15.2.2) and packs its placements to either side of the placement region, which has a serious affect on the routability of its placements [32]. FengShui also supports mixed-size placement (see Section 15.3.3) 15.5.3 NTUPLACE2 NTUPlace2 [22] is a hybrid placer that uses both mincut partitioning and analytical techniques for standard-cell and mixed-size designs. NTUPlace2 uses repartitioning (see Section 15.2.1), cutline shifting (see Section 15.1.3), and weighted netcut (see Section 15.2.4) [22]. NTUPlace2 uses analytical techniques to aid partitioning, which are different from those in ACG (see Section 15.2.3). Before partitioning calls to the hMETIS partitioner [26], objects in a placement Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C015 308 Finals Page 308 29-9-2008 #21 Handbook of Algorithms for Physical Design Automation bin are placed by an analytical technique to reduce quadratic wirelength [22]. These objects that are placed far from the proposed cutline are considered fixed in their current locations for the partitioning process. This technique helps to make terminal propagation more exact, and, with the weighted netcut technique, has resulted in very good solution quality [22]. To handle mixed-size placement, macro locations are legalized at each layer. Macros become fixed at different layers of placement according to their size relative to placement bin size. Thus, larger macros are placed earlier in placement [22]. Macros are legalized using a linear programming technique that attempts to minimize the movement of macros during legalization [22]. 15.5.4 CAPO Capo [30–35] is a mincut floorplacer. As such, it implements the floorplacement flow as described in Section 15.3.1 and further improved by SCAMPI (Section 15.4.2) rather than the traditional mincut flow and implicitly handles mixed-size placement and fixed obstacles in the placement area. Capo can use either MLPart [10] or hMETIS [26] for hypergraph partitioning. Whitespace allocation in Capo is done per placement bin: either uniform (see Section 15.1.4), minimum local or safe whitespace allocation (see Section 15.4.1) is chosen based on the bin’s whitespace and userconfigurable options. To improve the quality of results, Capo also implements repartitioning (see Section 15.2.1), placement feedback (see Section 15.2.1), weighted net-cut (see Section 15.2.4), and several whitespace allocation techniques. Capo has also been used to optimize Steiner wirelength in placement (see Section 15.4.3) and can be used for incremental placement (see Section 15.4.4). REFERENCES 1. S. N. Adya and I. L. Markov, Fixed-outline floorplanning: Enabling hierarchical design, IEEE Transactions on VLSI, 11(6) 1120–1135, December 2003 (ICCD 2001, pp. 328–334). 2. S. N. Adya and I. L. Markov, Combinatorial techniques for mixed-size placement, ACM Transactions on Design Automation of Electronic Systems, 10(5), 58–90, January 2005 (ISPD 2002, pp. 12–17). 3. S. N. Adya, I. L. Markov, and P. G. Villarrubia, On whitespace and stability in physical synthesis, Integration: The VLSI Journal, 25(4), 340–362, 2006 (ICCAD 2003, pp. 311–318). 4. A. Agnihotri et al., Fractional cut: Improved recursive bisection placement, ICCAD, San Jose, CA, pp. 307–310, 2003. 5. C. J. Alpert, G. -J. Nam, and P. G. Villarrubia, Effective free space management for cut-based placement via analytical constraint generation, IEEE Transactions on CAD, 22(10), 1343–1353, 2003 (ICCAD 2002, pp. 746–751). 6. U. Brenner and A. Rohe, An effective congestion driven placement framework, IEEE Transactions on CAD, 22(4), pp. 387–394, 2003 (ISPD 2002, pp. 6–11). 7. M. Breuer, Min-cut placement, Journal of Design Automation and Fault Tolerant Computing, 1(4), 343–362, October 1977 (DAC 1977, pp. 284–290). 8. A. E. Caldwell, A. B. Kahng, S. Mantik, I. L. Markov, and A. Zelikovsky, On wirelength estimations for row-based placement, IEEE Transactions on CAD, 18(9), 1265–1278, 1999. 9. A. E. Caldwell, A. B. Kahng, and I. L. Markov, Can recursive bisection alone produce routable placements? DAC, pp. 477–482, Los Angeles, June 2000. 10. A. E. Caldwell, A. B. Kahng, and I. L. Markov, Design and implementation of move-based heuristics for VLSI hypergraph partitioning, ACM Journal of Experimental Algorithms, 5, 2000. 11. A. E. Caldwell, A. B. Kahng, and I. L. Markov, Optimal partitioners and end-case placers for standard-cell layout, IEEE Transactions on CAD, 19(11), 1304–314, 2000 (ISPD 1999, pp. 90–96). 12. A. E. Caldwell, A. B. Kahng, and I. L. Markov, Hierarchical whitespace allocation in top-down placement, IEEE Transactions on CAD, 22(11), 716–724, November 2003. 13. Y. C. Chang et al., B∗ -trees: A new representation for non-slicing floorplans, DAC, Los Angeles, CA, pp. 458–463, 2000. Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C015 Partitioning-Based Methods Finals Page 309 29-9-2008 #22 309 14. T. C. Chen and Y. W. Chang, Modern floorplanning based on fast simulated annealing, ISPD, San Francisco, CA, pp. 104–112, 2005. 15. T. C. Chen, Y. W. Chang, and S. C. Lin, IMF: Interconnect-driven multilevel floorplanning for large-scale building-module designs, ICCAD, San Jose, CA, pp. 159–164, November 2005. 16. J. Cong, G. Nataneli, M. Romesis, and J. Shinnerl, An area-optimality study of floorplanning, ISPD, Phoenix, AZ, pp. 78–83, 2004. 17. J. Cong, M. Romesis, and J. Shinnerl, Fast floorplanning by look-ahead enabled recursive bipartitioning, IEE Transactions on CAD, 25(9), 1719–1732, 2006 (ASPDAC, 2005 pp. 1119–1122). 18. J. Cong, M. Romesis, and J. Shinnerl, Robust mixed-size placement under tight white-space constraints, ICCAD, San Jose, CA, pp. 165–172, 2005. 19. A. E. Dunlop and B. W. Kernighan, A procedure for placement of standard cell VLSI circuits, IEEE Transactions on CAD, 4(1), 92–98, 1985. 20. C. M. Fiduccia and R. M. Mattheyses, A linear time heuristic for improving network partitions, DAC, Washington, D.C., pp. 175–181, 1982. 21. D. J. -H. Huang, and A. B. Kahng, Partitioning-based standard-cell global placement with an exact objective, ISPD, Napa Valley, CA, pp. 18–25, 1997. 22. Z. -W. Jiang et al., NTUPlace2: A hybrid placer using partitioning and analytical techniques, ISPD, San Jose, CA, pp. 215–217, 2006. 23. A. B. Kahng, S. Mantik, and I. L. Markov, Min-max placement for large-scale timing optimization, ISPD, San Diego, CA, pp. 143–148, April 2002. 24. A. B. Kahng, I. I. Mandoiu, and A. Zelikovsky, Highly scalable algorithms for rectilinear and octilinear Steiner trees, ASPDAC, Kitakyushu, Japan, pp. 827–833, 2003. 25. A. B. Kahng and S. Reda, Placement feedback: A concept and method for better min-cut placement, DAC, San Diego, CA, pp. 357–362, 2004. 26. G. Karypis, R. Aggarwal, V. Kumar, and S. Shekhar, Multilevel hypergraph partitioning: applications in VLSI domain, DAC, Anaheim, CA, pp. 526–629, 1997. 27. A. Khatkhate, C. Li, A. R. Agnihotri, M. C. Yildiz, S. Ono, C. -K. Koh, and P. H. Madden, Recursive bisection based mixed block placement, ISPD, Phoenix, AZ, pp. 84–89, 2004. 28. C. Li, M. Xie, C. K. Koh, J. Cong, and P. H. Madden, Routability-driven placement and white space allocation, ICCAD, San Jose, CA, pp. 394–401, 2004. 29. G. -J. Nam, C. J. Alpert, P. Villarrubia, B. Winter, and M. Yildiz, The ISPD2005 placement contest and benchmark suite, ISPD, San Francisco, CA, pp. 216–220, 2005. 30. A. N. Ng, I. L. Markov, R. Aggarwal, and V. Ramachandran, Solving hard instances of floorplacement, ISPD, San Jose, CA, pp. 170–177, April 2006. 31. J. A. Roy, S. N. Adya, D. A. Papa, and I. L. Markov, Min-cut floorplacement, IEEE Transactions on CAD, 25(7), 1313–1326, 2006 (ICCAD 2004, pp. 550–557). 32. J. A. Roy and I. L. Markov, Seeing the forest and the trees: Steiner wirelength optimization in placement, IEEE Transactions on CAD 26(4), 632–644, 2007 (ISPD 2006, pp. 78–85). 33. J. A. Roy and I. L. Markov, ECO-system: Embracing the change in placement, IEEE Transactions on CAD, 26(12), 2173–2185, 2000 (ASP-DAC 2007, pp. 147–152). 34. J. A. Roy, D. A. Papa, S. N. Adya, H. H. Chan, J. F. Lu, A. N. Ng, and I. L. Markov, Capo: Robust and scalable open-source min-cut floorplacer, ISPD, San Francisco, CA, pp. 224–227, April 2005. 35. J. A. Roy, D. A. Papa, A. N. Ng, and I. L Markov, Satisfying whitespace requirements in top-down placement, ISPD, San Jose, CA, pp. 206–208, April 2006. 36. N. Selvakkumaran and G. Karypis, Theto—A fast, scalable and high-quality partitioning driven placement tool, Technical report, University of Minnesota, 2004. 37. P. R. Suaris and G. Kedem, An algorithm for quadrisection and its application to standard cell placement, IEEE Transactions on Circuits and Systems, 35(3), 294–303, 1988 (ICCAD 1987, pp. 474–477). 38. T. Taghavi, X. Yang, B. -K. Choi, M. Wang, and M. Sarrafzadeh, Dragon2005: Large-scale mixed-size placement tool, ISPD, San Francisco, CA, pp. 245–247, April 2005. 39. T. Taghavi, X. Yang, B. -K. Choi, M. Wang, and M. Sarrafzadeh, Dragon2006: Blockage-aware congestioncontrolling mixed-size placer, ISPD, San Jose, CA, pp. 209–211, April 2006. 40. K. Takahashi, K. Nakajima, M. Terai, and K. Sato, Min-cut placement with global objective functions for large scale sea-of-gates arrays, IEEE Transactions on CAD, 14(4), 434–446, 1995. Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C015 310 Finals Page 310 29-9-2008 #23 Handbook of Algorithms for Physical Design Automation 41. E. Wein and J. Benkoski, Hard macros will revolutionize SoC design, EE Times, August 20, 2004. http://www.eetimes.com/news/design/showArticle.jhtml?articleID=26807055. 42. X. Yang, B. K. Choi, and M. Sarrafzadeh, Routability driven white space allocation for fixed-die standardcell placement, IEEE Transactions on CAD, 22(4), 410–419, April 2003 (ISPD 2002, pp. 42–49). 43. M. C. Yildiz and P. H. Madden, Improved cut sequences for partitioning based placement, DAC, Las Vegas, NV, pp. 776–779, 2001. Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C016 Finals Page 311 23-9-2008 #2 Using 16 Placement Simulated Annealing William Swartz CONTENTS 16.1 Introduction . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 16.2 Annealing Schedules . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 16.3 Simulated Annealing and Placement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 16.4 Simulated Annealing Cooling Schedules .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 16.5 Cost Functions . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 16.6 Move Strategies . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 16.7 Multilevel Methods . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 16.8 Partition-Based Methods . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 16.9 Genetic Programming . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 16.10 Parallel Algorithms . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 16.11 Machine Learning . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 16.12 Future.. . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 311 312 313 313 318 320 321 322 322 323 323 324 324 16.1 INTRODUCTION Simulated annealing is a technique for finding an optimal or near-optimal solution for combinatorial optimization problems, or problems that have discrete variables. This technique was proposed by Kirkpatrick, Gelatt, and Vecchi in 1983 [1] and has been successfully applied to circuit partitioning, placement, and routing in the physical design of integrated circuits. The goal of a combinatorial optimization algorithm is to find the state of lowest cost (or energy) from a discrete space of admissible configurations S. For each problem, a cost function must be defined that maps each state to a real number denoting its cost. For many problems, the number of possible states grows exponentially with the size of the input. Optimizing becomes the process of searching for the state of lowest cost in a hyper-dimensional space. With a large number of possible states to visit, the brute force method of visiting all configurations becomes impractical. Clearly, we need a search strategy to uncover the lowest cost solution in the jungle of states. For many problems, the states of the configuration space are related. A problem exhibits optimal substructure if an optimal solution to the problem contains within it optimal solutions to subproblems. These cases may be solved by either a greedy or a dynamic programming algorithm. In a greedychoice problem, a globally optimal solution can be found by making a locally optimal (greedy) decision. The best choice is made at each moment; at each step, we solve the ramifications of the previous choice. The choice made by a greedy algorithm cannot depend on future decisions or solutions to subproblems. In dynamic programming, a choice is made at each step that may depend on the solutions to the subproblems. 311

Handbook of algorithms for physical design automation part 33

Nội dung