Handbook of Algorithms for Physical Design Automation, Part 48
Alpert/Handbook of Algorithms for Physical Design Automation, AU7242_C022, Finals Page 452, 24-9-2008

...interconnect regions as a result of clustering. The third approach is to balance the perimeter-degree of the partitions during the partitioning phase of multilevel placement. One could use a multiconstraint partitioner to balance area and perimeter-degree simultaneously. Alternatively, one could satisfy both constraints sequentially, first balancing the perimeter-degree and then balancing the areas of the partitions. The authors present extensive empirical data to validate their claims about the fidelity of perimeter-degree as a simple and effective metric for homogenizing interconnection complexity.

22.3 GLOBAL-PLACEMENT CONGESTION IMPROVEMENT

There have been several studies on incorporating a congestion metric during the global-placement stage of the physical implementation flow. Local wiring has a large impact on the final congestion characteristics of a design; hence, historically, it has been difficult to address congestion robustly during the global-placement stage. There have, however, been major advances in addressing congestion during global placement over the past decade. In this section, we detail some of these approaches.

22.3.1 INCORPORATING CONGESTION ESTIMATION DURING GLOBAL PLACEMENT

Placement algorithms need a good and fast estimate of wiring requirements if they are to target congestion as a metric during the placement process. Other chapters in this book detail several of these wiring-density estimation approaches [10,19,27,39]. The works in Refs. [6,29] propose to incorporate congestion estimation techniques within partitioning-driven quadratic placers in interesting ways. The authors of Ref.
[29] base their work on a quadratic placement engine that solves an unconstrained minimization problem whose objective function is the squared wirelength of the netlist. Because the quadratic solution in general has many overlapping cells, the overlap is resolved by partitioning the solution and iterating over the partitioned solutions [23]. The quadratic wirelength-minimum solution serves to guide a mincut partitioning routine. After each round of partitioning, the cells in a region are constrained to that region by introducing a center-of-gravity constraint for each region in subsequent quadratic minimization formulations. Figure 22.5a illustrates the proposed congestion-driven placement methodology. Before each successive placement, internal route estimation and a region-based global route are performed on each region to estimate routing supply/demand ratios. These ratios are used to influence the placer into growing or shrinking the partitioned regions based on resource demand and supply. The region router estimates the routing demand of wires spanning multiple regions; it is implemented using an A∗ algorithm [5] on the region-based graph. Once routing demand is computed, this information is used to grow or shrink the regions of the current placement: regions with higher routing demand are allocated more white space by growing them. For q regions in a placement stage, a growth matrix G is defined as an (n − q) × (n − q) diagonal matrix with entry gii equal to the region weight of the independent cell xi. The growth matrix is computed as follows. After a round of quadratic minimization and partitioning, a set of new regions is generated, and congestion analysis based on the router is performed on this new set of regions.
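The growth computation can be illustrated with a small sketch. This is not the actual formulation of Ref. [29], which embeds the growth terms in the quadratic program; it only shows how per-region supply/demand ratios might be turned into grow/shrink weights. All names and the clamping bound are illustrative assumptions.

```python
def growth_factors(demand, supply, max_growth=2.0):
    """Return a growth weight per region: regions whose estimated routing
    demand exceeds supply are grown (factor > 1), regions with excess
    supply are shrunk (factor < 1). Clamping avoids extreme resizing."""
    factors = {}
    for region in demand:
        ratio = demand[region] / max(supply[region], 1e-9)
        factors[region] = min(max(ratio, 1.0 / max_growth), max_growth)
    return factors

demand = {"R1": 120.0, "R2": 40.0}
supply = {"R1": 80.0, "R2": 80.0}
print(growth_factors(demand, supply))  # R1 grows (1.5), R2 shrinks (0.5)
```

In the real flow these weights would populate the diagonal growth matrix G used in the center-of-gravity constraints.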
The routing cost is divided into two parts: (1) external cost, computed using the region router for nets spanning multiple regions, and (2) internal cost, computed using a line-probe router for intraregion routes. The difference between the routing supply and demand of a region determines the extra routing tracks required and, hence, the growth required by the region. The reader is referred to the original publication [29] for details on how the growth matrix is embedded in the quadratic formulation of the center-of-gravity constraints. The effect of the growth-matrix terms in the quadratic formulation is to grow and shrink the regions according to the growth matrix. Reduction in congestion occurs due to the ability to transform horizontal routes into vertical ones and vice versa. This is shown in Figure 22.5b, where a vertically congested region is expanded vertically.

FIGURE 22.5 (a) Congestion-driven placement methodology and (b) example of region growth relieving congestion. (From Parakh, P. N., Brown, R. B., and Sakallah, K. A., DAC, 1998.)

The work in Ref. [6] also targets a flow to avoid routing congestion during global placement. The main contribution is a fast but reliable way to detect routing criticalities, whose output is then used effectively in a partitioning-driven placer. The techniques are tested on large real-world industrial designs with very good results. The framework for their studies is based on a four-way partitioning (quadrisection) quadratic placer, BonnPlace [37]. The placement begins by solving for a quadratic wirelength-minimum solution as described in the previous paragraph. This overlapping placement is then used by a quadrisection routine to generate four partitions of the netlist.
The objective of the quadrisection is to divide the area of the die into approximately equal regions and assign cells to each region such that density requirements for each region are not violated and the displacement of the cells from their initial quadratic wirelength-minimum locations is minimized. Center-of-gravity constraints are then imposed on the cells assigned to their particular regions for subsequent quadratic wirelength minimization. This process is iterated until the number of cells in a region is small enough for detailed-placement techniques to handle. Like the work in Ref. [29], the authors of Ref. [6] also try to relieve congestion during global-placement iterations by allocating more white space to perceived congested areas. However, the mechanism is very different. First, we describe the measurement of congestion of a placement as it appears during the placement algorithm, as proposed in Ref. [6]. Given a chip that is partitioned into k × k regions (forming a (k + 1) × (k + 1) placement grid), a pin density for each region (the local congestion cost) and a congestion estimate for each edge in the dual graph of the placement grid (the interregion congestion cost) are computed. For a fast estimate of a global route, a probabilistic global router is used to route the interregion nets. During a placement level (iteration), the current status of the placement grid is used as the global-routing partition. Figure 22.6b shows the routing grid as a dual of the placement grid. Multiterminal nets are split into sets of two-terminal nets using a Steiner-tree decomposition of the net. For each two-point connection in the Steiner tree of a net, probabilistic routing similar to the algorithms proposed in Refs. [27,39] is used. These probabilities are then added over all nets for each routing-grid box to give the expected usage p(e).
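The per-edge expected usage can be sketched by assuming that every monotone (staircase) route of a two-pin net is equally likely, in the spirit of the probabilistic estimators of Refs. [27,39]; the exact weighting used in Ref. [6] may differ. The function below computes, for a net spanning dx × dy grid cells, the probability that each horizontal grid edge is used.

```python
from math import comb

def edge_usage(dx, dy):
    """Expected usage of horizontal routing-grid edges for a two-pin net
    whose pins span dx columns and dy rows, assuming all monotone routes
    are equally likely. Returns {(x, y): probability} for the horizontal
    edge from column x to x+1 in row y."""
    total = comb(dx + dy, dx)          # number of monotone routes
    usage = {}
    for x in range(dx):
        for y in range(dy + 1):
            # routes reaching (x, y), times routes from (x+1, y) to the sink
            through = comb(x + y, x) * comb((dx - x - 1) + (dy - y), dy - y)
            usage[(x, y)] = through / total
    return usage
```

Summing these probabilities over all nets for a routing-grid box yields the expected usage p(e) described above; by construction, the usage in each column of edges sums to 1 for a single net.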
Figure 22.6c shows how probabilities are calculated for two connections and then added for each grid box. The fraction cong(e) = p(e)/cap(e) (where cap(e) is the routing capacity) is defined as the estimated congestion for an edge e ∈ E(G) in the dual of the placement grid shown in Figure 22.6b. To account for routability problems inside a routing-grid box, the pin density pin_dens(R) inside a region R is used as a second metric.

FIGURE 22.6 Calculation of routing congestion using probabilistic routing analysis. Given a placement of a chip partitioned into a k × k grid shown in (a), the nets are decomposed into two-pin connections using Steiner-tree decomposition as shown in (b). Each two-pin connection is then routed probabilistically as shown in (c). (From Brenner, U. and Rohe, A., ISPD, 2002.)

To use the computed congestion data, the authors rely on the popular technique of inflating groups of cells that are deemed to be in a congested region. The size of a congested cell is increased from its normal geometric area s(c) = x(c) · y(c) to an inflated value s′(c) = [1 + b(c)] · s(c), with b(c) ≥ 0. During the partitioning stage, the new cell sizes are used to satisfy the density requirements: for a set C(R) being partitioned into subsets C(R1), . . . , C(R4), the condition Σ_{c∈C(Ri)} s′(c) ≤ s(Ri) must be satisfied for i = 1, . . . , 4. The numbers b(c) depend upon an input parameter τ ≥ 0, which is an upper bound on the total increment of b(c) in each placement level. The initial value of b(c) for each cell is proportional to the pin density of the cell.
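The inflation rule can be sketched as follows. The s′(c) = [1 + b(c)] · s(c) form, the τ cap, and the min{1, 2[cong − 1]} increment follow the description in the text; the data layout is an illustrative assumption, and the pin-density-proportional initialization of b(c) is omitted for brevity.

```python
def inflate_cells(cells, cong, tau=0.5):
    """Inflate cells in congested regions: s'(c) = (1 + b(c)) * s(c).
    `cells` maps cell -> (area, region); `cong` maps region -> estimated
    congestion (demand/capacity). The per-level increment of b(c) is
    min{1, 2*(cong - 1)}, capped by tau."""
    inflated = {}
    for c, (area, region) in cells.items():
        b = 0.0
        if cong.get(region, 0.0) > 1.0:
            b = min(tau, min(1.0, 2.0 * (cong[region] - 1.0)))
        inflated[c] = (1.0 + b) * area
    return inflated

sizes = inflate_cells({"a": (10.0, "R1"), "b": (10.0, "R2")}, {"R1": 1.2})
print(sizes)  # cell "a" grows to ~14.0, cell "b" keeps area 10.0
```

The partitioner then treats these inflated areas as the real cell sizes when checking region density limits.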
During placement, if a cell is deemed to be in a congested region R, then b(c) is increased by min{1, 2[cong(ei) − 1]}. Once the cells in congested regions are inflated, they are spread out using a repartitioning method in the placer, which respects the new inflated cell sizes and moves cells out of regions that are too full. Repartitioning is called after the normal partitioning stage. It considers 2 × 2 windows of adjacent regions R1, . . . , R4 and forms a larger region R = R1 ∪ · · · ∪ R4, which is then repartitioned to obey the new density limits and improve wirelength. This is done for all 2 × 2 windows repeatedly over all the regions as long as it yields reasonable improvement in wirelength. Because the windows overlap, the inflated cells can move away from congested regions, in effect reducing the demand for routing in these regions. The work in Ref. [30] also proposes a white-space allocation (WSA) technique based on congestion estimates during mincut global placement. The techniques are based on the WSA technique proposed in Ref. [28] and discussed in Section 22.4.2. The difference is that Ref. [28] applies the technique after placement as a detailed-placement optimization, while Ref. [30] uses it during partitioning-based global placement. The framework used for the studies in Ref. [30] is a top-down mincut bipartitioning-based placer. Unlike Ref. [6], which uses a quadrisection-based fixed-grid placer, Ref. [30] uses a bisection-based variable-grid placer in which the cutline is allowed to shift after each round of partitioning. The objective of postpartitioning cutline shifting can be based on equalizing densities or congestion estimates. In Ref. [30], before each round of partitioning, the entire placement region is overlaid on a grid, and congestion maps [39] are built using the last updated locations of all pins.
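These congestion maps allow routing demand and supply to be compared on the two sides of a cutline. A minimal sketch of shifting a cutline to equalize the demand-to-supply ratio, under the simplifying assumption that routing supply is proportional to region width (Ref. [30] derives both quantities from its congestion map):

```python
def shifted_cutline(width, demand_left, demand_right):
    """Position x of a vertical cutline inside a bin of the given width,
    chosen so the demand-to-supply ratio is equal on both sides when
    supply is proportional to width:
        D_l / x = D_r / (width - x)  =>  x = width * D_l / (D_l + D_r)."""
    return width * demand_left / (demand_left + demand_right)

# A side with twice the routing demand receives twice the width:
print(shifted_cutline(90.0, 60.0, 30.0))  # -> 60.0
```

The effect matches the text: the congested side is handed more area, and hence more white space, without inflating any individual cell.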
When cells are partitioned and their positions change, the congestion values for their nets are updated. The routing demands and supplies on either side of the cutline are estimated using the congestion map. Using these estimates, the cutline is shifted after partitioning to equalize the ratio of demand to supply on either side of the cutline. This effectively allocates more white space to potentially congested areas without artificially changing the sizes of individual cells.

22.3.2 STEINER WIRELENGTH OPTIMIZATION DURING GLOBAL PLACEMENT

Taking an approach orthogonal to traditional work on congestion-driven placement, Ref. [30] shows that the congestion characteristics of a placement can be improved by modifying the objective of the placer to optimize Steiner-tree wirelength (StWL) rather than the traditional half-perimeter wirelength (HPWL). The contention is that StWL correlates much better with routed wirelength than HPWL does. The placement framework for the study, called Rooster, is based on the top-down mincut partitioning algorithm implemented in the well-known Capo placer [1]. Mincut placers generally use either bisection or quadrisection to divide the placement area and the netlist. The objective of the partitioning is to minimize the number of connections between the partitions while obeying area constraints in the partitions. This is done iteratively until the number of cells in a region is small. When bipartitioning is used, the careful choice of vertical or horizontal cut direction influences the wirelength, the ratio between horizontal and vertical wirelength components, and the routing congestion of the resulting placement solution [36]. Proper handling of terminals [13] is essential to the success of top-down placement approaches.
When partitioning a placement bin (region), cells inside the bin may connect to cells and fixed points outside that bin. These connections need to be properly accounted for by formulating the problem as partitioning the cells in a bin with fixed terminals. The terminals are propagated to the boundaries of the bin being partitioned and are considered fixed in one or more partitions. The traditional mincut objective during placement does not accurately represent the wirelength objective [35]. The authors of Ref. [35] introduce a terminal propagation technique that allows the partitioner to better map netcut to HPWL. Each original net can be represented by one or two weighted nets in the partitioning problem, depending upon the configuration of the net's terminals relative to the bin's center. The weighted-netcut formulation better represents the HPWL objective; this formulation is later simplified in Ref. [9]. The work in Ref. [30] extends this formulation to facilitate minimization of wirelength estimates other than HPWL, in particular StWL. The authors of Ref. [30] show that StWL is a valid cost function for the weighted partitioning problems. However, several assumptions made when minimizing the traditional HPWL objective no longer hold. Moving terminal locations changes the Steiner-tree construction, making StWL estimates inaccurate. Additionally, nets that were considered inessential in HPWL minimization cannot be considered so during StWL minimization, because, unlike with the HPWL objective, there are many Steiner trees of different lengths having the same bounding box. To limit the additional runtime during placement, an efficient data structure called a pointset with multiplicities is proposed to aid fast construction and maintenance of Steiner trees for partitioning.
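The gap between HPWL and StWL that motivates this work is easy to demonstrate. Below, a crude single-trunk construction stands in for a real Steiner-tree evaluator; it is an illustrative sketch only, not any of the evaluators used in Ref. [30]. For nets of four or more pins, Steiner length can exceed the bounding-box half-perimeter, which is exactly why many different trees share one bounding box.

```python
def hpwl(pins):
    """Half-perimeter wirelength of a net's bounding box."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def single_trunk_stwl(pins):
    """Single-trunk Steiner tree length: a horizontal trunk at a median
    pin row, plus a vertical branch from each pin to the trunk."""
    ys = sorted(y for _, y in pins)
    trunk_y = ys[len(ys) // 2]
    xs = [x for x, _ in pins]
    return (max(xs) - min(xs)) + sum(abs(y - trunk_y) for _, y in pins)

four_pin = [(0, 0), (0, 4), (6, 0), (6, 4)]  # corners of a 6 x 4 box
print(hpwl(four_pin))               # 10
print(single_trunk_stwl(four_pin))  # 14: Steiner length exceeds HPWL
```

For two- and three-pin nets the two metrics coincide, which is why the divergence, and the need for real evaluators, only appears on higher-fanout nets.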
At the beginning of the mincut placement, all movable cells are placed at the center of the first placement bin, which represents the full core placeable area. When constructing the partitioning instance for a bin, a Steiner-tree evaluator is called for each essential net of the bin, and the weighted partitioning problem is formed based on the Steiner-tree segments. The weighted mincut partitioner runs and produces a solution; a cutline is selected based on the partitioning result, new bins are created, and cells are placed at the centers of their respective bins. Three different Steiner-tree evaluators are tested in the flow: batched iterated 1-Steiner (BI1ST) [20], FastSteiner [21], and the fast lookup-table-based wirelength estimation technique (FLUTE) [11]. In addition, Ref. [30] also suggests using the StWL objective in detailed-placement techniques. Two types of sliding-window optimizers targeting StWL are proposed: the first exhaustively checks all possible linear orderings of a group of cells in a row, and the second uses a dynamic programming algorithm for an interleaving optimization like the one proposed in Ref. [17].

22.3.3 FREE SPACE MANAGEMENT DURING GLOBAL PLACEMENT

Traditional work on congestion-driven placement in the context of high-utilization designs has focused on redistributing the limited available white space toward the congested areas. However, in modern system-on-chip (SoC) design styles with several large intellectual property (IP) blocks and macrocells, the free space available to the remaining "dust" logic has increased. For such low-utilization designs, in addition to solving the problem of local high-congestion spots, it is also important to ensure that the design is not overspread across the die and that it has a good natural distribution of cells in the layout. Addressing these concerns improves the overall quality of the physical design in terms of circuit performance, wirelength, power consumption, congestion due to global nets, and so on.
However, even for low-utilization designs, care must be taken to ensure that every local region of the die has enough routing resources to accommodate the routing demand created by cells in that region. Hence, free-space management techniques during global placement must maintain enough white space in local regions while ensuring that the design is not overspread with respect to the other placement objectives. Hierarchical WSA during top-down partitioning-based placement flows is discussed in Ref. [7] and implemented in the placer Capo. The available white space is uniformly spread throughout the core region; this ensures that every local region of the die has enough available white space and helps routing. At every level of partitioning during the top-down flow, the tolerances for partitioning an individual bin are determined by the white-space deterioration factor α, which depends on the available white space in the bin and the number of partitioning levels remaining to the leaf level of the partitioning tree (approximated by log2 N, where N is the number of cells in the bin). Hierarchical WSA allows higher tolerances during partitioning, and cutlines are shifted after each partitioning to equalize the relative white space in the child bins and ensure a uniform white-space distribution. This simple strategy works very well for high-utilization designs but produces circuits with high wirelength and worse performance for sparse designs. The work in Ref. [4] argues that top-down placement based on recursive bisection with multilevel partitioning does poorly on low-utilization designs and tends to produce excessive wirelength when large amounts of white space are present.
On the other hand, analytical placement algorithms have a global view of the placement problem and can better manage large amounts of white space. To exploit this, analytical constraint generation (ACG) [4] combines a mincut-based placer with a quadratic placement engine. In ACG, during the top-down recursive-bisection mincut placement flow, the partitioning capacities for each placement bin are generated based on the quadratic wirelength-minimum placement at that level. At each level of the top-down flow, the quadratic wirelength-minimization engine produces a placement subject to the respective placement-bin constraints for each cell. For each placement bin to be partitioned, the capacities of the child bins are determined by the center of mass of the quadratic wirelength-minimum solution. The tolerances for the partitioning problem are kept fairly rigid, and care is taken to ensure that none of the child partitioning bins overflow their target utilizations. ACG thus takes a hint from the quadratic wirelength-minimum solution about the natural placement of the design and uses powerful multilevel mincut partitioning techniques to produce lower-wirelength solutions that are not overspread. The techniques proposed in Refs. [2,3] to handle sparse designs take a different approach: they do not depend on the type of placer. A black-box placer that uniformly distributes the available white space across the core area is assumed. By preprocessing the netlist, it can be ensured that (1) there is a minimum amount of local white space throughout the core area and (2) the remaining white space is allocated well. The minimum-local-white-space constraint is required to ensure that local regions of the die do not become highly congested. The technique consists of adding small, disconnected free cells to the design in an amount not exceeding the white space that remains after the local white-space requirement is satisfied.
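The free-cell budget just described can be sketched as follows. The uniform local reservation and all parameter names are illustrative assumptions, not the exact accounting of Refs. [2,3]; the sketch only captures the rule that fillers may consume at most the white space left after the minimum local requirement is reserved.

```python
def filler_cell_budget(core_area, cell_area, min_local_ws_frac, filler_size):
    """Number of disconnected filler cells that can be added: total white
    space, minus the reserved minimum local white space (a fraction of
    the core), divided by the size of one filler cell."""
    whitespace = core_area - cell_area
    reserved = min_local_ws_frac * core_area
    usable = max(0.0, whitespace - reserved)
    return int(usable // filler_size)

# 1000 units of core, 600 of real cells, 10% reserved locally, fillers of size 2:
print(filler_cell_budget(1000.0, 600.0, 0.10, 2.0))  # -> 150
```

After placement, removing these 150 fillers would leave their sites empty, concentrating the real cells while still honoring the local white-space floor.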
Because the free cells are disconnected and small, the black-box placer is free to place them so as to improve the relevant design objectives. After the placement process is complete, the free cells are removed from the design, leaving the underlying cell sites empty. This produces high cell density (while respecting the minimum-local-white-space requirement) in certain areas, with empty sites occupying the vacant areas of the chip. In addition, the work in Ref. [3] describes a low-overhead implementation of filler cells in a mincut placer, because explicit modeling of free cells in the placement problem impacts the runtime and memory footprint of the placer. In contrast to the ACG work, where the partitioning capacities for each partitioning problem are determined by the quadratic wirelength-minimum solution and the partitioning tolerances are kept fairly strict, in Ref. [3] the partitioning tolerances for low-utilization placement bins are increased sufficiently that the partitioner can find the optimal mincut solution subject to the capacity and tolerance constraints. The increase in tolerances is done keeping in mind the minimum-local-white-space requirements. Figure 22.7 shows placements of the same design produced by the different mincut placement approaches discussed above.

22.4 DETAILED PLACEMENT CONGESTION IMPROVEMENT

Congestion-driven detailed-placement approaches have been studied extensively in the literature. In general, these approaches tend to be more effective than congestion-driven global placement, because relatively more accurate congestion information is available at the detailed-placement level. The global-placement step determines the rough locations of the majority of the cells, as well as the local area density and pin density. After global placement completes, a pass of congestion estimation can be applied.
The congestion distribution of the entire design, or congestion map, usually guides the postglobal-placement congestion-reduction algorithms. The goal of these algorithms is to reduce the congestion at the peak congestion spots, at the cost of increasing the routing demand in noncongested areas, thus averaging out the routing demand over the die.

FIGURE 22.7 The Ckt4 design from IBM has 74 percent white space. (a) Placement produced by Capo with uniform white-space distribution. (b) Another placement produced by Capo after free filler cells (not shown in the picture) were added to reduce the placer white space from 74 to 15 percent; this reduces the wirelength from 15.32e6 to 8.77e6. (c) Placement obtained from a mincut placer from IBM with 70 percent target density. (d) Placement obtained by the ACG technique with 70 percent target density. (From Adya, S. N. and Markov, I. L., International Conference on Computer Aided Design (ICCAD), San Jose, CA, 2003; Alpert, C. J., Nam, G.-J., and Villarrubia, P. G., Proceedings of the 2002 IEEE/ACM International Conference on Computer-Aided Design, San Jose, CA, 2002.)

At the detailed-placement stage, there are typically three ways to reduce the routing demand in congested areas. First, traditional HPWL improvement, either global or local, can reduce the routing demand in a congested area by producing shorter wires. In this sense, conventional wirelength minimization is of value for congestion control. This is particularly true for high-utilization designs, where little room is left for manipulating the available white space for congestion control. Wang et al. [38] illustrate the relationship between total wirelength and congestion based on the edge-overflow congestion model.
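The edge-overflow congestion model mentioned here can be sketched in a few lines: each routing edge contributes its demand in excess of capacity, and the per-edge overflows are summed over the routing grid.

```python
def total_overflow(demand, capacity):
    """Edge-overflow congestion: each routing-grid edge contributes
    max(0, demand - capacity); the design's congestion is the sum."""
    return sum(max(0, demand[e] - capacity[e]) for e in demand)

# Two edges: one over capacity by 2 tracks, one with slack.
print(total_overflow({"e1": 12, "e2": 5}, {"e1": 10, "e2": 8}))  # -> 2
```

Because shorter nets consume fewer tracks on each edge they cross, reducing total wirelength tends to reduce this overflow measure, which is the correlation Wang et al. report.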
They observe a strong correlation between wirelength and congestion in experiments. Recent work on mincut placement [22] and analytical placement [28] also suggests that reducing total HPWL helps produce routable placements. Second, in congested regions, re-placing cells with accurate routing-topology information can yield significant congestion reduction. This type of approach includes cell swapping integrated with Steiner-tree routing updates in a simulated annealing algorithm [38], cell re-placement with an exact congestion objective obtained from global routing or a routing estimator [28,30], optimal interleaving with trunk-tree decomposition [17], and sliding-window optimization with an StWL objective [30]. The incremental routing approach in Ref. [38] can be runtime prohibitive, because both incremental routing and annealing are computationally expensive; good control of the working area is paramount for this approach. Usually, the working area grows from a highly congested spot and is subject to some maximum-area constraint. If the congestion is mainly caused by entangled routing topologies, the cell re-placement approach in Ref. [28] can be very effective for moderate congestion reduction. The exploration space, however, is rather limited, because fewer cells are touched compared to other methods. Both approaches fail to address the congestion from global interconnects, that is, nets that pass through the congested area. Accounting for global interconnects in routing-congestion reduction is fundamentally a very hard problem for detailed placement. The approach in Ref. [30] uses a fast Steiner-tree estimator in mincut placement; it is a promising technique, as it improves the StWL at almost no runtime cost. Third, low design utilization, or an abundance of white space, in many modern ASIC designs allows cell spreading, the most effective way to reduce routing congestion.
By spreading the cells over the core placeable area, the routing resources can be significantly increased; in other words, the average routing demand in the congested area can be reduced to meet the routing-resource constraints. Theoretically, if the utilization continues to decrease, the routability goal will be achieved for any given design. In reality, however, spreading the placement inevitably deteriorates circuit performance and increases power consumption, because of the increased average interconnect length. We go through the details of several techniques in the following sections.

22.4.1 ROUTER INTEGRATION

An early approach to cell re-placement with routing information was proposed in Ref. [38]. In this work, the detailed-placement step is based on low-temperature simulated annealing. Traditionally, the cost function in simulated annealing is the total HPWL, or some form of weighted total HPWL; here, a congestion cost is incorporated into the objective function. To obtain the congestion cost, a fast Steiner-tree router is called for every single move in the simulated annealing algorithm, and the routing supply/demand information for all the region edges is incrementally updated. This approach is computationally expensive, because incremental routing and congestion updates are much more time consuming than simple bounding-box computation. The advantage is on the fidelity side: congestion improvement achieved in placement mostly carries over to routing. Several techniques can be used to speed up this approach and make it practically feasible, such as routing estimation or lazy updates on nets with large fanouts. Using appropriate sizes for the congested regions in the optimization is another key step for this approach; the idea is to avoid applying the expensive computation to noncongested areas.
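One way such a congestion term might look is sketched below. This is a hedged illustration, not the actual cost function of Ref. [38]; the look-ahead threshold and penalty weight are illustrative assumptions. Edges begin to pay cost once demand nears supply, rather than only after overflow.

```python
def congestion_cost(demand, supply, lookahead=0.8, penalty=10.0):
    """Look-ahead congestion term for an annealing objective: an edge
    contributes cost once its demand exceeds `lookahead` * supply, with
    the cost growing linearly in the normalized excess."""
    cost = 0.0
    for e in demand:
        threshold = lookahead * supply[e]
        if demand[e] > threshold:
            cost += penalty * (demand[e] - threshold) / supply[e]
    return cost

# A typical blended move cost would be:
#   delta_cost = delta_wirelength + w * delta_congestion_cost
```

The threshold is what makes the term "look-ahead": moves that push demand toward, but not yet past, capacity are already discouraged.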
The modeling of the congestion cost greatly affects the final placement quality. One basic question is whether we should penalize moves only when the routing demand exceeds the supply, or start assigning some cost once the demand approaches the supply, i.e., a look-ahead approach. The authors tried many different congestion-modeling functions and suggest that a look-ahead congestion cost combined with total wirelength is most effective. A typical problem in congestion minimization is that reducing the peak congestion of one region often comes at the cost of increasing the congestion of its surrounding area. Classic congestion-removal approaches work on an expanded area, which is the peak congestion region plus the surrounding less-congested area. Furthermore, these approaches work on one expanded area at a time, which results in conflicting congested areas. The authors of Refs. [42,43] suggest conflict-resolution techniques applied before starting the congestion optimization. The basic idea is to exploit the flexibility available when growing a peak congestion spot into an expanded area, and to obtain the best combination of expanded areas for multiple peak congestion spots. The problem can be formulated as a linear program or an integer linear program; a simpler heuristic is to start from a greedy expansion and gradually adjust the expanded areas to avoid conflicts. Jariwala and Lillis [17] proposed a detailed-placement method that integrates trunk-decomposition-based routing into an optimal interleaving algorithm. Optimal interleaving, a detailed-placement wirelength-minimization technique proposed in Ref. [17], is a powerful intrarow optimization technique. The basic idea is to collect a group of cells in a single-row window, partition them into two groups, and optimally interleave the two groups while preserving the cell order within each group. A dynamic programming algorithm finds the optimal wirelength in O(n²) time. In Ref.
[17], the routing information is considered in the interleaving process: the routes associated with cells are also interleaved, and the number of nets crossing a channel is taken into account. The algorithm is successful on field-programmable gate array (FPGA) benchmarks, reducing the number of congested routing channels (defined as the channels with maximum routing density) by 45 percent on average.

22.4.2 WHITESPACE MANAGEMENT

Because modern cell-based designs usually have lower utilizations than their predecessors, recent research has focused on congestion improvement in the context of white space. Early research on white-space handling includes Ref. [7], in which white-space control is used primarily to improve the quality of mincut partitioning, and Ref. [29], which implicitly uses white space for area expansion. Yang et al. [40,41] proposed a WSA approach for the purpose of reducing congestion. Their experimental flow is complete, including global/detailed routing with a widely used commercial router, on a set of benchmarks based on a 0.18 µm library. After the global-placement stage, the chip area is divided into rectangular areas (bins) and congestion estimation is conducted to obtain a congestion degree for each bin. At this point, all the bins have the same amount of white space. Next, a two-step white-space reallocation is performed to adjust the amount of white space, or the cell utilization, in each bin. This is accomplished by first computing the desired white-space distribution based on the current congestion map, and then moving cells between adjacent bins to meet the white-space requirement. Moving cells to meet the target white-space distribution inevitably degrades the quality of the placement, so a wirelength-improvement stage is introduced to recover the loss.
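The first reallocation step, converting the congestion map into a target white-space map, might be sketched as a simple proportional rule. The linear proportionality is an assumption for illustration; the actual conversion in Refs. [40,41] is more involved.

```python
def target_whitespace(congestion, total_ws):
    """Allocate the total white space to bins in proportion to their
    congestion degree: more congested bins receive more white space."""
    total_cong = sum(congestion.values())
    return {b: total_ws * c / total_cong for b, c in congestion.items()}

cong = {"b1": 3.0, "b2": 1.0}          # b1 is three times as congested
print(target_whitespace(cong, 40.0))   # b1 gets 30.0, b2 gets 10.0
```

The second step, moving cells between adjacent bins to realize this map, is where the placement quality loss discussed above is incurred.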
During this low-temperature annealing stage, the latest WSA map is maintained. Cell moves and swaps are allowed only if they do not violate the whitespace requirements by more than a certain amount. The key factor that determines the performance of this approach is the conversion from the congestion map to the target whitespace map. Because of the lack of detailed congestion information and the uncertainty in cell movement, it is extremely hard to accurately estimate how much whitespace needs to be injected into a congested bin to make it routable. What this approach is able to do is relieve the congestion hot spots to some extent. In practice, because congestion often conflicts with other placement objectives (timing, wirelength, etc.), neither underdoing nor overdoing congestion removal is desirable. Multiple runs with different scaling factors help to achieve the best trade-off. Another uncertainty of this approach lies in convergence. The target whitespace map is derived from the initial congestion map. However, after cell movement the congestion map changes and the target WSA is out of sync, i.e., a previously noncongested spot may pop up with less whitespace allocated to it. Performing multiple iterations of this approach may help, but it could cause oscillation or a considerable loss of quality relative to the input placement. An enhancement of the above approach was proposed in Ref. [16]. In this work, the congestion estimation and WSA are similar to those of Refs. [40,41]. A flow-based algorithm called placement migration is devised to guide the cell movement toward the target whitespace map. This algorithm minimizes the amount of movement (the total moving distance of all moved cells), given an original whitespace map (uniform distribution) and a target whitespace map. Less cell movement means a smaller perturbation of the input placement and usually a lower wirelength loss. Another interesting approach, named whitespace allocation, was proposed by Li et al.
[28]. The algorithm starts with a global placement and the derived congestion map. The placement is then modified using a recursive bipartitioning, or slicing tree, method. In a preprocessing step, a congestion degree is computed for every cell in the design, based on the current placement congestion map. In the first step, the placement is recursively partitioned until every region contains a small number of cells. For each partitioning step, the cut direction is determined by the aspect ratio of the region to be partitioned. Each cutline geometrically bisects a region evenly and tries to maintain a square shape of the subregions. A complete slicing tree is constructed at the end of the recursive bipartitioning. Each node of the slicing tree corresponds to a region that is bipartitioned. The left child and right child of a node correspond to the two subregions after partitioning. A list of cells is saved in each node, representing all the cells placed inside the region. Next, the slicing tree is evaluated in a bottom-up fashion. A congestion level is assigned to each node of the tree. The congestion level of an internal node is simply the sum of the congestion levels of its two children. For a leaf node, the congestion level is the sum of the congestion degrees of all the cells in the corresponding region. A cutline adjustment step is then performed. The cutlines in the nodes are shifted by traversing the tree in a top-down fashion. For each node, the amounts of whitespace allocated to the two child nodes are linearly proportional to their congestion levels. Consider a region r with lower-left corner (x0, y0), upper-right corner (x1, y1), and the original vertical cut at xcut = (x0 + x1)/2. The area of this region is Ar = (x1 − x0)(y1 − y0). Assume that the total cell areas of the left subregion r0 and the right subregion r1 are S0 and S1, and that their congestion levels are OVL0 and OVL1, respectively.
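The bottom-up evaluation of congestion levels over the slicing tree can be sketched as follows; the `Node` class and the `cong_of` mapping from cell names to congestion degrees are hypothetical names introduced here for illustration:

```python
class Node:
    """A slicing-tree node: a leaf stores the cells of its region;
    an internal node stores its two subregions as children."""
    def __init__(self, cells=None, left=None, right=None):
        self.cells = cells or []
        self.left, self.right = left, right
        self.level = 0.0  # congestion level, filled in bottom-up

def evaluate(node, cong_of):
    """Bottom-up pass: a leaf's level is the sum of its cells'
    congestion degrees; an internal node's level is the sum of
    its two children's levels."""
    if node.left is None and node.right is None:
        node.level = sum(cong_of[c] for c in node.cells)
    else:
        node.level = evaluate(node.left, cong_of) + evaluate(node.right, cong_of)
    return node.level
```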
The total amount of whitespace, (Ar − S0 − S1), is to be allocated to the two subregions such that the amounts of whitespace in the two subregions are linearly proportional to their congestion levels (Figure 22.8). Thus, the amount of whitespace allocated to subregion r0 is

ws(r0) = (Ar − S0 − S1) · OVL0/(OVL0 + OVL1)

FIGURE 22.8 Subregion r0 is more congested than subregion r1. The cutline is shifted from xcut to x'cut, resulting in more whitespace in subregion r0.

Then the new cutline location x'cut can be derived as follows:

γ = [S0 + (Ar − S0 − S1) · OVL0/(OVL0 + OVL1)] / Ar
x'cut = γ·x1 + (1 − γ)·x0

where γ is the ratio of the left subregion area to Ar after the cutline adjustment. This step is similar to a top-down partitioning-based global placement except that the cut direction, the cut location, and the subnetlists are all known. The cells stay at the center of the region to which they belong during the top-down flow. The placement is then legalized with minimum perturbation and a cell-swapping-based detailed placement is performed. Li et al. showed that this step alone can considerably reduce the postrouting congestion (measured by the percentage of overflowed bins), by 72 percent on average. Furthermore, a similar approach was adopted in Ref. [30], improving congestion by improving the routed total wirelength. The difference is that in Ref. [30] the congestion-estimate-based cutline adjustment is performed during the global-placement stage, which is a top-down partitioning framework. The limitation of these whitespace adjustment approaches is that they can do little about the part of the congestion that originates from global interconnects.
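A minimal sketch of this cutline-adjustment computation for a vertical cut, following the formulas above (the function name and argument layout are our own):

```python
def adjust_cutline(x0, x1, y0, y1, S0, S1, OVL0, OVL1):
    """Shift a vertical cutline so the region's whitespace splits in
    linear proportion to the two subregions' congestion levels.
    S0, S1: total cell areas of the left/right subregions.
    OVL0, OVL1: congestion levels of the left/right subregions."""
    Ar = (x1 - x0) * (y1 - y0)          # region area
    ws = Ar - S0 - S1                   # total whitespace in region r
    ws0 = ws * OVL0 / (OVL0 + OVL1)     # whitespace given to r0
    gamma = (S0 + ws0) / Ar             # left-area fraction after shift
    return gamma * x1 + (1 - gamma) * x0  # new cutline x'_cut
```

For example, in a 10 × 10 region with S0 = S1 = 30 and the left side three times as congested as the right, the cut moves from 5.0 to 6.0, giving r0 three-quarters of the 40 units of whitespace.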
If we consider a congested region of the globally routed layout, the congestion in the area can be classified into three types: local nets, the interconnects within the region; semiglobal nets, the interconnects entering or leaving the region; and global nets, the interconnects passing through the region. WSA can relieve the congestion caused by local or semiglobal nets. However, it cannot reduce the routing demand coming from global nets. Therefore, the effectiveness of the WSA approaches depends greatly on the composition of these different congestion sources. Generally, the proportions of the congestion sources are determined by two factors: the netlist structure and the global-placement quality. It is thus desirable to make the netlist structure more congestion friendly (reducing global nets without increasing local nets) or to improve the global-placement quality (reducing the total wirelength). In summary, congestion reduction in detailed placement can be very effective because the congestion spots have been detected with high fidelity after global placement. If enough whitespace is available, most congestion can be relieved by intelligently spreading the cells to achieve a routable design. However, other placement objectives, such as timing and power, are likely to be degraded during the spreading. Also, more whitespace means lower density and more silicon area. In that sense, congestion reduction should not be addressed only in detailed placement. It is an objective throughout the entire placement flow and should be considered as early as possible, even in the floorplanning stage.

22.5 SIMULATED ANNEALING FOR CONGESTION IMPROVEMENT

Simulated annealing as an effective placement algorithm has been studied for over two decades [25,32,33].
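The three-way net classification above can be approximated from bounding boxes alone. The following sketch is an illustration under that assumption (real flows classify by actual routes, and the function name is hypothetical):

```python
def classify_net(net_bbox, region):
    """Classify a net relative to a region using bounding boxes only:
    'local'      - the net lies entirely inside the region,
    'global'     - the net's box contains the region (passes through),
    'semiglobal' - the boxes merely overlap (net enters or leaves),
    None         - the boxes are disjoint.
    Boxes are (x0, y0, x1, y1) tuples."""
    (nx0, ny0, nx1, ny1), (rx0, ry0, rx1, ry1) = net_bbox, region
    if nx1 < rx0 or rx1 < nx0 or ny1 < ry0 or ry1 < ny0:
        return None
    if rx0 <= nx0 and nx1 <= rx1 and ry0 <= ny0 and ny1 <= ry1:
        return 'local'
    if nx0 <= rx0 and rx1 <= nx1 and ny0 <= ry0 and ry1 <= ny1:
        return 'global'
    return 'semiglobal'
```

Tallying these labels over all nets crossing a hot spot gives a quick estimate of how much of its congestion WSA can actually address.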
As circuit sizes increase, simulated annealing is no longer a good choice as a global-placement algorithm because of its weak scalability. However, during the detailed placement stage, annealing is still an attractive approach in many cases. For instance, simulated annealing can be applied to a particular region to further optimize the local placement. One strength of the annealing approach is that it can incorporate virtually any cost into its objective function, as long as the cost can be modeled and incrementally updated. It is then natural to extend simulated annealing from optimizing the classic HPWL to optimizing congestion. A number of publications have presented methods that apply simulated annealing for congestion alleviation. They vary in congestion modeling and objective cost function. In this section, we review several cost functions that represent congestion. Once the right cost function is selected, either low-temperature annealing or a greedy algorithm can be used as the optimization framework.

22.5.1 RISA

Cheng proposed a routability model named RISA [10] and for the first time integrated a routability model into a simulated annealing algorithm. The basic idea of RISA can be explained using Figure 22.9. The entire chip is divided into M × N rectangular regions. R represents one such region. There are two nets whose bounding boxes overlap with region R. Let WR (HR) be the width (height) of region R, W1 (H1) be the width (height) of net N1's bounding box, and w1 (h1) be the width (height) of the overlap between R and net N1's bounding box. For region R, the horizontal and vertical congestion contributed by net N1 are

Ch = q · w1·h1/(H1·WR)
Cv = q · w1·h1/(W1·HR)

where q is the net weight, a function of the net's pin count. q is obtained by randomly building optimal Steiner trees within the net's bounding box and statistically deriving the probability of net crossings. Table 22.1 shows the precomputed values of q.
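The per-net RISA contribution to a region follows directly from these formulas; a minimal sketch (function name and tuple conventions are our own, and degenerate zero-area net boxes are not handled):

```python
def risa_contribution(q, net_bbox, region):
    """Horizontal and vertical congestion a net adds to region R under
    the RISA model: Ch = q*w*h/(H1*WR), Cv = q*w*h/(W1*HR), where
    (w, h) is the overlap of the net's W1 x H1 bounding box with the
    WR x HR region. Boxes are (x0, y0, x1, y1) tuples."""
    (nx0, ny0, nx1, ny1), (rx0, ry0, rx1, ry1) = net_bbox, region
    W1, H1 = nx1 - nx0, ny1 - ny0
    WR, HR = rx1 - rx0, ry1 - ry0
    # overlap box dimensions (zero if the boxes are disjoint)
    w = max(0.0, min(nx1, rx1) - max(nx0, rx0))
    h = max(0.0, min(ny1, ry1) - max(ny0, ry0))
    if w == 0 or h == 0:
        return 0.0, 0.0
    return q * w * h / (H1 * WR), q * w * h / (W1 * HR)
```

Summing these contributions over all nets whose boxes overlap a bin yields the per-bin congestion map that the annealer's cost function consumes.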
The congestion cost computed by the RISA model is integrated into a simulated annealing algorithm. Experiments show that with the RISA model, routing congestion, as measured by the number of overcongested grids, is considerably reduced. The runtime cost is about 2× compared to the original simulated annealing algorithm.

FIGURE 22.9 RISA routability model.
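A generic low-temperature annealing loop of the kind described in this section might look as follows; the `cost` and `neighbor` callables are placeholders for a combined wirelength-plus-weighted-congestion objective and a cell move/swap generator, not RISA-specific code:

```python
import math
import random

def anneal(cost, neighbor, state, T=0.05, steps=2000, seed=0):
    """Fixed low-temperature annealing over a placement state.
    cost:     maps a state to a float (e.g., HPWL + lambda * congestion).
    neighbor: proposes a perturbed copy of a state given an RNG.
    Returns the best state seen and its cost."""
    rng = random.Random(seed)
    cur, cur_c = state, cost(state)
    best, best_c = cur, cur_c
    for _ in range(steps):
        nxt = neighbor(cur, rng)
        nxt_c = cost(nxt)
        # accept downhill moves always; uphill with Boltzmann probability
        if nxt_c <= cur_c or rng.random() < math.exp((cur_c - nxt_c) / T):
            cur, cur_c = nxt, nxt_c
            if cur_c < best_c:
                best, best_c = cur, cur_c
    return best, best_c
```

In practice the cost must also be incrementally updatable, as noted above; recomputing the full congestion map per move would dominate the runtime.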