Handbook of algorithms for physical design automation part 46.pdf (Sổ tay thuật toán)

Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C021 432 Finals Page 432 24-9-2008 #11 Handbook of Algorithms for Physical Design Automation and TNS gain. Because the sensitivity analysis works best when the netweights are updated in small steps from their initial values, it also adds a constant of total change to bound the netweights. The netweight can be computed as Worg (i), Slk(i) > 0 W (i) = (21.16) Slk Worg (i) + α [Slkt − Slk(i)] SW (i) + βSWTNS (i), Slk(i) ≤ 0 where Worg (i) is the original netweight α and β set the bound of netweight changes, and control the balance between WNS and TNS 21.3.2 DYNAMIC NETWEIGHTING Static netweighting computes netweights once and does not update them during TDP. However, wirelengths change during and after placement, and the original timing analysis may not be valid. To overcome this problem, dynamic netweighting methods were proposed to adjust weights during placement based on timing information available at the current placement stage. A simple dynamic netweighting scheme is to run multiple placement and netweighting iterations. This scheme can be applied on any placement and netweighting algorithms. This simple scheme, however, is often hard to converge without careful netweighting assignment. This is the so-called oscillation problem [33]. Weights are assigned by performing timing analysis for some given placement solution at the nth iteration [28]. Critical nets receive higher weights. At next iteration, the lengths of those critical nets are reduced, while the lengths of some noncritical nets may be increased, resulting in a different set of critical and noncritical nets. If a net alternates between critical and noncritical nets, we have an oscillation problem. To mitigate this problem, one needs to either periodically recompute timing during the placement process [13,27] or use historical netweighting information to achieve stability [34,35]. 21.3.2.1 Incremental Timing Analysis To periodically update weights during placement, one needs to recompute timing during placement. One could incrementally update timing like Ref. [2], which only computes the incremental slack caused by wirelength increments using delay sensitivity to wirelength. sk (n) = sk−1 (n) − d k (n) = sk−1 (n) − SLT (n)l (n) (21.17) where sk (n) is the estimated slack for net n at the k step sk−1 (n) is the slack at k − 1 step d k (n) is the delay change on net n SLT is the delay to wirelength sensitivity l(n) is the wirelength increment Using sensitivity analysis can provide a fast estimation for incremental timing analysis. One can also perform a more accurate incremental timing analysis. For example, Ref. [34] uses a star net model for placement and netlist changes. The main advantage of this model is that it can calculate individual delay between the source pin and every sink pin of star net more accurately. From given gate coordinates, the star net node is computed as the center of gravity of all pins of the net, and the lengths of all arcs in x and y directions can be obtained. These lengths are used to compute the equivalent lumped elements as used in the derived electrical model. Note that one normally does not perform a full-blown static timing analysis during placement, which would do false path detection, early–late mode analysis, etc. Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C021 Finals Page 433 24-9-2008 #12 433 Timing-Driven Placement 21.3.2.2 Incremental Net Weighting To make placement stable with updated weights, we can make use of the historical weights, the so-called incremental netweighting. Different from static netweighting, this method relies on iterations to get the appropriate weights and drives the placement engine along that way. There are two such algorithms in published literature. One only makes use of the history data of the previous step, the other uses the previous two steps. In Ref. [35], at each step, it first computes the criticality for a net i as cjk−1 + 1 /2 c = k−1 cj /2 k i if net i is among the 3 percent most critical nets otherwise (21.18) The criticality describes how critical a net tends to be in general. For example, if a net was never critical, its criticality is 0 whereas an always critical net has a criticality of 1. This scheme effectively reduces oscillations of weights. Once the criticality is computed, the netweight then can be updated as wik = wik−1 × (1 + cik ) (21.19) Therefore, the net with criticality 1 will have its weight doubled at every iteration, while noncritical netweights will stay the same. The other netweighting scheme uses the criticality information from the previous two steps [34]. In this approach, the criticality number is simplified to either 1 or 0. Nets on critical paths get 1, while nets on noncritical paths get 0. The netweight is updated as follows: ⎧ k−1 ⎪ ⎪wi + W ⎨ 1 k wi = k−1 w /2 ⎪ i ⎪ ⎩ wik−1 if if if if cik cik cik cik =1 = 0 ∧ cik−1 = 0 ∧ cik−2 = 0 = 0 ∧ cik−1 = 0 ∧ cik−2 = 1 = 0 ∧ cik−1 = 1 (21.20) In this case, the minimum netweight is 1. If the current criticality is 1, its netweight will be increased by W (>1), which determines how fast the weight would increase because of criticality. Using the number of pins of a net to set W is a reasonable choice because delays of nets with high fanouts are usually larger and more likely to be critical. If the current step net criticality is 0, the netweight may change depending on the criticalities of the previous two steps. 21.3.2.3 Placement Implementation Dynamic netweighting algorithms can be applied to most placement algorithms, e.g., partition-based placement [2,36,37], quadratic placement [34], and force-directed placement [35]. The implementation of dynamic netweighting on quadratic and force-directed placements can be straightforward. Because both placement algorithms provide intermediate gate coordinates at each step, it is easy to estimate wire loads and timing based on those gate coordinates. It is also effective to use the incremental netweighting methods such as Equations 21.19 and 21.20 to drive those placement engines because the matrix solvers for those placers usually respond well to weight changes. For pure partitioning-based placement, one can also use similar method, i.e. update weights between each partitioning step [2,36]. However, the timing analysis in general is not as accurate because partitioning-based placement does not assign exact gate coordinates inside a partition. Thus, the weights may not effectively control the partitioning process, which aims at minimizing the number of weighted crossings, but not wirelength directly. Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C021 434 Finals Page 434 24-9-2008 #13 Handbook of Algorithms for Physical Design Automation One can enforce some cutting constraints to the partitioning algorithm, e.g., the maximum number of times a path can be cut during the iterative partitioning steps [38]. For partitioning-based placement, controlling the cut number on paths in addition to weights helps reduce the wirelength on critical nets more efficiently. It is also a dynamic netweighting approach in that it updates the timing criticality during partitioning process and recomputes weight as well. Unlike previous timing analysis methods that recalculate timing based on gate coordinates, it estimates the critical path by the number of cuts a path has been cut during partitioning. Starting from an initial set of most critical nets, it adds some number of critical nets that has been cut to this set. All the critical nets will be limited to be cut only a maximum number of times by setting a higher weight that is equal to the summation of the weights of noncritical nets in a partition. In Ref. [39], the minimization of the maximal path delay problem is formulated in the min–max, top-down partition-based placement for timing optimization. The main technique is the iterative net reweighting. In another work [40], the concept of boosting factors is introduced, which adjusts netweights according to net spans, so that the quadratic wirelength can be reduced. The method skews the netlength distribution produced by a mincut placer so as to decrease the number of long nets, with minimal impact on the overall wirelength. 21.4 NET-CONSTRAINT-BASED APPROACH 21.4.1 NET-CONSTRAINT GENERATION Because interconnect delay is predominately determined by its netlength, a natural choice for controlling delay is through netlength constraint (NLC), which limits the maximum length of a net. The net-constraint-based approach is another popular net-based interface between timing analysis and placement to drive the TDP. The net-constraint approach has several attractive qualities compared to the common netweighting approach. It is not possible to predict the exact timing response to a netweight. Because many nets may have weight changes, there may be conflicts with each other. Sometimes, it is not even certain that the length of a net will be reduced if it is given a higher netweight. Net-constraint approach has more accurate control. The problem then is how to generate a good set of net constraints that are not overly constrained to limit solution space. A common combined flow may be combining netweighting and net constraints, e.g., having netweighting to guide global TDP and net-constraint generation for incremental/iterative improvement. The two main steps of net-constraint-driven placement are 1. To generate an effective set of NLC bounds 2. To create placers that meet, or nearly meet, these bounds The following sections will explore these two net-constraint-driven goals. 21.4.1.1 Generating Effective NLCs Many techniques have been proposed for generating NLCs and many are similar with the approaches for creating netweights. Many of the original methods attempted to create, in a single shot, a set of NLCs, which when met would result in a design that meets timing requirements. More recently, several works have suggested that NLCs should be generated so that the design’s target frequency is incrementally improved. The single-shot approaches are described first. 21.4.1.2 Single-Shot NLC Generation The goal of single-shot NLC generation is to perform a slack budgeting giving timing constraint for each net, which when realized will meet the timing frequency goal. These timing budgets are then used to generate a physical bound for the NLC using silicon process parasitic parameters. Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C021 Finals Page 435 24-9-2008 #14 435 Timing-Driven Placement In Ref. [41], the zero-slack algorithm (ZSA) is proposed. This algorithm computes delay bounds for each net based on a tentative set of connection delays chosen so that all timing requirements are met. ZSA chooses maximal delays bounds so that a delay increase on any net connection would produce a timing violation. Based on the delay upper bounds, the wirelength constraints can be generated. Net-constraint generation is formulated as a LP problem, which maximizes the range of permissible length for each net, subject to the LP constraints that timing requirements are met. Intuitively, ZSA will distribute extra slacks uniformly among connections on that path. After that, slacks are updated on other paths that are affected, and the process is repeated until every connection has zero slack. An improvement is suggested in Ref. [42], where a weighted slack budgeting is performed based on the delay per unit load function. A larger weight is assigned to nets that are more sensitive and the slack distribution is allocated proportionally to the weight. Runtime improvement to slack budgeting using the nonzero slack allocation in intermediate steps is suggested in Ref. [30]. It omits recomputing slacks on connections whose slacks are altered by delay increase on the minimum-slack segment, and thus it converges faster than Ref. [41]. In practice, all slacks converge to near zero in a few iterations. In Ref. [43], the iterative-minmax-PERT [42] procedure is generalized to guarantee the slacks go monotonically to 0. In Refs. [7,44] the delay budgeting problem is formulated as a convex-programming problem with a special structure, thus efficient graph-based algorithm is proposed. It showed an average of 50 percent reduction in NLC violations over the well-known ZSA [41]. In addition, different delay budgeting objective functions are studied and showed that performance improvements can be made without loss of solution quality. In a recent work [45], a new theoretical framework is presented, which unifies several previous timing budget problems including timing budgeting for maximizing total weighted delay relaxation, minimizing maximum relaxation, and min-skew time budget distribution. Dragon [46] uses design hierarchy information to compute NLCs and it is evaluated using an industrial place and route flow. 21.4.1.3 Incremental NLC Generation Some NLC generation heuristics have taken an incremental approach to create NLCs [5,47]. These heuristics are used with incremental or iterative placement techniques. Initially, a loose set of NLC on a subset of nets is created, which may not yield a placement that meets timing requirements. Further iterations refine NLCs, tightening the bounds on nets critical at each iteration, so the slack is incrementally improved. Proponents of this approach argue that it is better than deriving a singleshot NLC set. During an industry design flow, timing constraints are often unmeetable, even if every interconnect length is 0. Furthermore, a set of NLCs that guarantee performance requirements may not be achievable by any placement. An incremental transfer function that uses a LP-based net-constraint generation technique is proposed in Ref. [47]. The technique incrementally generates net constraints and iteratively reduces the length of critical nets by small increments. The goal of this LP-based technique is to derive a set of net constraints that will improve critical path delay dinitial by a small amount, t. The k longest paths, pi with delay di > dgoal are selected, where dgoal = dinitial − t. For each path, pi with delay di , the delay must be reduced by di − dgoal . Because the algorithm begins with an initial placement, the current horizontal and vertical lengths, Bxi and Byi , of bounding box wirelength of each net ni are known. In each iteration, the horizontal and vertical reduction goals, xi and yi , are computed. The objective function is to minimize the total horizontal and vertical wirelength reductions. min : i∈Nets (xi + yi ) (21.21) Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C021 436 Finals Page 436 24-9-2008 #15 Handbook of Algorithms for Physical Design Automation For each path, a constraint is created in the LP. For example, if path p1 is composed of nets n1 , n2 , and n3 , the constraint would be c1x · x1 + c1y · y1 + c2x · x2 + c2y · y2 + c3x · x3 + c3y · y3 < d1 − dgoal (21.22) where c1x and c1y estimate the delay change per unit horizontal and vertical length of net n1 , etc. Additional constraints are imposed on each xi and yi reduction goal xi < p · Bxi yi < p · Byi where p is a parameter (0 < p < 1), usually chosen to start with small value and increased if no solution is found to the LP. Because a net may be shared by more than one path, these constraints may limit the reduction goal of a shared net and force larger improvement goals in other nets. A convex-programming approach to net-constraint generation is employed by Ref. [5]. Similar to the previous approach [47], it enumerates a set of critical paths to be considered and forms a set of linear constraints on the net delay of these paths. Unlike Ref. [47], each path must have an arrival time that is less than the required time. The result is a set of constraints that, if met, will result in zero slack for the paths considered. 21.4.2 NET-CONSTRAINT PLACEMENT Once net constraints are generated, placers must efficiently meet the constraints while generating legal placements and optimizing wirelength. Net-constraint placement algorithms have been proposed for many global and detail placement algorithms. This section explores two global placement approaches: partitioning and force-directed, a several detailed placement approaches. 21.4.2.1 Partition-Based Net-Constraint Placement Several adaptations of the popular partitioning approach to global placement have been made for net-constraint placement [5,6,9,48]. This section examines a mincut-based approach [5] and two analytical partitioning-based approaches [6,9]. A modified mincut partitioning-based net-constraint global placer is presented in Ref. [5]. The placer modifies the common mincut partitioner using cut weights on constrained nets to change their cut cost. The weights are computed at each partitioning iteration based on the estimated netlengths. For each constrained net, the maximum and minimum estimated lengths, maxi and mini , are computed, which are the half perimeter of the smallest bounding box enclosing all the cells in ni in their worst and best assignments to their partition choices. A netweight, wi , is assigned based on a comparison of these estimates to the bound of the net, bi . If bi < mini , then wi = maxcrit is assigned to the net because any increase in the netlength is undesirable. If bi > maxi , wi = 0 because regardless of assignment choices, the net will not exceed its bound. For nets with maxi ≥ bi ≥ mini , the weight is computed as (maxi −bi ) · maxcrit + 0.5 (maxi − mini ) (21.23) The Fiduccia–Mattheyses algorithm [49] is used to make the partition assignments. The algorithm does not guarantee that the net constraints will be met. One of the first net-constraint-based global placers was published in Ref. [6]. Its general flow follows Proud [50], a partitioning placer that uses mathematical programming to determine partition assignments. Net constraints are created using the ZSA [41] discussed in Section 21.4.1.2. To meet the NLCs, an iterative-solving approach is used. At each iteration, a Lagrange multiplier is computed Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C021 Timing-Driven Placement Finals Page 437 24-9-2008 #16 437 for each net. For each pin of a net, the multiplier is based on the length constraint, the nets current length, the previous pin weight, and the sum of the weights of the other pins of its cell. It should be noted that the other connectivity of a cell is important in computing pin weight. Although most net-constraint partitioning placers model the NLCs directly in the partitioning assignment, a different approach is taken in Ref. [9]. This placer assumes that a preliminary wirelength-driven partitioning assignment has been made already and it uses a LP formulation to make minimal reassignment to meet NLCs. Each net is modeled using a bounding-box formulation. The location of each cell is restricted to lie within the boundaries of its parent partition and a reassignment variable is used to indicate if the cell is moved from its currently assigned partition or the other child partition of its parent. If the reassignment causes area violation, unconstrained cells are reassigned from the over capacity partition to the other child partition of its parent. The placer uses the analytical partitioning flow from Gordian [51]. 21.4.2.2 Force-Directed Net-Constraint Placement A force-directed placer that optimizes for net constraints is presented in Ref. [8]. As with the other net constraint placers, this too builds on a strong wirelength-driven placer, Kraftwerk [35]. Kraftwerk uses a quadratic programming (QP) model to generate cell locations. Net constraints are met by generating a higher netweight for nets that are not meeting their NLCs. The increased weights are allocated to the pins that determine the current boundary of the net. The outer pins, in both the X and Y dimensions, are given higher weights to reduce its length as long as it does not meet its NLC. Another idea presented in this chapter is to constrain the net segment connecting the nets driver to its critical receiver. 21.4.2.3 Net-Constraint-Based Detailed Placement Several net-constraint detailed placement algorithms have been proposed [10,47,52]. In Ref. [10], the ripple-move algorithm from Mongrel [53] is adapted to include the cost of nets that are violating their constraints. In Ref. [52], net-constraint-driven versions of simulated annealing [13,54–56] and Domino [57] are proposed. The change to simulated annealing is a very simple addition to the simulated annealing (SA) cost function which reflects the cost of nets not meeting their NLC. The Domino-transportation cost function is changed and several new techniques to recombine the fractured subcells are proposed. A local-movement approach that employs LP to reduce nets with constraints while minimizing the movement of unconstrained nets is presented in Ref. [47]. The objective function minimizes the squared movement of the center of a net’s bounding box. This approach will create overlaps that must be resolved through a legalization phase that is not net constraint aware. 21.5 PATH (OR TIMING GRAPH)-BASED APPROACH Historically, path-based TDP refers to those algorithms that directly model the timing constraints (which are inherently path-based) during placement. It ensures that all the paths under consideration will meet their timing requirements after placement. The benefit of path-based approach is that it is explicitly timing driven, unlike net-based approaches which are implicitly timing driven by converting timing constraints into netweights or wirelength constraints. The downside of this approach is the complexity of directly modeling timing in placement, as the number of paths may be prohibitive [26]. Except some early works such as simulated annealing [13], enumerating all paths are not widely adopted. To make the problem size small, one can select only the near-critical paths, but even that could still be huge. The potential problem of only selecting a set of critical paths is that some noncritical paths may become critical. A more powerful technique is to embed timing graph (through a built-in simplified version of static timing analyzer) into the TDP formulation. It implicitly considers all topological paths Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C021 438 Finals Page 438 24-9-2008 #17 Handbook of Algorithms for Physical Design Automation and formulates them into some mathematical programming framework by introducing intermediate auxiliary variables (such as arrival times). It eliminates the need to enumerate/optimize a limited set of paths. The LP-based formulation is popular as the HPWL model can be formulated exactly into an LP framework. To explicitly write down the delay modeling and timing propagation with respect to the cell locations (x,y), simple/linearized models are often used. In this section, we first review the general LP-based formulation (which can easily be extended to handle nonlinear mathematical programming). Then we discuss various techniques such as partitioning-based overlap removal and Lagrangian relaxation to complement the general LP-based formulation. We also discuss the simulated annealing technique for path-based TDP and a recent technique using differential timing analysis. 21.5.1 LP-BASED FORMULATION The general LP-based formulation consists of two sets of variables and constraints: physical and electrical. The physical variables/constraints deal with variables and equations representing cell locations and netlengths (e.g., computed through the HPWL model). The electrical variables/constraints deal with gate and net delay models, arrival time propagation through the critical path method, and constraints that all required arrival times at timing endpoints are met. The objective function may be maximizing either WNS or TNS, or weighted wirelength, etc. 21.5.1.1 Physical Constraints For cell i, its center coordinates (xi , yi ) are the variables of the LP program. For a net ej , let lj , rj , tj , and bj represent its left, right, top, and bottom locations of its bounding box. Let Nj denote the set of cells connected to net ej , then we have lj ≤ xi + pinx (i, j) rj ≥ xi + pinx (i, j) (21.24) tj ≤ yi + piny (i, j) bj ≥ yi + piny (i, j), ∀ i ∈ Nj where pinx (i, j) and piny (i, j) are the pin offsets of cell i for its pin connecting to net ej in horizontal and vertical directions, respectively. The HPWL of net ej is represented by Lj Lj = rj − lj + tj − bj 21.5.1.2 (21.25) Electrical/Timing Constraints Let the gate delay GDelayi (k, o) represent the pin delay from an input pin k to output pin o of cell i. It can be modeled as a linear function of the load capacitance at the output pin and the slope (transition time) at the input pin with a reasonably high degree of accuracy. Similarly, the slope at the output pin of cell i can be described by a linear function. GDelayi (k, o) = a0 + a1 · CLoadi (o) + a2 · Slopei (k) Slopei (o) = b0 + b1 · CLoadi (o) + b2 · Slopei (k) where Slopei (k) is the slope at the input pin k of cell i Slopei (o) is the slope at the output pin o of cell i CLoadi (o) is the capacitance load seen by the output pin o Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C021 Finals Page 439 24-9-2008 #18 439 Timing-Driven Placement The constants a0 , a1 , a2 , b0 , b1 , and b2 are determined by standard cell library characterizations. These delay and output slope equations can be defined for every feasible signal transition for the cell. The delay for net ej , NDelayj (i1 , o, i2 , k) from output pin o of cell i1 to the input pin k of cell i2 is modeled in the LP using a simplified Elmore model [58] by the following equation: NDelayj (i1 , o, i2 , k) = KD · r · Lj · c · Lj + CLoadi2 (k) 2 (21.26) where r is the unit resistance of the interconnect c is the unit capacitance constant KD is a constant, 0.69 [14] If the resistance and capacitance in the horizontal and vertical directions are not equal, an alternate model can be used that replaces Lj with individual variables for the horizontal and vertical lengths. The arrival time at each pin is modeled through timing propagation and critical path method. Two types of equations are used, the first for input pins and the second for output pins. For input pin k of cell i2 , its arrival time is Arri2 (k) = Arri1 (o) + NDelayj (i1 , o, i2 , k) (21.27) The arrive time at an output pin o of cell i is represented by the LP variable Arri (i, o) and a set constraints, one for each input pin of cell i. Assuming two input pins k1 and k2 for cell i, the equations would be Arri (k1 ) + GDelayi (k1 , o) ≤ Arri (o) (21.28) Arri (k2 ) + GDelayi (k2 , o) ≤ Arri (o) (21.29) Most implementations assume the arrival time at the output of a sequential cell to be 0. Each library cell has a maximum drive strength, limiting the total capacitance the cell can drive. This drive strength limit is incorporated in the LP through length limits on the driven net. This limit is a precomputed constant to the LP formulation. Lj < CMax(ej ) 21.5.1.3 (21.30) Objective Functions The required time at input pin k of sequential cell vi , Req i (k), is a constant input. The negative slack at these timing endpoints is represented by variable Slk i (k) and equations Slk i (k) <= Reqi (k) − Arri (k) (21.31) Slk i (k) ≤ 0 (21.32) The second constraint is needed so that paths are not optimized beyond what is required to meet timing. This constraint can be adapted so that a slight positive margin is created for each path. The path-based TDP can optimize the TNS, i.e., max : i∈sequential Slk i (k) (21.33) Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C021 440 Finals Page 440 24-9-2008 #19 Handbook of Algorithms for Physical Design Automation To optimize the WNS, a variable representing the WNS is introduced, WNS, i.e., WNS < Slk i (k) (21.34) max: WNS (21.35) And the objective function is simply The LP-based objective function can also be a combination of wirelength and slack [11], e.g., min: Lj − α · WNS (21.36) where α is the weight to trade off wirelength and WNS. To summarize, the complete LP formulation for TDP can be written in the following generic term: minimize f (X) subject to AX ≤ D (21.37) where X is the set of variables including gate coordinates and auxiliary variables f (X) is the objective function which can be Equation 21.33, 21.35, or 21.36 AX ≤ D includes all the physical and electrical constraints such as net bounding-box constraints, delay constraints, slack constraints, and other possible additional constraints (such as the center of gravity constraints as in Ref. [11]) 21.5.2 PARTITIONING-BASED OVERLAP REMOVAL The LP-based formulation may create a lot of overlaps. Partitioning-based approach can be used together with LP-based formulation to remove the cell overlaps, as proposed in the original timing graph-based placer Allegro [11]. At each partitioning step, it formulates a LP problem to determine locations of cells. Each partition is divided into two subpartitions, and its cells are sorted based on the LP locations to determine the new partition assignment. The LP model is similar to Section 21.5.1. The objective function is similar to Equation 21.36. The factor α is used to trade off timing optimization versus wirelength. Additional physical constraints includes center-of-gravity constraint and partitionboundary constraint. The center-of-gravity constraint, as shown in Equation 21.38, tries to place the center of gravity of all the gates in the same partition to be in the center of the partition, while the boundary constraints prevent gates being placed outside the partition boundaries. x= mi xi mi (21.38) where x represents the center of the partition in x direction xi is the position of gate i mi is the equivalent mass of gate i, approximated by the gate width 21.5.3 LAGRANGIAN RELAXATION METHOD The number of constraints in the general LP-based formulation in Equation 21.37 can be enormous, even for moderate size circuits. Lagrangian relaxation is a very effective technique to transform the original constrained LP-formulation into a set of unconstrained problems in an iterative manner, e.g., as in Ref. [12]. Although the objective function used in Ref. [12] is the quadratic wirelength, the principle of Lagrangian relaxation method is the same. For the general mathematical programming Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C021 Finals Page 441 24-9-2008 #20 441 Timing-Driven Placement formulation in Equation 21.37, suppose A has m constraint equations. We can define a size-m vector Lagrange multipliers λ and add the nonnegative term λ · (D − AX) to the objective function: maxλ minX f (X) + λ · (D − AX) (21.39) When λ is fixed, minimizing f (X) + λ(D − AX) is an unconstrained mathematical programming problem, which can be solved efficiently. Then the Lagrange multiplier λ will be updated to solve a new unconstrained optimization problem. This process is iterated to obtain the constrained optimal solution. 21.5.4 SIMULATED ANNEALING The simulated annealing is a generic probabilistic algorithm for global optimization. It randomly moves gates, and accepts or rejects the move based on certain cost function. It is very flexible, i.e., it can take any objective function and consider accurate timing models, if needed. In Ref. [13], the simulated annealing algorithm is used for TDP by augmenting the cost function to include path-based timing information. Because efficient runtime of the cost evaluation step is critical in SA, great care has to be taken in implementing the timing cost function. Rather than updating the static timing graph whenever a cell is moved, the approach in Ref. [13] uses an enumerated set of critical paths, Pcritical . During a move cost evaluation, the paths impacted can be directly updated by adding the change in delay for the nets connected moved cells. The SA engine has two loops. The outer loop identifies Pcritical , and the inner loop runs a number of annealing iterations. In each outer loop of the annealing process, Pcritical is chosen as the K most critical paths using Dreyfus method [59]. In the inner loop, the nets impacted by a move will update the slack of paths, and the total timing cost is the sum of the path slacks in Pcritical . When the inner loop finishes, the outer loop updates the critical paths with new gate locations, and continues the inner loop. The simulated annealing cost function is a combination of wirelength cost and timing cost function. 21.5.5 GRAPH-BASED DIFFERENTIAL TIMING A recent work by Chowdary et al. [14] addresses the correlation problem of graph-based placers with final sign-off timers. Rather than modeling and computing delays and arrival times as was presented above, this approach optimizes an initial global placement based on the differences in delays, arrival, and required times at all pins of a circuit, relative to a reference static timing analysis. It terms this approach differential timing analysis [14]. This differential timing analyzer is almost exact in the neighborhood of the reference static timing, including modeling of setup time and latch transparency. It also introduces another improvement to graph timing-based placement. The constants used in the delay and slope Equation 21.26 are only accurate for a range of values of output loads and input slopes. To maintain the validity of the differential timing model, placement changes are limited to a local neighborhood. It then solves several iterations of the LP adjusting model constants and the neighborhood limits in each iteration. Differential timing is optimized using LP. A set of LP equations that parallel the static timing graph equations are used. For example, the delta wirelength can be obtained by Lj = rj − lj + tj − bj − Ljold (21.40) where Ljold is the wirelength of net j in the current placement. The equations for delay, slope, arrival, and slack can be formed similarly [14]. 21.6 ADDITIONAL TECHNIQUES There are many additional TDP algorithms in the literature that do not fall exactly into the previous classifications. As mentioned earlier, net-based and path-based algorithms all have pros and

Handbook of algorithms for physical design automation part 46

Nội dung