Handbook of Algorithms for Physical Design Automation, Part 41


Finally, consider minimization of q(s) in Equation 19.1 at the coarser level under assumptions 1 and 2 above, i.e.,

\min_{e_c \in \mathbb{R}^m} \tilde{q}(e_c) \equiv q(\tilde{s} + P e_c) = \tfrac{1}{2}(\tilde{s} + P e_c)^T Q (\tilde{s} + P e_c) - b^T (\tilde{s} + P e_c).

It is easily shown that e_c minimizes \tilde{q}(e_c) if and only if it satisfies

P^T Q P e_c = P^T r.    (19.6)

Galerkin coarsening thus defines the linear coarsening operator P^T : \mathbb{R}^n \to \mathbb{R}^m as the transpose of the linear interpolation operator P; simple substitution of Equation 19.5 into Equation 19.4 yields Equation 19.6 on premultiplication by P^T. By confining coarse-level iterative improvement to the perturbation e = P e_c of \tilde{s}, relaxation at the coarse level is in effect decoupled from relaxation at the adjacent finer level. The combined approximation \tilde{s} + e is thus not restricted to the range of P and can represent a broader set of solution candidates for fixed P than is possible if relaxation at the coarser level and subsequent interpolation are applied to the entire coarse-level solution s_c rather than just its perturbation e_c. In both theory and practice, coarsening and interpolation are so closely related that defining one essentially characterizes the other, even if nonlinear or discrete methods are used.

19.2.2 SIMPLE EXAMPLES OF INTERPOLATION

To give the simplified model concreteness for placement, consider the sample netlist H illustrated in Figure 19.4. The coarsening of H on the right-hand side of the figure shows modules 1, 2, and 3 mapped to cluster 0, module 4 mapped to cluster 1, and modules 0 and 5 mapped to cluster 2. For the moment we ignore the algorithm used to define the coarsening and concentrate just on the question of how to place the modules, once a placement of their parent clusters has been computed.∗ Because there are six modules at the finer level and three clusters at the adjacent coarser level, linear interpolation operators for this example are represented as constant 6 × 3 matrices.

FIGURE 19.4 Simple six-module netlist H with five nets and its coarsened approximation.

∗ For concreteness in this example, we describe finer-level objects as modules and coarser-level objects as clusters, but the example applies to any two adjacent levels of hierarchy; the finer-level objects might also be clusters of still finer-level objects, etc.

Let P_const denote the matrix for piecewise-constant linear interpolation, in which each module simply inherits its parent cluster's position. Alternatively, let P_avg denote the matrix for linear interpolation in which each module is placed at some linear combination of its parent's position and the positions of other clusters containing modules with which it shares nets. Let w_{ij} denote the weight that module i assigns to the position of cluster j, and assume weights are normalized such that w_{ij} = 1 if module i is contained by cluster j. With modules indexed 0, 1, ..., 5 and clusters indexed 0, 1, 2, as shown in the figure, P_const and P_avg take the following forms:

P_{const} = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\quad\text{and}\quad
P_{avg} = \begin{pmatrix} w_{0,0} & 0 & 1 \\ 1 & 0 & 0 \\ 1 & w_{2,1} & 0 \\ 1 & 0 & w_{3,2} \\ 0 & 1 & w_{4,2} \\ 0 & w_{5,1} & 1 \end{pmatrix}.

Each is applied separately to the x_c and y_c coordinate vectors of the coarse placement to obtain the corresponding x and y coordinate vectors of the fine placement, e.g., x = P_avg x_c and y = P_avg y_c (see also Figure 19.5).∗

∗ Formally, the matrix for the interpolation operator on s = (x, y) ∈ \mathbb{R}^n is P = diag(P_x, P_y), where P_x is the interpolation matrix for the x coordinates, and P_y for y. In these examples, P_x = P_y.

FIGURE 19.5 Piecewise-constant versus nonpiecewise-constant interpolation (declustering). Each component within a cluster is placed at a weighted average of the locations of all clusters containing other components to which it is connected.
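To make the two operators concrete, the following minimal sketch builds P_const and P_avg for this six-module example and interpolates a coarse placement. The numeric off-parent weights and coarse coordinates are hypothetical placeholders, not values from the text, and the row normalization is one plausible convention (the chapter leaves the normalization scheme open).

```python
# Sketch: the two interpolation operators for the example netlist H
# (6 modules, 3 clusters). Off-parent weights and coarse coordinates
# are hypothetical values chosen only for illustration.
import numpy as np

# Piecewise-constant interpolation: each module inherits its parent
# cluster's position (modules 1,2,3 -> cluster 0; 4 -> 1; 0,5 -> 2).
P_const = np.array([
    [0., 0., 1.],
    [1., 0., 0.],
    [1., 0., 0.],
    [1., 0., 0.],
    [0., 1., 0.],
    [0., 0., 1.],
])

# Weighted interpolation: weight 1 for the parent cluster plus smaller
# weights toward clusters sharing nets with the module.
w00, w21, w32, w42, w51 = 0.4, 0.3, 0.25, 0.3, 0.2   # hypothetical
P_avg = np.array([
    [w00, 0.,  1. ],
    [1.,  0.,  0. ],
    [1.,  w21, 0. ],
    [1.,  0.,  w32],
    [0.,  1.,  w42],
    [0.,  w51, 1. ],
])
# Normalize rows so each module's position is a convex combination of
# cluster positions (one common convention; the text leaves it open).
P_avg /= P_avg.sum(axis=1, keepdims=True)

xc = np.array([1.0, 4.0, 7.0])    # coarse x-coordinates of clusters 0,1,2
yc = np.array([2.0, 5.0, 3.0])
x, y = P_avg @ xc, P_avg @ yc     # fine placement: x = P x_c, y = P y_c
```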
19.2.3 STRICT AGGREGATION VERSUS WEIGHTED AGGREGATION

A common objection to clustering is that its associations may be incorrect and may therefore lead subsequent iterations to the wrong part of the solution space. To reduce the likelihood of poorly chosen clusters, the notion of a cluster can be generalized by weighted aggregation. Rather than assign each cell to just one cluster, we can break it into a small number of weighted fragments and assign the fragments to different coarse-level vertices; the resulting coarse-level objects are no longer simple clusters and are instead called aggregates. During interpolation, a cell's initial, inherited position is then typically determined by those of several aggregates as a weighted average [16]. Clustering, also called strict aggregation, is a special case of weighted aggregation. To our knowledge, weighted aggregation is not currently used by any published placement algorithm.

19.2.4 SCALABILITY

The scalability of the multilevel approach is straightforward to obtain and understand. Provided relaxation at each level has cost linear in the number N_a of aggregates at that level, and the number of aggregates per level decreases by a factor r < 1 at each level of coarsening, say N_a(i) = r^i N at level i, the total cost of a multilevel method is at most cN(1 + r + r^2 + ···) = cN/(1 − r). Higher-order (nonlinear) relaxations can still be used, as long as at least one of the following restrictions is enforced.

1. Their use is limited to subsets of bounded size, e.g., by sweeps over overlapping windows of contiguous clusters at the current aggregation level.
2. The starting placement for iterations at each level is good enough that the termination criteria for relaxation at that level can be satisfied after a strictly controlled number of iterations.

In general unstructured global optimization problems, these restrictions might sharply limit an algorithm's ability to explore the solution space in any scalable way. Placement, however, exhibits sparse local structure: most variables are related through the objectives and constraints to only a small constant number of other variables. In the presence of such structure, both analysis and results obtained in practice have confirmed that these restrictions need not impair final quality of results [9].
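The geometric-series bound above is easy to check numerically. The toy estimate below is a sketch under assumed constants (a per-level cost factor c and a cutoff size), not a model of any particular placer.

```python
# Toy work estimate: with N_a(i) = r^i * N aggregates at level i and
# relaxation cost c * N_a(i) per level, total multilevel work stays
# below c * N / (1 - r). Constants are illustrative.
def multilevel_work(N, r, c=1.0, coarsest=100):
    total, n = 0.0, float(N)
    while n > coarsest:
        total += c * n          # linear-cost relaxation at this level
        n *= r                  # geometric shrinkage under coarsening
    return total + c * n        # bounded-size coarsest-level solve

print(multilevel_work(1_000_000, r=0.25))   # ~1.33e6
print(1_000_000 / (1 - 0.25))               # geometric-series bound
```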
19.2.5 CONVERGENCE PROPERTIES

Consider a Fourier decomposition of the error e = \hat{x} − x^* in a system of equations N(x) = 0, where x^* denotes an exact solution and \hat{x} is a computed approximation to x^*. By localized relaxation, we mean iterative optimization over a small subset of coupled variables, with all other variables held fixed, e.g., Gauss–Seidel (Equation 19.3). The fundamental observation is as follows [2]:

Observation 1 Localized relaxation tends to rapidly reduce high-frequency components of the error in the system of equations. Lower-frequency components of the error are reduced much more slowly.

There are two key consequences of this observation, both of which stem from the fact that the very notion of high frequency comes from the modeling scale. First, Fourier components of the error in different frequency ranges are most efficiently targeted by relaxation at different scales. By coarsening the modeling scale, we can target a coarser scale of the error in the system; we simply apply the same relaxation algorithm at the coarser scale. Recursion over all relevant scales produces highly accurate solutions with fast, scalable runtime. Second, both the outcome of relaxation and the convergence rates of individual variables during relaxation can be used to guide the construction of interpolation or aggregation operators. This connection underlies the basic principles of successful multiscale algorithms for systems of linear and nonlinear differential equations, where a robust convergence analysis has been obtained [2]. In this way, analysis and practice in both linear and nonlinear systems of equations have established a link between relaxation and interpolation for problems with local structure [9].
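Observation 1 is easy to reproduce on a toy system. The sketch below runs Gauss–Seidel sweeps on a one-dimensional Laplacian with a pure low-frequency and a pure high-frequency initial error mode; the problem and sizes are illustrative, not drawn from the chapter.

```python
# Demonstration of Observation 1: Gauss-Seidel damps high-frequency
# error modes of a 1D Laplacian system much faster than low-frequency
# ones. The exact solution of A x = 0 is x = 0, so x itself is the error.
import numpy as np

n = 64
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1D Laplacian
idx = np.arange(1, n + 1)

for freq in (2, 50):                                   # low vs high frequency
    x = np.sin(freq * np.pi * idx / (n + 1))           # initial error mode
    e0 = np.linalg.norm(x)
    for _ in range(10):                                # 10 Gauss-Seidel sweeps
        for i in range(n):
            x[i] -= (A[i] @ x) / A[i, i]               # relax one variable
    print(f"mode {freq}: error reduced {e0 / np.linalg.norm(x):.1f}x")
```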
19.2.5.1 Error-Correcting Algorithm MG/Opt

Elementary local convergence properties of a general error-correcting multiscale optimization algorithm have been established by Lewis and Nash [17]. For simplicity, their treatment of constraints is omitted here. The error being corrected (to first order) is that introduced by transferring approximations from finer to coarser levels. Consider the unconstrained minimization of a smooth nonlinear function F(s) over variables s ∈ \mathbb{R}^n within a modeling system that generates multiple resolutions of F and s, as specified by subscripts h (finer scale) and H (coarser scale). In placement, a resolution is primarily a selection of bin-grid dimensions used to enforce density constraints. A convergent, continuous nonlinear relaxation algorithm R is assumed given, along with a continuous interpolation operator I_H^h and continuous coarsening operator I_h^H = (I_H^h)^T.

The following steps comprise one iteration (at level h) of a multiscale optimization algorithm called MG/Opt [17]. The algorithm makes explicit use of first-order coarse-grid corrections. Given a resolution level h, consider

\min_{s_h} F_h(s_h)    (19.7)

with initial estimate s_h^{(0)}. If h is the coarsest resolution level, solve Equation 19.7 to the fullest possible accuracy by means of relaxation R. Otherwise, apply the following steps.

1. Apply N_1 ≥ 0 iterations of R directly to Equation 19.7, obtaining improved estimate s_{h,1}. Compute the corresponding coarse-level estimate s_{H,1} ≡ I_h^H s_{h,1}.
2. Compute the coarse-level gradient correction v_H = \nabla F_H(s_{H,1}) − I_h^H \nabla F_h(s_{h,1}).
3. Using initial point s_{H,1}, recursively apply this algorithm to the corrected coarse-level problem \min_{s_H} F_H(s_H) − v_H^T s_H to obtain the next iterate s_{H,2}.
4. Interpolate the coarse-grid step s_{H,2} − s_{H,1} back to a finer-level search direction e_h = I_H^h (s_{H,2} − s_{H,1}).
5. Perform a line search (for scalar α) at level h to obtain the next iterate s_{h,2} = s_{h,1} + α e_h.
6. Apply N_2 ≥ 0 iterations of relaxation R to Equation 19.7 with initial point s_{h,2} to obtain s_{h,3}.
7. Finally, discrete (noncontinuous) refinement steps may be used to transform s_{h,3} into the next iterate s_h^{(1)}.

At least one of N_1, N_2 must be strictly positive; the other may be positive or zero. The algorithm is designed to be easy to implement given a flat nonlinear relaxation and the ability to (a) model the problem at multiple resolutions and (b) transfer approximations between those resolutions. It is shown by Lewis and Nash [17] that MG/Opt converges under the given assumptions. In particular, they establish the following facts rigorously.

1. The corrected coarse-level model approximates the fine-level model to first order in e_h.
2. The multiscale search direction e_h is a descent direction at the fine level: e_h^T \nabla F_h(s_{h,1}) < 0.
3. \lim_{k \to \infty} \nabla F_h(s_h^{(k)}) = 0, i.e., algorithm MG/Opt converges.
4. The search direction e_h is well scaled; i.e., the natural step α = 1 is likely to be accepted close to a solution s^*.

The latter property is necessary for fast convergence.
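As a concrete, if drastically simplified, illustration, the sketch below runs one MG/Opt iteration on a convex quadratic with two resolution levels. The gradient-descent relaxation, the piecewise-constant interpolation I, and the exact coarse solve are stand-ins chosen so the example stays self-contained; they are not the operators of any particular placer, and the discrete refinement of step 7 is omitted.

```python
# One MG/Opt iteration [17] on a toy quadratic F(s) = 0.5 s^T Q s - b^T s
# at two levels. Relaxation, interpolation, and the coarse solve are
# simplified stand-ins; comments map lines to the steps in the text.
import numpy as np

def relax(Q, b, s, sweeps, lr=0.2):
    for _ in range(sweeps):                  # first-order relaxation R
        s = s - lr * (Q @ s - b)
    return s

def mg_opt(Q, b, s0, I, n1=2, n2=2):
    s1 = relax(Q, b, s0, n1)                 # step 1: N1 fine-level sweeps
    sH1 = I.T @ s1                           #         s_H,1 = I_h^H s_h,1
    QH, bH = I.T @ Q @ I, I.T @ b            # Galerkin coarse-level model F_H
    vH = (QH @ sH1 - bH) - I.T @ (Q @ s1 - b)  # step 2: gradient correction
    sH2 = np.linalg.solve(QH, bH + vH)       # step 3: corrected coarse problem
    e = I @ (sH2 - sH1)                      # step 4: interpolate coarse step
    g = Q @ s1 - b                           # step 5: exact line search on a
    alpha = -(g @ e) / (e @ (Q @ e))         #         quadratic along e
    return relax(Q, b, s1 + alpha * e, n2)   # step 6: N2 post-sweeps

n = 8                                        # 1D Laplacian test problem
Q = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
I = np.zeros((n, n // 2))
for j in range(n // 2):                      # piecewise-constant interpolation
    I[2 * j, j] = I[2 * j + 1, j] = 1.0
s = mg_opt(Q, b, np.zeros(n), I)
```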
19.3 MULTISCALE PLACEMENT IN PRACTICE

To design and implement a multiscale algorithm, one would like to know what requirements should be imposed on its coarsening and interpolation, its relaxation, and its iteration flow, and what trade-offs among them can reasonably be expected. In this section, we summarize characteristics of some leading multiscale algorithms and attempt to illustrate the trade-offs they create and the ways they manage them.

19.3.1 CLUSTERING-BASED PRECURSORS

Placement by multilevel optimization can be viewed as the natural recursive extension of clustering-based approaches considered in earlier work. Schuler and Ulrich [18] compared top-down and bottom-up approaches to clustering for linear placement. They observed large speedups compared to flat algorithms. They also observed that balancing cluster sizes (size incorporating both area and connectivity) was important. Mallela and Grover [19] studied clustering as a means of accelerating placement by simulated annealing. They maintained a priority-queue-like structure for cluster candidates. Sechen and Sun [20] employed three levels of clustering in an annealing-based flow. Hur and Lillis [21] used three levels of clustering in linear placement. Cong and Xu [22] studied clustering-based placement, where clustering is first performed based on MFFCs (maximum fanout-free cones), which consider signal directions and logic dependency. These clusters are then placed for timing optimization using TimberWolf6.0 [23], a well-known simulated-annealing-based placement package at that time.

To our knowledge, Ultrafast VPR [24] is the first published work to recursively cluster a circuit model into a hierarchy of models for placement by multiscale optimization. Ultrafast VPR is used to accelerate the annealing-based VPR algorithm (versatile packing, placement and routing [25]) to reduce design times on field-programmable gate arrays (FPGAs) at some expense in placement quality. (FPGA placement quality in Ultrafast VPR is measured by the area used.)

19.3.2 COARSENING

An aggregation strategy defines a recursive transformation of the data functions (objectives and constraints) from finer-level to coarser-level representations. These transformations reduce the numbers of variables and constraints but sometimes increase their complexity as information is compressed. Although some results exist for the accuracy of these transformations as approximations to the original problem in the graph context [2], formal results are, to our knowledge, not yet known in the hypergraph setting. For this reason, contemporary convergence criteria in practice rely on heuristics and empirical observations.

General clustering algorithms are described in Chapter 7. Typically, clustering algorithms for placement merge tightly connected cells in a way that eliminates as many nets at the adjacent coarser level as possible while satisfying some constraint on variation in cluster areas. A class of coarsening algorithms more general than clustering is described briefly in Section 19.2.3. Important questions for coarsening in practice include the following:

1. How accurately do coarsened objectives and constraints approximate their corresponding finer-level counterparts? What error is incurred?
2. How much coarser than its finer-level source should a coarsened problem be?
3. How much variation in cluster areas should be allowed?
4. How coarse is too coarse? That is, when should recursive coarsening stop?
5. What trade-offs exist between (a) coarsening and relaxation and (b) coarsening and iteration flow? For example, how often can an algorithm afford to revise or completely reconstruct its coarsening hierarchy, and by what means?
6. Should the coarsening model the solution at the finer level, or the change in a given approximation to that solution? Why?

The questions are interdependent, and precise answers for placement are not yet known. Leading academic multiscale placers model the full placement problem at coarser levels rather than the change in a given placement as described in Sections 19.2.1 and 19.2.5. Only force-directed placement (FDP)/line-search-directed placement (LSD) changes its coarsening/interpolation operators over its flow (Section 19.3.3). The following classifications of coarsening algorithms for multiscale placement are sometimes used.

1. Connectivity-based versus location-based: Although netlist connectivity is always used to some extent, location-based algorithms also require a given current placement as input and attempt to keep neighboring modules together (see Section 19.3.2.2).
2. Transient versus persistent [26]: Transient clusters appear as part of the inner steps of an algorithm but are not associated with coarse-level variables. For example, clusters are formed in multiscale partitioning algorithms used in top-down bisection-based placement, but they are not separately placed. Persistent clusters, on the other hand, are computed a priori and are actually placed.
3. Score-based versus scoreless [27]: In scoreless algorithms, clusters are committed as soon as they are formed. Examples described in Chapter 7 include edge coarsening, heavy-edge matching, and first choice. In score-based algorithms, alternative candidates are assigned scores and iteratively refined. As the refinement proceeds, the clustering scores are updated; eventually, only candidates with sufficiently high scores are selected to serve as clusters. Examples described below include best choice [26], fine-granularity [28], and net cluster [27].
Leading multiscale algorithms limit the variation of cluster areas at each level of hierarchy. APlace [29] and NTUPlace3 [30] limit cluster areas to at most 1.5 times the target cluster area. FastPlace3 limits its cluster areas to at most 5 times the target.

Among connectivity-based algorithms, experiments to date suggest that local-connectivity-driven greedy strategies like first-choice vertex matching [31,32] and best choice [26] may be more effective than global-connectivity-driven approaches like edge-separability clustering (ESC) [33]. How best to define coarse-level netlists without explosive growth in the number and degree of coarsened hyperedges relative to coarsened vertices is particularly challenging [27,28,32]; Sections 19.3.2.3 and 19.3.2.4 describe recently proposed solutions. Although various forms of clustering dominate the recent multiscale placement literature, Rent's rule and the importance of limiting cutsize also make recursive partitioning an attractive means of generating the multiscale hierarchy [7]. Hybrid algorithms for floorplanning or placement that combine clustering with netlist partitioning in some form continue to be developed [7,12,34–38].

19.3.2.1 Best-Choice Clustering

As described in Chapter 7 and briefly summarized here, best-choice clustering [26] is the method used by the leading multiscale tools APlace [29], mPL [39], FastPlace 3.0 [40], and RQL [14]. Best choice is a netlist-based, score-based algorithm typically used in placement for persistent or semipersistent clustering. A graph is defined on the netlist vertices (modules), with each edge weighted by the affinity of the given two vertices. The affinity may represent some weighted combination of complex objectives, such as hypergraph connectivity, spatial proximity, timing delay, area balance, coarse-level hyperedge elimination, etc. The affinities s(i, j) between vertices i and j used by APlace, mPL, FastPlace3, RQL, and NTUPlace3∗ for connectivity-based clustering are all equal to or slightly modified from the basic affinity formula

s(i, j) = \frac{\sum_{e \in N_{ij}} w(e)}{a_i + a_j}    (19.8)

where N_{ij} is the set of nets containing both module i and module j, a_i and a_j are the respective areas of modules i and j, and w(e) denotes the weight of hyperedge e, typically taken proportional to 1/(|e| − 1).

∗ NTUPlace3 uses first choice rather than best choice.

An affinity-ordered priority queue (heap) of the vertices (including clusters and partial clusters) is formed; each vertex in the heap is associated with its nearest neighbor under the given affinity metric. At each step, the pair of vertices (u, v) with the highest affinity is removed from the heap and clustered, provided its total area does not violate the maximum-area constraint. For efficiency, a lazy-update strategy is then employed: the affinities of the netlist neighbor vertices of u and v are marked as invalid rather than being immediately updated. Affinities of invalid vertices are updated only when they arrive at the top of the heap.

Compared with earlier pass-based algorithms lacking a heap-order organization, the priority-queue formulation consistently improves the HPWL of final placements. In experiments reported by the authors [26], final HPWL obtained by the multiscale placer hATP using best choice improves by 4.3 percent over edge coarsening and by 3.2 percent over first choice. Similar improvements have been reported for APlace [41] and mPL [39].
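The following sketch shows the heap-plus-lazy-update skeleton of one best-choice coarsening pass. The netlist encoding (nets as frozensets of module ids), the single-pass pairing, and the omission of re-inserting merged clusters into the heap are simplifications for brevity, not details of hATP or any other tool.

```python
# Sketch of one best-choice coarsening pass with lazy update [26].
# Nets are frozensets of module ids; affinity follows Equation 19.8
# with w(e) = 1/(|e|-1). Merged clusters are not re-inserted into the
# heap here, so this pass behaves like a greedy matching.
import heapq

def best_choice_pass(modules, nets, area, max_area):
    nets_of = {m: set() for m in modules}
    for e in nets:
        for m in e:
            nets_of[m].add(e)

    def affinity(i, j):                       # Equation 19.8
        shared = nets_of[i] & nets_of[j]
        return sum(1.0 / (len(e) - 1) for e in shared) / (area[i] + area[j])

    def nearest(i):                           # best neighbor of i
        nbrs = {m for e in nets_of[i] for m in e} - {i}
        return max(nbrs, key=lambda j: affinity(i, j), default=None)

    heap, invalid, parent = [], set(), {}
    for i in modules:
        j = nearest(i)
        if j is not None:
            heapq.heappush(heap, (-affinity(i, j), i, j))

    while heap:
        _, i, j = heapq.heappop(heap)
        if i in parent or j in parent:
            continue
        if i in invalid:                      # lazy update: rescore only now
            invalid.discard(i)
            k = nearest(i)
            if k is not None:
                heapq.heappush(heap, (-affinity(i, k), i, k))
            continue
        if area[i] + area[j] <= max_area:
            parent[i] = parent[j] = (i, j)    # commit cluster {i, j}
            for e in nets_of[i] | nets_of[j]: # neighbors' affinities stale
                invalid.update(m for m in e if m not in (i, j))
    return parent                             # module -> committed pair
```

A full implementation would also create a merged vertex, rebuild its net incidence, and push it back into the heap so that clusters can grow beyond pairs.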
19.3.2.2 Location-Based Clustering

Location-based clustering is also called layout-based clustering or physical clustering. After an initial placement has been obtained as a starting configuration, spatial proximity can be incorporated into the vertex-affinity metric used for clustering [12,16]. A simple three-level illustration is given in Figure 19.6. Earlier versions of mPL [16] incorporated distance as a reciprocal factor in a second V-cycle.∗ FDP [12] uses an (additive) convex combination of connectivity and spatial proximity for its vertex-affinity function in hybrid first-choice clustering. Specifically, the affinity score s(i, j) between vertices i and j is defined for a given placement as

s_{FDP}(i, j) = \lambda \sum_{e \in N_{ij}} \frac{1}{|e| - 1} + (1 - \lambda) \sum_{e \in N_{ij}} \frac{1}{1 + |x_i - x_j| + |y_i - y_j| - \zeta}

where N_{ij} is the set of nets containing both vertex i and vertex j, |e| is the number of vertices in e, and ζ specifies the minimum displacement possible between nonoverlapping modules i and j. The authors report best results for λ ≈ 0.25. The effect stabilizes iterations and supports multiple V-cycles.

FIGURE 19.6 Location-based clustering in a three-level multiscale-placement flow. Given an initial placement at the finest level, clusters at each subsequent coarser level are determined from a combination of netlist connectivity and spatial proximity.

∗ Later versions of mPL, however, abandoned the location-based approach in favor of a single V-cycle derived by purely connectivity-driven best-choice clustering [42].

Location-constrained clustering is used in a somewhat different way to define the coarsest level of a three-level formulation used in FastPlace 3.0 [40]. While the first level of fine-grain clustering (cf. Section 19.3.2.3) relies solely on netlist affinities (Equation 19.8) and best choice, the second, coarse-grain clustering imposes the added restriction that vertices must be closer than a certain parametrized limit in order to be clustered. In FastPlace 3.0, that limit is set to 10 percent of the maximum chip dimension. A similar hybrid approach to clustering, using only netlist connectivity at finer levels but both connectivity and proximity at the coarsest level, is used by mFar [11,43].

The work of Chen et al. [44] describes a more radical approach.∗ A cluster hierarchy is derived from four separate preliminary placements. Several iterations of the successive-overrelaxation (SOR) variant of coordinate-wise relaxation [15] are applied to the linear equations for the optimality of flat, unconstrained quadratic HPWL perturbed by density-balancing forces computed by a Poisson solver [10]. In each of four separate trials, the cells are initially placed all at the same point: one of the four corners of the placement region. Clusters are selected according to the cells' average final proximity over the results of all four trials. Although this iterative, empirical approach to clustering requires significant runtime, it is a fixed initial cost that can be amortized over the cost of subsequent iterations. Numerical results confirm the scalability of this approach.

∗ The hierarchy is used by Chen et al. to solve the linear system of equations for a flat, Poisson-based analytical placement formulation rather than to directly support multilevel placement.
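A minimal sketch of the hybrid FDP affinity follows. The netlist encoding and the ζ value are assumptions for illustration; λ = 0.25 follows the value the authors report.

```python
# Sketch of the hybrid connectivity/proximity affinity used by FDP for
# location-based clustering. nets_ij is the set of nets shared by i and
# j; zeta = 0 is an assumed placeholder.
def s_fdp(i, j, nets_ij, pos, lam=0.25, zeta=0.0):
    conn = sum(1.0 / (len(e) - 1) for e in nets_ij)   # connectivity term
    (xi, yi), (xj, yj) = pos[i], pos[j]
    d = 1.0 + abs(xi - xj) + abs(yi - yj) - zeta      # L1 separation
    prox = len(nets_ij) / d                           # proximity term
    return lam * conn + (1.0 - lam) * prox

# Example: modules 0 and 1 share a 2-pin and a 3-pin net, placed nearby.
nets_ij = [frozenset({0, 1}), frozenset({0, 1, 2})]
pos = {0: (0.0, 0.0), 1: (1.0, 2.0), 2: (4.0, 4.0)}
print(s_fdp(0, 1, nets_ij, pos))
```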
19.3.2.3 Mutual Contraction and Fine-Granularity Clustering

A central obstacle to multilevel placement of netlists is that, in the course of recursive clustering, modules tend to be eliminated far more rapidly than nets. The result is often extremely densely interconnected coarse-level netlists with very different structure from the original finest-level netlists they are intended to approximate. The problem has been partly addressed in the literature by clustering schemes that strive to eliminate as many nets as possible [32], in particular, nets of low degree. To this end, the mutual-contraction formulation [28] models the strength of the connection between modules u and v relative to u's total connectivity as

w_r(u, v) = \frac{w'(u, v)}{\sum_x w'(u, x)}

where the pairwise weights w' come from clique-model (graph-based) approximations of the hyperedges in the netlist, each hyperedge e contributing weight w'(e) = 2/\{[d(e) − 1]\,d(e)\} to each of its vertex pairs. The contraction of two modules is defined as the symmetrized product c_p(x, y) = w_r(x, y)\,w_r(y, x); a pair of modules with a large contraction relative to other pairs is a good candidate for clustering. The notion is readily extended to arbitrary subsets of modules.
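A small sketch of the contraction metric just defined follows; the netlist encoding (nets as frozensets) and the toy example are assumptions for illustration.

```python
# Sketch of mutual contraction [28]. Each net e of degree d contributes
# clique weight 2/[(d-1)d] to every pair of its modules; w_r normalizes
# by total connectivity, and c_p symmetrizes the result.
from collections import defaultdict
from itertools import combinations

def contraction_scores(nets):
    w = defaultdict(float)                   # pairwise clique-model weights
    total = defaultdict(float)               # sum_x w'(u, x) per module u
    for e in nets:
        d = len(e)
        if d < 2:
            continue
        we = 2.0 / ((d - 1) * d)             # w'(e)
        for u, v in combinations(e, 2):
            w[(u, v)] += we
            w[(v, u)] += we
            total[u] += we
            total[v] += we
    wr = lambda u, v: w[(u, v)] / total[u]   # relative weight w_r(u, v)
    return {(u, v): wr(u, v) * wr(v, u)      # contraction c_p(u, v)
            for (u, v) in list(w) if u < v}

nets = [frozenset({1, 2}), frozenset({1, 2, 3}), frozenset({3, 4})]
scores = contraction_scores(nets)
print(max(scores, key=scores.get))           # strongest pair: (1, 2)
```

In this toy netlist, pair (1, 2), which shares both a 2-pin and a 3-pin net, outscores pair (3, 4).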
In fine-granularity clustering [28], the contraction metric is used to order connections between modules in a priority queue. The modules at the front of the queue are grouped together if their grouping does not violate area-balance or other constraints. A target cluster limit is set a priori; clustering typically stops with most clusters containing two or three modules of average area.

19.3.2.4 Net Cluster

The above strategies share a dependence on metrics derived from pairs of vertices but are largely oblivious to netlist structure involving more than two vertices. The net-cluster study [27] shows that aggregation criteria defined over local subsets broader than simple pairs of vertices can be used to improve the quality and runtime of existing multiscale placers. In particular, clustering the vertices of a single multipin net is shown in many instances to improve quality of results over what can be obtained with a pairwise strategy; cf. Figure 19.7. Another advantage of net cluster is that it reduces the number of nets at approximately the same rate as the number of vertices.

FIGURE 19.7 Netlist with a natural clustering that vertex-pair-based algorithms typically fail to find. The natural clustering most appropriate for placement (or netlist partitioning) groups vertices 1–4 in one cluster and vertices 5–7 in another. Vertex-pair-based algorithms overwhelmingly prefer to cluster vertices 4 and 5, precluding the natural clustering.

The net-cluster algorithm consists of five distinct steps partitioned into two phases.

Phase 1. Candidate-cluster (a) identification, (b) refinement, and (c) scoring
Phase 2. Net-cluster (a) scoring and (b) final formation

Initial cluster candidates for phase 1 are simply the nets in the netlist; note that these are not disjoint. Refinement then proceeds candidate by candidate. For each candidate, several iterations of FM bipartitioning (Chapter 7) are performed, with the cluster candidate used as one subset of the netlist and the rest of the netlist as the other. That is, this step attempts to determine whether each vertex in a cluster candidate is more strongly connected to the other vertices in the cluster or to the vertices external to it. After this refinement step, each candidate cluster C_i is scored by

sc(C_i) = \frac{(\#\text{nets inside } C_i) \times (\#\text{modules inside } C_i)}{\text{area}(C_i)}

That is, the score prefers candidate clusters that (1) absorb many nets, (2) aggregate many modules, and (3) have low area. At the end of phase 1, candidate clusters are still not disjoint. In phase 2, each net N_i is then also assigned a score equal to the sum of the scores of the candidate clusters containing it minus the sum of the scores of the candidate clusters cutting it (see Ref. [27] for the precise form). Nets are then visited in descending score order, with one of the four following cases applied to each net.

1. If clustering the cells violates cluster-area constraints, then this net is ignored, and the next net is considered.
2. Otherwise, the cells in the net are clustered if none of them has already been clustered.
3. Otherwise, if just one cell in the net has already been clustered, the net can be merged with the existing cluster, if doing so does not violate cluster-area constraints.
4. Otherwise, if at least two cells in the net have already been assigned to different clusters, all these overlapping clusters and the cells in the current net are merged, if doing so does not violate cluster-area constraints.

The net-cluster study supports the view that considerable improvement in both runtime and QoR of existing multilevel algorithms may be attainable through more accurate modeling of local netlist structure during aggregation. On ICCAD 2004 test cases of up to approximately 200,000 objects, a single pass of net cluster improves the final HPWL of the leading multiscale placers mPL6 [39] and NTUPlace3 [30] by 2 to 5 percent on average.
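The candidate scoring and the four-case net visiting can be expressed compactly. The sketch below is a simplified rendering under assumed data structures (nets as frozensets, net scores precomputed); it omits the FM refinement of phase 1 and the exact phase-2 score of Ref. [27], and it collapses cases 2 through 4 into a single merge path.

```python
# Simplified net-cluster phases [27]: phase-1 candidate scoring and a
# phase-2 pass over nets in descending score order applying the four
# cases from the text. FM refinement and exact net scores are omitted.
from itertools import count

def score_candidate(cand, nets, area):
    inside = sum(1 for e in nets if e <= cand)        # nets fully absorbed
    return inside * len(cand) / sum(area[c] for c in cand)

def form_clusters(scored_nets, area, max_area):
    fresh = count()                                   # new cluster ids
    cluster_of, members = {}, {}
    for _, net in sorted(scored_nets, key=lambda t: -t[0]):
        hit = {cluster_of[c] for c in net if c in cluster_of}
        merged = set(net).union(*(members[h] for h in hit))
        if sum(area[c] for c in merged) > max_area:
            continue                                  # case 1: skip this net
        for h in hit:                                 # cases 3-4: absorb the
            del members[h]                            # overlapped clusters
        cid = next(fresh)                             # case 2: (re)form cluster
        members[cid] = merged
        for c in merged:
            cluster_of[c] = cid
    return members
```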
TABLE 19.1 Approximate Numbers of Clusters N_coarsest at the Coarsest Levels of Leading Academic Multiscale Placers

    Placer        N_coarsest
    APlace        2,000
    Dragon        4
    FastPlace3    N_finest/4
    FDP/LSD       1,000
    mPL           500
    NTUPlace3     6,000

Note: The finest-level netlist has N_finest modules.

19.3.2.5 Coarsest Level

Because the initial placement at the coarsest level may have a large influence on subsequent iterations, and because the coarsest-level problem is relatively small, the placement at this level is typically performed with great care, to the highest quality possible. For example, mPL [39] uses nonlinear programming, while mPG [5] uses simulated annealing. How to judge coarse-level placement quality is not necessarily obvious, however, as the coarse-level objective may not correlate strictly with the ultimate fine-level objectives. Under such circumstances, multiple iterations over the entire hierarchical flow may sometimes be important [6,16], with evaluation of objectives and constraints at the finest level providing feedback to the representation and optimization at all coarser levels.

Coarsest-level problem sizes of some leading multiscale algorithms are shown in Table 19.1. From the table it is evident that an enormous range of coarsest-level problem sizes exists, from just four movable objects in Dragon [7] to N/4 in FastPlace3, where N is the number of modules in the original netlist. The observation that flat (non-multiscale) algorithms, e.g., [13,45], remain competitive suggests that there is no inherent upper limit on the best coarsest-level problem size. Similarly, Dragon's results [7] and results from other problem domains [46,47] strongly suggest the absence of any hard lower bound. The complexity of integer nonconvex nonlinear programming, to which placement belongs, is high enough that multiscaling can sometimes produce solutions superior to those of flat algorithms even when fewer than 100 variables are present.

19.3.2.6 Numbers of Levels

Netlist-driven clustering by variations of best choice or first choice seems to allow a rapid reduction in the number of modeling levels, by allowing a large cluster-size target, without loss of final placement quality. In APlace, each coarse cluster level has about 1/10 the number of movable objects of its adjacent finer level; from Section 19.3.2.5, this rule produces just four levels of hierarchy for APlace on a 1M-module test case. In NTUPlace3, this ratio is approximately 1/5, and in mPL6 it is approximately 1/4. In contrast, in the location-based clustering used by FDP/LSD [12,48], at most 35 percent of modules are aggregated in one pass of clustering, leaving a much larger number of levels. From Section 19.3.2.5, this rule produces 20 levels of hierarchy for FDP/LSD on a 1M-module test case.

19.3.3 ITERATION FLOW

Of all the leading multiscale placers cited in this chapter, FDP/LSD [48] is the only one known to rely on more than a single pass of successive refinement from coarsest to finest level (cf. Figure 19.3). After each such pass of successive refinement in FDP/LSD, the entire hierarchy is reconstructed, using location-based clustering (Section 19.3.2.2), with one less level of clustering than used in the preceding pass. On a 1M-module test case, this flow constructs a total of 20 + 19 + ··· + 1 ≈ 200 distinct problem formulations over its history.
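These level counts follow from simple geometry of the reduction ratios. The check below assumes a clean, fixed shrink ratio per coarsening pass, which real placers only approximate.

```python
# Back-of-envelope check of the level counts quoted above, assuming a
# fixed shrink ratio per coarsening pass down to the coarsest sizes in
# Table 19.1 (real placers adapt ratios level by level).
import math

def num_levels(n_finest, n_coarsest, ratio):
    passes = math.ceil(math.log(n_coarsest / n_finest, ratio))
    return passes + 1                      # count the finest level too

print(num_levels(1_000_000, 2000, 1 / 10))   # APlace-style 10x shrink: 4
print(num_levels(1_000_000, 1000, 0.65))     # FDP/LSD-style (<= 35% merged
                                             # per pass): ~18, vs 20 quoted
```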