Handbook of algorithms for physical design automation part 20.pdf (Sổ tay thuật toán)

Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C009 172 Finals Page 172 24-9-2008 #13 Handbook of Algorithms for Physical Design Automation 1 1 2 3 2 5 1 3 5 4 54V3H 2V1H M1 5 4 54V2H 3V1H M3 1 3 4 3 2 5 2 4 54V 23H V 1H M2 54H 23H V 1H FIGURE 9.6 Illustration of three types of moves. in the rest of this chapter. It employs the technique of simulated annealing [17], and uses the set of normalized Polish expressions as the solution space to avoid an unnecessarily large number of states and thus enables the speedup of the search procedure significantly. The following three types of moves, M1, M2, and M3, are used to modify a normalized Polish expression to get a neighboring one. M1: Swap two adjacent modules. (Two modules are adjacent iff there is no other module between them in the expression.) M2: Complement a chain of cuts. (A chain of cuts is a sequence of consecutive elements in the expression such that each element is a cut [i.e., V or H]. V and H are complements of each other.) M3: Swap adjacent module and cut. (A module and a cut are adjacent iff they are consecutive elements in the expression.) It is clear that M1 and M2 each always produce a normalized Polish expression. M3, however, might not produce a normalized Polish expression. To generate an M3 move, a pair of adjacent module and cut is repeatedly chosen until swapping them will lead to a normalized Polish expression. It is claimed that the three types of moves are sufficient to ensure that any normalized Polish expression can be reached from any other via a finite number of moves. Figure 9.6 gives a demonstration of the three types of moves. Each normalized Polish expression generated in the annealing process will be evaluated as follows. Let Tf denote the corresponding slicing structure for a normalized Polish expression f . The area A and the total wirelength W of f are defined to be the area and the total wirelength of a minimumarea floorplan of Tf . The cost function for measuring the quality of a normalized Polish expression is A + λW , where λ is a user-specified constant to control the relative importance of A and W . The minimum-area floorplan can be computed efficiently by the slicing floorplan area optimization algorithms [3–5]. In fact, it is observed that the calculation of a minimum-area floorplan can be done in an incremental manner because the calculation only needs to be performed on tree nodes that are changed after a move of type M1, M2, or M3. 9.6 SLICING FLOORPLAN DESIGN CONSIDERING PLACEMENT CONSTRAINTS In floorplan design, it is useful if a designer is allowed to specify some placement constraints to be satisfied in the final floorplan. Typical placement constraints that have been addressed for slicing floorplans are boundary constraints [18–20], range constraints [21], abutment constraints [22], and clustering constraints [23]. In this section, we describe each type of placement constraint, and highlight existing techniques to handle it during floorplan design. As a matter of fact, most of the techniques are extensions of the Wong–Liu algorithm [2]. Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C009 Finals Page 173 24-9-2008 #14 173 Slicing Floorplans 9.6.1 BOUNDARY CONSTRAINTS A boundary constraint forces some modules to be positioned along one of the four sides of a floorplan. It is particularly useful when a designer wants to place some modules along the boundaries for shorter input–output connections. Besides, floorplan design is usually done in a hierarchical manner in which modules are grouped into different units and the floorplan of each unit is independently determined. For this case, it helps if some modules are constrained to be placed along a boundary of the unit so that they are closer to other modules in neighboring units. Boundary constraints can be specified as follows. The set of all modules is divided into five disjoint module sets MF , ML , MR , MT , MB . Each module in MF is a free module that can be placed anywhere in a floorplan. On the other hand, each module in ML (MR , MT , MB , respectively) is a boundary-constrained module with the left (right, top, bottom, respectively) boundary constraint and has to be placed along the left (right, top, bottom, respectively) boundary of the floorplan. Note that ML (MR , MT , MB , respectively) may be an empty set if no module must be placed along the left (right, top, bottom, respectively) boundary. For example, for the two slicing floorplans shown in Figure 9.7a and b, if module 2 is constrained to be placed along the left boundary, then the one in Figure 9.7a is infeasible while the one in Figure 9.7b is feasible. A result on the boundary-constrained problem is reported by Young and Wong in Ref. [18]. They enhance the Wong–Liu algorithm [2] to handle boundary constraints. Their main idea is to check the normalized Polish expression in each iteration of the simulated annealing process to see whether the given boundary constraints are satisfied. Then their algorithm fixes the violated constraints (if any) as much as possible, and includes a term in the cost function to penalize the remaining violations. A linear-time method is used to find the boundary information for each module in a normalized Polish expression. The boundary information of a module tells whether there are modules on the left of, on the right of, above, and below the module. Because a Polish expression or a slicing tree has the relative position information among modules (e.g., the Polish expression ijH means that module i is below module j, while the expression ijV means that module i is on the left of module j), the following facts can be observed. If module i must be placed on the right (left, respectively) boundary of the final floorplan, i cannot be in the left (right, respectively) subtree of any internal node labeled with V . On the other hand, if module i must be placed at the top (bottom, respectively) boundary of the final floorplan, i cannot be in the left (right, respectively) subtree of any internal node labeled with H. On the basis of these facts, the boundary information of each module can be obtained by scanning the Polish expression once from right to left and by using a stack. Once the boundary information is known, all the modules violating their boundary constraints can be determined, and the algorithm fixes as many violations as possible by shuffling the modules. If module i is not placed along the required boundary, it will be shuffled with another module j which is closest to i in the Polish expression and is placed at a position satisfying the boundary constraint of i. This fixing procedure takes O(mn) for each expression, where m is the number of constrained modules and n is the total number of modules. If there are still some constraints that cannot be satisfied after all possible shufflings, a penalty term is included in the cost function. The 4 4 3 1 3 2 2 1 (a) (b) FIGURE 9.7 Illustration of infeasible and feasible floorplans with a boundary constraint. (a) An infeasible floorplan where module 2 cannot be placed along the left boundary and (b) a feasible floorplan where module 2 can be placed along the left boundary. Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C009 174 Finals Page 174 24-9-2008 #15 Handbook of Algorithms for Physical Design Automation V A (a) B A (R ) B (L ) H A (b) B B (L ) A (R ) H V B (c) A B (L ) A (R ) B A A (R ) B (L ) (d) FIGURE 9.8 Examples of feasible and infeasible topologies. (a) A floorplan having an infeasible topology, (b) a floorplan having a feasible topology, obtained by performing an O1 operation on the root of the tree in (a), (c) a floorplan having a feasible topology, obtained by performing an O2 operation on the root of the tree in (a), and (d) a floorplan having a feasible topology, obtained by performing an O3 operation on the root of the tree in (a). penalty term is measured by the total distance of the modules from the boundaries of the floorplan along which they are required to be placed. Because the above-mentioned algorithm [18] also adopts the same set of moves as the Wong–Liu algorithm (i.e., M1, M2, and M3 described in Section 9.5.3) to generate a neighboring normalized Polish expression, it is very likely that only a subset of modules has changed the boundary information in the new normalized Polish expression, and therefore only the boundary information for those modules needs to be recomputed. On the basis of this observation, three speedup methods capable of performing incremental calculation of boundary information are given in Ref. [19]. A drawback with the algorithm in Ref. [18] is that the shuffling method may not always resolve all constraint violations even though a penalty term is added to the cost function to account for those violations. This has been empirically confirmed in Ref. [20], implying that the algorithm in Ref. [18] cannot guarantee that a floorplan satisfying all given boundary constraints is always obtainable unless the annealing process is long enough. To cope with this difficulty, Liu et al. developed a quadratictime method that transforms a normalized Polish expression with constraint violations into another one with all violations eliminated [20]. The main idea is to examine each internal node of a slicing tree (constructed from a normalized Polish expression) in a bottom-up fashion and determine if the node has a feasible topology or not. (An internal node has a feasible topology if it is feasible to place each boundary-constrained module along the required boundary in its corresponding subfloorplan.) If the node has an infeasible topology, the tree will be modified such that the node ends up with a feasible topology. Three operations, O1 , O2 , and O3 , are given to perform modifications. An O1 operation changes the cut direction of a node, an O2 operation swaps the left and the right subtrees of a node, and an O3 operation performs an O1 operation followed by an O2 operation. For example, suppose subfloorplan A contains modules having the right boundary constraint (denoted by R), subfloorplan B contains modules having the left boundary constraint (denoted by L), and both A and B have feasible topologies. Figure 9.8a produces a floorplan having an infeasible topology, but the floorplans in Figure 9.8b through d all have feasible topologies. It is clear that Figure 9.8b through d can be transformed from Figure 9.8a by performing an O2 , an O1 , and an O3 operation on the root, respectively. Although in most cases the three basic operations can transform nodes into ones having feasible topologies, there are also some cases where they fail. Besides, even if an internal node has a feasible topology, the tree needs to be modified as well for some cases to ensure that the root has a feasible topology later on. To handle these difficult cases, additional transformation operations are provided [20]. 9.6.2 RANGE CONSTRAINTS A range constraint forces a module to be placed within a given rectangular region in the final floorplan. It is less restrictive than a preplaced constraint, which requires a module to be placed at a fixed position in the final floorplan.* In fact, the range constraint problem is a more general problem because any preplaced constraint for a module can be written as a range constraint by specifying * Note that the problem of floorplan design with obstacles can be solved by treating the obstacles as preplaced modules. Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C009 Finals Page 175 24-9-2008 #16 175 Slicing Floorplans 5 Y y2 Y 6 5 4 3 y 2⫺h h 1 4 2 y1+ h 2 1 (a) 0 2 3 4 5 w y1 3 1 right(A) = x 1 + w left(A) = x 2⫺w top(A) = y 1 + h bottom(A) = y 2⫺h A X 6 X (b) 0 x1 x1+ w x 2⫺w x 2 FIGURE 9.9 (a) Preplaced and range constraints. (b) Range constraint representation. the rectangular region whose dimensions are the same as those of the module. Figure 9.9a gives an example where module 1 with a preplaced constraint must be placed with its lower left corner at (3, 2) and module 2 with a range constraint must be placed within the dotted-line region. The floorplan shown in Figure 9.9a satisfies both constraints. The preplaced constraint problem for slicing floorplans is considered in Ref. [24] which extends the Wong–Liu algorithm [2] by using the notion of reference point to construct shape curves in the presence of preplaced constraints. Young and Wong also present a slicing floorplanner with range constraints in Ref. [21], and empirically observe that when their floorplanner is specified to handle preplaced constraints, it outperforms the one in Ref. [24]. Therefore in this subsection we only introduce the algorithm in Ref. [21]. In the range constraint problem, two sets of modules, MF and MRange , are given, where each module in MF does not have any placement constraint and each module in MRange is a hard module and has a specified range constraint. The algorithm in Ref. [21] extends the Wong–Liu algorithm [2] to handle range constraints. The main contribution is a novel shape curve computation, which takes range constraints into consideration. When vertically or horizontally combining two modules, if at least one of them has a range constraint, the resultant subfloorplan will also have a range constraint. Therefore, the range constraint information will be propagated upward from the leaves to the root during the bottom-up shape curve construction process, and both the dimensional information, that is, the height and the width, and the range constraint information, need to be kept. Let A be a subfloorplan (containing one or more modules) with a range constraint, the following four variables are used to represent the constraint: • • • • Top(A): Shortest distance of the upper boundary of A from the x-axis Bottom(A): Longest distance of the lower boundary of A from the x-axis Right(A): Shortest distance of the right boundary of A from the y-axis Left(A): Longest distance of the left boundary of A from the y-axis We use Figure 9.9b to explain the four variables. In Figure 9.9b, A has width w and height h, and it is constrained to be placed inside the rectangle {(x, y)|x1 ≤ x ≤ x2 , y1 ≤ y ≤ y2 }. Then top(A) = y1 + h, bottom(A) = y2 − h, right(A) = x1 + w, and left(A) = x2 − w. With a range constraint represented in such a manner, the traditional shape curve construction methods [3–5] are enhanced to compute the range constraint and dimensional information of a subfloorplan X from those of its two children subfloorplans A and B. In addition, if a normalized Polish expression does not satisfy all range constraints, the algorithm adds into the cost function a penalty term, which is measured by the total distance of the modules having range constraints from their desired regions. 9.6.3 ABUTMENT CONSTRAINTS In foorplan design, a designer may want to have some modules abut one another to favor the transmission of data among those modules. This abutment problem is common in practice, but few Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C009 176 Finals Page 176 24-9-2008 #17 Handbook of Algorithms for Physical Design Automation wi wj hi i j hj FIGURE 9.10 Horizontal abutment. floorplanning algorithms can handle it. In this subsection, we describe a solution [22] to this problem for slicing floorplans. Although the floorplanning algorithm given in Ref. [22] is capable of handling L-shaped and T-shaped modules as well, a discussion of this is beyond the scope of this chapter. Two modules i and j are said to abut horizontally (Figure9.10), denoted by Habut(i, j), iff a vertical boundary Li of module i and a vertical boundary Lj of module j abut such that Li is immediately on the left of Lj and the abutment length is min{l(Li ), l(Lj )}, where l(Li ) is the length of Li and l(Lj ) is the length of Lj . The vertical abutment constraint Vabut(i, j) can be defined similarly. Abutment constraints can be also generalized to involve more than two modules. The algorithm extends the Wong–Liu algorithm [2] to handle abutment constraints. In each step of the simulated annealing process, through the use of a stack, a normalized Polish expression is scanned once to find the top, bottom, left, and right neighbors of every module. These topological relationships are independent of the dimensions of the modules, and can be derived based on the following observation. Let L[X], R[X], T [X], and B[X] denote the set of modules lying along the left boundary, right boundary, top boundary, and bottom boundary of a subfloorplan X. Consider putting X to the left of Y to get a new subfloorplan. If both R[X] and L[Y ] have more than one module, the top module in R[X] will abut horizontally with the top module in L[Y ] and the bottom module in R[X] will abut horizontally with the bottom module in L[Y ]. On the other hand, if either R[X] or L[Y ] has only one module, every module in R[X] will abut horizontally with every module in L[Y ]. The vertical neighborhood relationship can be observed similarly. Once the neighbors of each module are known, each abutment constraint can be checked. If the normalized Polish expression does not satisfy all abutment constraints, the algorithm will swap modules to satisfy the abutment constraints as much as possible. When a vertical abutment constraint for modules i and j, Vabut(i, j), is violated, the algorithm will first try to move j to the top of i by swapping j with the closest top neighbor of i in the Polish expression. If it fails, for example, all the top neighbors of i are fixed in their positions, the algorithm will try to move i to the bottom of j by swapping i with the closest bottom neighbor of j. The fixing procedure for a horizontal abutment constraint is defined similarly. It is likely that some constraints are still violated after all the possible swappings, and therefore a penalty term is added in the cost function to penalize those violations. Scanning a Polish expression once to find the neighbors of every module takes O(n) time, and swapping modules to fix violated abutment constraints takes O(mn) time, where n is total number of modules and m is the total number of abutment constraints. 9.6.4 CLUSTERING CONSTRAINTS A clustering constraint is to enforce some modules to be placed geometrically adjacent to each other in the final floorplan. By imposing the clustering constraint on modules that are heavily connected, Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C009 Finals Page 177 24-9-2008 #18 177 Slicing Floorplans 1 2 5 6 7 8 10 9 11 3 4 FIGURE 9.11 Feasible floorplan having three modules involved in a clustering constraint. the routing cost among those modules can be reduced. In this subsection, we describe a recent work on slicing floorplan design considering clustering constraints [23]. A clustering constraint can be specified by a subset of modules such that all the modules in this subset are geometrically adjacent to each other in the final floorplan. Figure 9.11 shows an example of the clustering constraint, where modules 4, 5, and 8 are the modules to be clustered and they are placed adjacent to each other in the feasible floorplan as shown. In Ref. [23], an extension of the Wong–Liu algorithm [2] is presented to handle the clustering constraint problem. The extension includes a linear-time method to locate neighbors of a module from a normalized Polish expression and a method to swap the modules to satisfy the given clustering constraint. In each iteration of the annealing process, these two methods are used to transform a normalized Polish expression into another one that satisfies the given cluster constraint. A term measured by the sum of the center-tocenter distances between the modules with the same cluster constraint is added to the cost function such that those modules can be placed more closely. 9.7 OTHER ADVANCES IN SLICING FLOORPLANS We next address two important advances, one in the theoretical study on area-optimal slicing floorplans [25], and the other in the completeness of the slicing tree representation for general floorplans [26]. We then describe the respective algorithms [27,28] that make modern heterogeneous FPGAs and 3D ICs realizable in slicing floorplans. 9.7.1 THEORETICAL RESULTS FOR AREA-OPTIMAL SLICING FLOORPLANS One possible concern about slicing floorplans is that even the slicing floorplan with optimal area may still be not good at packing modules tightly as a nonslicing one, and hence may introduce a larger chip area. Although, there is empirical evidence showing that comparable slicing floorplan results can be obtained using much less runtime [18], it is important to have a mathematical analysis to guarantee their performance. In Ref. [25], the following area-optimal slicing floorplan design problem is addressed. Given n soft modules each with the same shape flexibility r(≥1), what is the minimum area among all possible slicing floorplans? Note that a module is said to have shape flexibility r iff the module can be represented by any rectangle with the same area as long as the aspect ratio of the rectangle is between 1/r and r. It√is proved that if r ≥ 2, then there exists a slicing floorplan F such that area(F) ≤ min{1 + (1/ r ), 5/4, (1 + √θ )}Atotal, where Atotal is the total area of all the modules, Amax is the maximum module area, and θ = 2Amax /(rAtotal ). The shape of such√ a slicing floorplan closely resembles a square. The first term in the upper bound, that is, (1 + (1/ r )Atotal , favors larger r; for example, if r = 25, the first term is 1.2Atotal . The second term gives a better bound than the first one if r ≤ 16. The third term considers the ratios of the module areas and gives a good bound when all the module areas are small as compared with the total module area; for example, if r = 2 and Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C009 178 Finals Page 178 24-9-2008 #19 Handbook of Algorithms for Physical Design Automation Amax = Atotal /100, the third term is 1.1Atotal . In fact, the experimental results reported in Ref. [25] show that by applying the Wong–Liu slicing floorplanner [2] to more than 20 test cases each with 100 soft modules of shape flexibility 2, slicing floorplans with areas smaller than the above-mentioned mathematical bound can be produced. To get each term in the upper bound, a slicing floorplan is constructed such that its area will meet the bound. Each construction uses a different method to classify modules into groups based on module areas, and modules in different groups are represented by rectangles of different widths (while modules in the same group have the same width). Then modules are placed one at a time from the one with largest area to the one with smallest area by putting a module on the lowest possible level and moving it to the leftmost position on that level. In addition, when r is slightly less than 2, that is, r = 2 − ε with ε being a small positive number, it can be also proved that there exists a slicing floorplan whose area is upper bounded by min{(5/4)[1 + (ε/2)], [1 + θ + (ε/2)]}Atotal . 9.7.2 COMPLETENESS OF SLICING TREE REPRESENTATION Although it can be mathematically proved that a slicing floorplan capable of packing modules closely is achievable, there are constraints (e.g., the same shape flexibility for modules) that must be satisfied beforehand. On the other hand, it is still commonly believed that an area-optimal slicing floorplan may still suffer from a poor utilization of space when all modules are hard. For this reason, many efforts have been devoted for creating nonslicing floorplan representations. These representations are effective and efficient for handling hard modules, but many of them are still unable to fully exploit the shape flexibility of soft modules. In Ref. [26], it is proved that when augmented with a simple compaction procedure, the slicing tree representation can generate all maximally compact placements of modules. (Note that no module in a maximally compact placement (or called admissible placement in Ref. [29]) can be moved horizontally to the left or vertically downward without moving any other modules.) As a result, slicing tree is a complete representation of both slicing and nonslicing floorplans. We now describe the idea behind the proof. Given a slicing tree, its corresponding slicing placement is defined to be the area-optimal floorplan such that no vertical cutlines can be moved to the left, no horizontal cutlines can be moved downward, and each module is placed in the lower left corner of a basic rectangle. For example, Figure 9.12f shows a slicing placement of the slicing tree in Figure 9.12d. A horizontal adjacency graph G = (V , E) can be constructed from a placement of modules. The set V of vertices corresponds to the set of modules. There is an edge (u, v) in the edge set E iff the left boundary of v is immediately adjacent to the right boundary of u. It is clear that G is a DAG. A vertex u in G is said to be a left-boundary vertex iff u is a module placed along the left boundary of the placement. Clearly all left-boundary vertices have in-degree 0. In general the converse may not be true, but it is true for maximally compact placements. Another key fact is that for a maximally compact placement, all vertices in G are connected to the set of left-boundary vertices. Given any maximally compact placement P, let GP be the horizontal adjacency graph of P, and B = {b1 , b2 , . . . , bk } be the set of left-boundary vertices in G. Because every vertex in G is reachable from at least one vertex in B, a spanning forest Q = {T1 , T2 , . . . , Tk } of G can be found, where Ti is a tree rooted at bi , i = 1, 2, . . . , k. For example, Figure 9.12a gives a maximally compact placement of nine modules and Figure 9.12b shows the horizontal adjacency graph, where vertices 7, 4, 1 are left-boundary vertices. Figure 9.12c shows a spanning forest containing three trees rooted at 7, 4, 1, respectively. Now for the trees rooted at b1 , b2 , . . . , bk , k − 1 horizontal cutlines dividing a floorplan into k parts are constructed. For each tree, the transformations shown in Figure 9.13 are then recursively applied to its child subtrees to further expand the slicing structure. Figure 9.12d and e show the final slicing tree and floorplan. From the slicing floorplan, the corresponding slicing placement P can Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C009 Finals Page 179 24-9-2008 #20 179 Slicing Floorplans 8 8 7 7 9 4 5 4 2 5 6 1 3 2 (a) 4 6 1 3 9 7 5 6 1 8 9 3 2 (b) (c) H V V V 1 7 H H 4 2 3 6 V 8 2 1 9 5 4 V 5 7 8 7 8 4 5 1 2 6 6 3 9 (d) 9 (e) 3 (f) FIGURE 9.12 Generating a slicing placement from a maximally compact placement. (a) A maximally compact placement of 9 modules, (b) the horizontal adjacency graph of (a), (c) a spanning foust of (b), (d) the slicing tree derived from (c), (e) the slicing floorplan derived from (c), and (f) the slicing placement derived from (d) (or (e)). be constructed (Figure 9.12f). Because the positions of the modules in the x-direction in P are the same as those in P, performing a compaction along y-direction, which moves modules downward as far as possible, transforms P into P. This concludes that slicing tree is a complete representation of floorplans in the sense that by an augmenting compaction step, all maximally compact placements can be produced as well. 9.7.3 HETEROGENEOUS FPGA FLOORPLANNING Modern FPGAs can accommodate multimillion gates and their future generations will be even more complex. As a result, a hierarchical approach based upon partitioning and floorplanning becomes necessary to successfully realize a design on an FPGA. Owing to the heterogeneous logic and routing resources on a modern FPGA, FPGA floorplanning is very different from floorplanning for application-specific integrated circuits (ASICs). Although there are also some previous works, for Tm Tm Z T2 Z (a) Z Z (b) T Z T T1 Z T2 T1 (c) FIGURE 9.13 Three types of transformations for expanding a slicing structure. (a) Type 1 transformation, applied when Z has no child, (b) Type 2 transformation, applied when Z has only one child, and (c) Type 3 transformation, applied when Z has two or more children. Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C009 180 Finals Page 180 24-9-2008 #21 Handbook of Algorithms for Physical Design Automation V 5 4 H H 6 V V 3 6 2 1 3 1 (a) CLB RAM Multiplier 2 (b) 4 5 (c) FIGURE 9.14 (a) Slicing floorplan. (b) Slicing tree. (c) Pattern. example, Ref. [30], on FPGA floorplanning, they all target at older generations of FPGAs consisting of configurable logic blocks (CLBs) only. In this subsection, we consider modern heterogeneous FPGA chips that consist of columns of CLBs, with column pairs of RAMs and multipliers interleaved between them (Figure 9.14a). The Xilinx Spartan3 family and Vertex-II family conform to this architecture [31]. For such an FPGA architecture, the first slicing floorplan design algorithm is given in Ref. [27]. The algorithm uses slicing trees to represent floorplans during a simulated annealing process similar to Ref. [2], but it nontrivially extends the slicing floorplan area optimization algorithms [3,5] to find the optimal realization for each slicing tree. The main idea behind this extension is explained below. Assume that a set of modules is given, where each module has an associated resource requirement vector ϕ = (m1 , m2 , m3 ), indicating that this module requires m1 CLBs, m2 RAMs, and m3 multipliers. The FPGA floorplanning problem is to place modules on the chip such that each region assigned to a module satisfies the resource requirements of the module, regions for different modules do not overlap, and a given cost function is optimized. For example, if there are six modules, and their resource requirement vectors are ϕ1 = (12, 2, 1), ϕ2 = (30, 4, 4), ϕ3 = (15, 1, 1), ϕ4 = (24, 4, 4), ϕ5 = (18, 2, 2), and ϕ6 = (30, 2, 2), then Figure 9.14a is a feasible slicing floorplan for these modules (see Figure 9.14b for the corresponding slicing tree). For easier illustration, a coordinate system is adopted on the chip. In Figure 9.14a the horizontal unit is the width of a CLB, and the vertical unit is the height of a CLB. The lower left CLB has coordinate (0, 0), the lower left RAM occupies coordinates (1, 0) through (1, 2), and the lower left multiplier occupies coordinates (2, 0) through (2, 2). Let H and W be the height and the width of the chip, respectively. Any rectangular region r in the chip is denoted by a four-tuple (x, y, w, h), where (x, y), w, and h are the lower left coordinate, the width, and the height of r, respectively. The x(r), y(r), w(r), and h(r) each denote a corresponding field of r. Let Ri denote the set of rectangular regions in the chip that satisfy the resource requirements of module i. Each region in Ri is said to be a realization of module i because it is feasible to place module i in that region. A realization r1 in Ri is redundant if there is another realization r2 in Ri such that both realizations have the same lower left coordinate (i.e., r1 (x) = r2 (x), r1 (y) = r2 (y)) and r2 is not larger than r1 in both dimensions (i.e., r2 (w) ≤ r1 (w) and r2 (h) ≤ r1 (h)). Clearly all redundant realizations can be discarded. Let L(i, x, y) denote the irreducible realization list (IRL) for module i starting at coordinate (x, y) and it contains all the irredundant realizations in Ri with (x, y) being their lower left coordinate. Therefore, all irredundant realizations of a module are organized into different IRLs for different starting coordinates. Each IRL is sorted in the decreasing height order, and hence it is also sorted in the increasing width order. The definition of an IRL can be extended to the nodes in a slicing tree. Given two rectangles r1 and r2 , the bounding rectangle of r1 and r2 is a rectangle r with r(x)= min{r1 (x), r2 (x)}, r(y)= min{r1 (y), r2 (y)}, r(w)= max{r1 (w) + r1 (x), r2 (w) + r2 (x)} − r(x), and Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C009 Slicing Floorplans Finals Page 181 24-9-2008 #22 181 r(h)= max{r1 (h) + r1 (y), r2 (h) + r2 (y)} − r(y). Given a tree node u, if u represents module i, Ru = Ri . On the other hand, if u is an internal node, let u1 , u2 be the left and the right children of u. If u is a vertical cut (a horizontal cut, respectively), Ru consists of all bounding rectangles of r1 in Ru1 and r2 in Ru2 , where r1 is to the left side of (below, respectivly) r2 . The IRL for a tree node u starting at coordinate (x, y) is defined as L(u, x, y) = {r|r is irredundant in Ru , x(r) = x, and y(r) = y}. Therefore, the set of all irredundant realizations of a tree node is also organized into different IRLs for different starting coordinates. Given a slicing tree, the IRLs of each node are calculated from leaves to the root. Obviously, the IRLs of each module only need to be calculated once, right at the beginning of the simulated annealing process. Suppose u is the internal node under consideration, and is a vertical cut. Let u1 , u2 be the left and the right children of u. Assume L(u1 , x, y) = {r1 , r2 , . . . , rk } is sorted as expected. It can be shown that it is enough to combine every realization ri in L(u1 , x, y) with realizations in L(u2 , x + w(ri ), y) to generate L(u, x, y). Moreover, when combining ri with realizations in L(u2 , x + w(ri ), y), we may not need to consider all combinations. For those realizations in L(u2 , x + w(ri ), y) with heights not larger than h(ri ), we only need to consider the highest one to get a minimum width. We also do not need to combine ri with a realization r in L(u2 , x + w(ri ), y) if h(r ) ≥ h(ri−1 ). The procedure to construct L(u, x, y) can be derived similarly when u is a horizontal cut. It takes O(l log l) time to construct L(u, x, y), where l = max{H, W }. The above-mentioned method, however, should not be implemented directly on the chip, because finding IRLs for every coordinate makes the space complexity formidable. Fortunately, a real FPGA chip is very regular with repetitions of a basic pattern. Consider the example chip in Figure 9.14a whose basic pattern is shown in Figure 9.14c. It turns out that this repetition property can be utilized such that only computation on the pattern instead of the whole chip needs to be done. As a result, evaluating a slicing tree takes O(nml log l) time and needs O(nlm) memory space, where n is the number of modules and m is the number of points on the pattern. 9.7.4 3D FLOORPLANNING Complementary metal oxide semiconductor (CMOS) technology has continuously scaled into nanometer regime, and it has become more difficult to improve the chip performance by just size shrinking in a planar wafer. 3D ICs is a promising technology to keep the speed of an IC advancing. 3D ICs provide significant performance benefits over two-dimensional integrated circuits (2D ICs) mainly by reducing the interconnect lengths and introducing new geometrical arrangement of modules [32]. To improve the performance, circuit modules will not be confined in only one layer in 3D ICs. This produces a problem for current 2D floorplanning tools. As a result, 3D floorplanning algorithms are required for 3D IC design. A 3D slicing floorplan design problem is addressed in Ref. [28]. This problem is formulated as that of placing a given set of 3D rectangular modules without overlapping while a given cost function is optimized. Assuming each 3D module is a hard module but with free rotation, an algorithm for solving the 3D slicing floorplan design problem is reported in Ref. [28], which generalizes slicing trees to represent different 3D floorplans, and uses simulated annealing to search for a good slicing floorplan. The main idea of the algorithm is highlighted below. A 3D slicing floorplan can be obtained by cutting a 3D block by 2D planes (which are perpendicular to the x-, y-, or z-axis) into a set of 3D subblocks such that each 3D subblock is large enough to accommodate the 3D module assigned to it (Figure 9.15a). Slicing trees can be generalized to represent 3D slicing floorplans such that each internal node of a slicing tree is now labeled by X, Y , or Z. The label X (Y , Z, respectively) means that the corresponding subfloorplan is cut by a plane that is perpendicular to the x-axis (y-axis, z-axis, respectively). Figure 9.15b gives a slicing tree to represent the 3D floorplan shown in Figure 9.15a. Because a slicing tree is a full binary tree (due to the fact that each internal node of the tree has two children), a static array with 2n − 1 elements can be used to represent all the nodes of the tree,

Handbook of algorithms for physical design automation part 20

Nội dung