Handbook of Algorithms for Physical Design Automation, Part 57
(Alpert, Chapter 26: Buffer Insertion Basics, AU7242_C026, Finals 29-9-2008, pp. 542-551)
After pruning inferior solutions, the solution set at v1 is {(167, 19), (207, 39), (278, 59), (308, 74)}.

26.4 VAN GINNEKEN EXTENSIONS

26.4.1 HANDLING LIBRARY WITH MULTIPLE BUFFERS

We extend the standard van Ginneken algorithm to handle multiple buffer types and buffer cost [12]. The buffer library B now contains various types of buffers. Each buffer b in the buffer library has a cost W(b), which can be measured by area or any other metric, depending on the optimization objective. A function f : Vn -> 2^B specifies the types of buffers allowed at each internal vertex in T. The cost of a solution γ, denoted by W(γ), is defined as

    W(γ) = Σ_{b∈γ} W(b)

With the above notation, the new problem can be formulated as follows.

Minimum-cost timing-constrained buffer insertion problem: Given a binary routing tree T = (V, E), possible buffer positions defined using f, and a buffer library B, compute a minimum-cost buffer assignment γ such that the RAT at the driver is no smaller than a timing constraint α.

In contrast to the single-buffer-type case, W is introduced into the (Q, C) pair to handle buffer cost, i.e., each solution is now associated with a (Q, C, W) triple. As such, during the bottom-up computation, additional effort is needed to update W:

- If γ' is generated by inserting a wire into γ, then W(γ') = W(γ).
- If γ' is generated by inserting a buffer b into γ, then W(γ') = W(γ) + W(b).
- If γ' is generated by merging γl with γr, then W(γ') = W(γl) + W(γr).

The definition of inferior solutions needs to be revised as well. For any two solutions γ1 and γ2 at the same node, γ1 dominates γ2 if C(γ1) ≤ C(γ2), W(γ1) ≤ W(γ2), and Q(γ1) ≥ Q(γ2). Whenever a solution becomes dominated, it is pruned from the solution set.
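As a concrete illustration of the dominance rule, here is a minimal Python sketch. It is a quadratic-time pass over hypothetical (Q, C, W) triples; practical implementations use the range search tree of Ref. [12] to do the same job faster.

```python
def prune_dominated(cands):
    """Prune dominated (Q, C, W) candidates.

    A candidate (Q2, C2, W2) is dominated by (Q1, C1, W1) when
    C1 <= C2, W1 <= W2 and Q1 >= Q2. Simple O(n^2) pass; a range
    search tree makes this check asymptotically faster.
    """
    kept = []
    for q, c, w in cands:
        dominated = any(
            qo >= q and co <= c and wo <= w and (qo, co, wo) != (q, c, w)
            for qo, co, wo in cands
        )
        if not dominated:
            kept.append((q, c, w))
    return kept
```

For example, a candidate with higher cost, higher capacitance, and lower RAT than some other candidate is removed, while a set in which every candidate excels in at least one dimension survives intact.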
Therefore, only solutions that excel in at least one aspect among downstream capacitance, buffer cost, and RAT can survive. With the above modification, van Ginneken's algorithm easily adapts to the new problem setup. However, because domination is defined on a (Q, C, W) triple rather than a (Q, C) pair, a more efficient pruning technique is necessary to maintain the efficiency of the algorithm. To this end, a range search tree technique is incorporated [12]. It can be implemented as follows: a list of binary search trees is maintained, one tree for each distinct value of W. Each binary search tree is keyed by C, and each node in the tree also stores the largest Q at that node or in its left subtree [12].

26.4.2 LIBRARY WITH INVERTERS

So far, all buffers in the buffer library have been noninverting. The library can also contain inverting buffers, or simply inverters. In terms of buffer cost and delay, inverters can provide cheaper buffer assignments and better delay than noninverting buffers. Regarding algorithmic design, it is worth noting that introducing inverters into the buffer library brings a polarity issue into the problem, as the signal polarity is negated after inserting an inverter.

26.4.3 POLARITY CONSTRAINTS

When the output polarity of the driver is required to be positive or negative, we impose a polarity constraint on the buffering problem. To handle polarity constraints, during the bottom-up computation the algorithm maintains two solution sets, one for positive and one for negative buffer input polarity. After choosing the best solution at the driver, the buffer assignment can then be determined by a top-down traversal. The details of the new algorithm are as follows. Denote the two solution sets at vertex v by Γv+ and Γv-, corresponding to positive polarity and negative polarity, respectively.
Suppose that an inverter b- is inserted into a solution γv+ ∈ Γv+; a new solution γv is generated in the same way as before except that it is placed into Γv-. Similarly, the new solution generated by inserting b- into a solution γv- ∈ Γv- is placed into Γv+. For a noninverting buffer, the new solution is placed in the same set as its origin. The other operations are easier to handle. Wire insertion proceeds as before, with the two solution sets handled separately. Merging is carried out only among solutions with the same polarity, e.g., the positive-polarity solution set of the left branch is merged with that of the right branch. For the inferiority check and solution pruning, only solutions in the same set can be compared.

26.4.4 SLEW AND CAPACITANCE CONSTRAINTS

The slew rate of a signal refers to the rising or falling time of a signal switching. Sometimes, the slew rate is referred to as the signal transition time. The slew rate of almost every signal has to be sufficiently small, because a large slew rate implies large delay, large short-circuit power dissipation, and large vulnerability to crosstalk noise. In practice, a maximum slew rate constraint is imposed at the input of each gate/buffer. Therefore, this constraint needs to be obeyed in a buffering algorithm [12-15]. A simple slew model is essentially equivalent to the Elmore model for delay. It can be explained using a generic example: a path p from node vi (upstream) to node vj (downstream) in a buffered tree, with a buffer (or the driver) bu at vi and no buffer between vi and vj.
The slew rate S(vj) at vj depends on both the output slew Sbu,out(vi) at buffer bu and the slew degradation Sw(p) along path p (or wire slew), and is given by [16]

    S(vj) = sqrt( Sbu,out(vi)^2 + Sw(p)^2 )    (26.11)

The slew degradation Sw(p) can be computed with Bakoglu's metric [17] as

    Sw(p) = ln 9 · D(p)    (26.12)

where D(p) is the Elmore delay from vi to vj. The output slew of a buffer, such as bu at vi, depends on the input slew at this buffer and the load capacitance seen from the output of the buffer. Usually, the dependence is described as a two-dimensional lookup table. As a simplified alternative, one can assume a fixed input slew at each gate/buffer. This fixed slew is set equal to the maximum slew constraint and is therefore always satisfied, but it is a conservative estimate. For fixed input slew, the output slew of buffer b at vertex v is given by

    Sb,out(v) = Rb · C(v) + Kb    (26.13)

where C(v) is the downstream capacitance at v, and Rb and Kb are empirical fitting parameters. This is similar to empirically derived K-factor equations [18]. We call Rb the slew resistance and Kb the intrinsic slew of buffer b. In a van Ginneken style buffering algorithm, if a candidate solution has a slew rate greater than the given slew constraint, it is pruned and is not propagated any further.

Similar to the slew constraint, circuit designs also limit the maximum capacitive load a gate/buffer can drive [15]. For timing-noncritical nets, buffer insertion is still necessary for the sake of satisfying the slew and capacitance constraints. For this case, fast slew buffering techniques are introduced in Ref. [19].

26.4.5 INTEGRATION WITH WIRE SIZING

In addition to buffer insertion, wire sizing is an effective technique for improving interconnect performance [20-24]. If wire size can take only discrete options, which is often the case in practice, wire sizing can be directly integrated with a van Ginneken style buffer insertion algorithm [12]. In the bottom-up dynamic programming procedure, multiple wire width options need to be considered when a wire is added (see Section 26.3.2.1). If there are k options of wire size, then k new candidate solutions are generated, one corresponding to each wire size. However, including wire sizing in van Ginneken's algorithm makes the complexity pseudopolynomial [12].

In Ref. [25], layer assignment and wire spacing are considered in conjunction with wire sizing. A combination of layer, width, and spacing is called a wire code. All wires in a net have to use an identical wire code. If each wire code is treated as a polarity, wire code assignment can be integrated with buffer insertion in the same way as handling polarity constraints (see Section 26.4.3). In contrast to simultaneous wire sizing and buffer insertion [12], the algorithm complexity stays polynomial after integrating wire-code assignment [25] with van Ginneken's algorithm.

Another important conclusion in Ref. [25] concerns wire tapering. Wire tapering means that a wire segment is divided into multiple pieces and each piece can be sized individually. In contrast, uniform wire sizing does not make such a division and maintains the same wire width for the entire segment. These two cases are illustrated in Figure 26.6.

FIGURE 26.6 Wire sizing with tapering and uniform wire sizing.

It is shown in Ref. [25] that the benefit of wire tapering versus uniform wire sizing is very limited when combined with buffer insertion. It is theoretically proved [25] that the signal velocity from simultaneous buffering with wire tapering is at most 1.0354 times that from buffering with uniform wire sizing. In short, wire tapering improves signal speed by at most 3.54 percent over uniform wire sizing.
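Returning to the slew model of Section 26.4.4, Eqs. 26.11 through 26.13 map directly to code. The sketch below assumes consistent units throughout (e.g., time in ps, capacitance in fF); the parameter and limit values used in the checks are hypothetical.

```python
import math

def buffer_output_slew(r_b, k_b, load_cap):
    """Eq. 26.13 (fixed input slew): S_b,out(v) = Rb * C(v) + Kb."""
    return r_b * load_cap + k_b

def wire_slew_degradation(elmore_delay):
    """Eq. 26.12 (Bakoglu's metric): Sw(p) = ln(9) * D(p)."""
    return math.log(9.0) * elmore_delay

def sink_slew(out_slew, degradation):
    """Eq. 26.11: S(vj) = sqrt(S_bu,out(vi)^2 + Sw(p)^2)."""
    return math.hypot(out_slew, degradation)

def violates_slew(candidate_slew, slew_limit):
    """A candidate whose slew exceeds the constraint is pruned."""
    return candidate_slew > slew_limit
```

In a van Ginneken style implementation, `violates_slew` would be evaluated for each candidate during bottom-up propagation, and violating candidates discarded before any merging or buffer insertion step.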
26.4.6 NOISE CONSTRAINTS WITH DEVGAN METRIC

The shrinking minimum distance between adjacent wires has caused an increase in the coupling capacitance of a net to its neighbors. A large coupling capacitance can cause a switching net to induce significant noise onto a neighboring net, resulting in an incorrect functional response. Therefore, noise avoidance techniques must become an integral part of the performance optimization environment.

The amount of coupling capacitance from one net to another is proportional to the distance over which the two nets run parallel to each other. The coupling capacitance may cause an input signal on the aggressor net to induce a noise pulse on the victim net. If the resulting noise is greater than the tolerable noise margin (NM) of the sink, an electrical fault results. Inserting buffers in the victim net can separate the capacitive coupling into several independent and smaller portions, resulting in a smaller noise pulse at the sink and at the inputs of the inserted buffers. Before describing the noise-aware buffering algorithms, we first introduce the coupling noise metric.

26.4.6.1 Devgan's Coupling Noise Metric

Among the many coupling noise models, Devgan's metric [26] is particularly amenable to noise avoidance in buffer insertion, because its computational complexity, structure, and incremental nature are the same as those of the famous Elmore delay metric. Further, like the Elmore delay model, the noise metric is a provable upper bound on coupled noise.

FIGURE 26.7 Illustration of noise model. (a) A net is a victim of the crosstalk noise induced by its neighboring nets. (b) The crosstalk noise can be modeled as current sources. (Modified at http://dropzone.tamu.edu/~jhu/noise.eps)
Other advantages of the noise metric include the ability to incorporate multiple aggressor nets and to handle general victim and aggressor net topologies. A disadvantage of the Devgan metric is that it becomes more pessimistic as the ratio of the aggressor net's transition time (at the output of the driver) to its delay decreases. However, cases in which this ratio becomes very small are rare, because a long net delay generally corresponds to a large load on the driver, which in turn causes a slower transition time. The metric does not consider the duration of the noise pulse either. In general, the NM of a gate depends on both the peak noise amplitude and the noise pulse width. However, when considering failure at a gate, peak amplitude dominates pulse width.

If a wire segment e in the victim net is adjacent to t aggressor nets, let λ1, ..., λt be the ratios of coupling to wire capacitance from each aggressor net to e, and let μ1, ..., μt be the slopes of the aggressor signals. The impact of coupling from aggressor j can be treated as a current source Ie,j = Ce · λj · μj, where Ce is the wire capacitance of wire segment e. This is illustrated in Figure 26.7. The total current induced by the aggressors on e is

    Ie = Ce · Σ_{j=1}^{t} λj · μj    (26.14)

Often, information about neighboring aggressor nets is unavailable, especially if buffer insertion is performed before routing. In this case, a designer may wish to perform buffer insertion to improve performance while also avoiding future potential noise problems. When performing buffer insertion in estimation mode, one might assume that (1) there is a single aggressor net that couples with each wire in the routing tree, (2) the slope of all aggressors is μ, and (3) some fixed ratio λ of the total capacitance of each wire is due to coupling capacitance. Let IT(v) be the total downstream current seen at node v, i.e.,

    IT(v) = Σ_{e∈ET(v)} Ie

where ET(v) is the set of wire edges downstream of node v.
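The current-source computation above (Eq. 26.14 and the downstream total IT(v)) can be sketched in a few lines; the capacitance, coupling-ratio, and slope values used in the checks are hypothetical.

```python
def wire_noise_current(wire_cap, coupling_ratios, aggressor_slopes):
    """Eq. 26.14: Ie = Ce * sum_j(lambda_j * mu_j) over t aggressors."""
    return wire_cap * sum(l * m for l, m in
                          zip(coupling_ratios, aggressor_slopes))

def estimated_wire_current(wire_cap, lam, mu):
    """Estimation mode: one aggressor with fixed ratio lam and slope mu."""
    return wire_noise_current(wire_cap, [lam], [mu])

def downstream_current(downstream_wire_currents):
    """IT(v): sum of Ie over all wire edges downstream of node v."""
    return sum(downstream_wire_currents)
```

During bottom-up propagation, `downstream_current` accumulates exactly the way downstream capacitance accumulates in van Ginneken's algorithm: per-edge contributions are added as edges and branches are processed.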
Each wire adds to the noise induced on the victim net. The amount of additional noise induced by a wire e = (u, v) is given by

    Noise(e) = Re · ( Ie/2 + IT(v) )    (26.15)

where Re is the wire resistance. The total noise seen at sink si starting from some upstream node v is

    Noise(v - si) = Rv · IT(v) + Σ_{e∈path(v-si)} Noise(e)    (26.16)

where the gate driving resistance Rv = 0 if there is no gate at node v, and the path from v to si has no intermediate buffers. Each node v has a predetermined noise margin NM(v). If the circuit is to have no electrical faults, the total noise propagated from each driver/buffer to each of its sinks si must be less than the NM for si. We define the noise slack for every node v as

    NS(v) = min_{si∈SIT(v)} [ NM(si) - Noise(v - si) ]    (26.17)

where SIT(v) is the set of sink nodes of the subtree rooted at node v. Observe that NS(si) = NM(si) for each sink si.

26.4.6.2 Algorithm of Buffer Insertion with Noise Avoidance

We begin with the simplest case of a single wire with uniform width and neighboring coupling capacitance. Consider a wire e = (u, v). First, we need to ensure NS(v) ≥ Rb · IT(v), where Rb is the buffer output resistance. If this condition is not satisfied, inserting a buffer even at node v cannot satisfy the NM constraint, i.e., buffer insertion is needed within the subtree T(v). If NS(v) ≥ Rb · IT(v), we next search for the maximum wirelength le,max of e such that inserting a buffer at u always satisfies the noise constraints. The value of le,max tells us the maximum unbuffered length, or equivalently the minimum buffer usage, for satisfying the noise constraints. Let R = Re/le be the wire resistance per unit length and I = Ie/le be the current per unit length. According to Ref. [27], this value can be determined by

    le,max = - Rb/R - IT(v)/I + sqrt( (Rb/R)^2 + (IT(v)/I)^2 + 2·NS(v)/(I·R) )    (26.18)

Depending on the timing criticality of the net, the noise-aware buffer insertion problem can be formulated in two different ways: (1) minimize total buffer cost subject to noise constraints, and (2) maximize timing slack subject to noise constraints. The algorithm for the former is a bottom-up dynamic programming procedure that inserts buffers greedily, as far apart as possible [27]. Each partial solution at node v is characterized by a three-tuple of downstream noise current IT(v), noise slack NS(v), and buffer assignment M. During solution propagation, the noise current is accumulated in the same way as the downstream capacitance in van Ginneken's algorithm. Likewise, the noise slack is treated like the timing slack (or RAT). This algorithm returns an optimal solution for a multisink tree T = (V, E) in O(|V|^2) time.

The core algorithm for noise-constrained timing slack maximization is similar to van Ginneken's algorithm except that the noise constraint is considered. Each candidate solution at node v is represented by a five-tuple of downstream capacitance Cv, RAT q(v), downstream noise current IT(v), noise slack NS(v), and buffer assignment M. In addition to pruning inferior solutions according to the (C, q) pair, the algorithm eliminates candidate solutions that violate the noise constraint. At the source, the buffering solution not only has optimized timing performance but also satisfies the noise constraint.

26.4.7 HIGHER ORDER DELAY MODELING

Many buffer insertion methods [11,12,28] are based on the Elmore wire delay model [29] and a linear gate delay model for the sake of simplicity. However, the Elmore delay model often overestimates interconnect delay. It is observed in Ref. [30] that Elmore delay sometimes has over 100 percent overestimation error when compared to SPICE.
A critical reason for the overestimation is the neglect of the resistive shielding effect. In the example of Figure 26.8, the Elmore delay from node A to B is equal to R1(C1 + C2): R1 is assumed to see the entire capacitance C2, despite the fact that C2 is partially shielded by R2. Consider the extreme scenario where R2 = ∞, i.e., there is an open circuit between nodes B and C. Obviously, the delay from A to B should then be R1·C1 instead of the Elmore delay R1(C1 + C2).

FIGURE 26.8 Example of resistive shielding effect.

The linear gate delay model is inaccurate owing to its neglect of the nonlinear behavior of gate delay, in addition to the resistive shielding effect. In other words, gate delay is not a strictly linear function of load capacitance. These simple and relatively inaccurate delay models are suitable only for early design stages such as buffer planning. In post-placement stages, more accurate models are needed because (1) optimal buffering solutions based on simple models may be inferior, because the actual delay is not being optimized, and (2) simplified delay modeling can cause a poor evaluation of the trade-off between total buffer cost and timing improvement. In more accurate delay models, the resistive shielding effect is accounted for by replacing the lumped load capacitance with a higher order load admittance estimate. The accuracy of wire delay can be improved by including higher order moments of the transfer function. An accurate and popular gate delay model is a lookup table employed together with an effective capacitance [31,32], which is obtained from the higher order load admittance. These techniques are described in more detail as follows.
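The shielding example of Figure 26.8 is easy to reproduce numerically; the component values below are hypothetical and only illustrate how lumping the downstream capacitance inflates the Elmore estimate.

```python
def elmore_a_to_b(r1, c1, c2):
    """Elmore model: R1 is charged with the full downstream load C1 + C2."""
    return r1 * (c1 + c2)

def shielded_limit_a_to_b(r1, c1):
    """With R2 -> infinity (open circuit between B and C), only C1 matters."""
    return r1 * c1

r1, c1, c2 = 100.0, 10.0, 40.0  # hypothetical ohm / fF values
overestimate = elmore_a_to_b(r1, c1, c2) / shielded_limit_a_to_b(r1, c1)
# With these values the Elmore estimate is 5x the fully shielded delay.
```

The ratio grows with C2/C1, which is why nets with large shielded downstream subtrees are modeled so poorly by the lumped approximation.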
26.4.7.1 Higher Order Point Admittance Model

For an RC tree, which is a typical circuit topology in buffer insertion, the frequency-domain point admittance at a node v is denoted Yv(s). It can be approximated by the third-order Taylor expansion

    Yv(s) = yv,0 + yv,1·s + yv,2·s^2 + yv,3·s^3 + O(s^4)

where yv,0, yv,1, yv,2, and yv,3 are expansion coefficients. The third-order approximation usually provides satisfactory accuracy in practice. Its computation is a bottom-up procedure starting from the leaf nodes of the RC tree, i.e., the ground capacitors. For a capacitance C connected to ground, the admittance at its upstream end is simply Cs. Note that the zeroth-order coefficient is equal to 0 in an RC tree because there is no DC path to ground. Therefore, we only need to propagate y1, y2, and y3 in the bottom-up computation. There are two cases to consider:

• Case 1: For a resistance R, given the admittance Yd(s) of its downstream node, compute the admittance Yu(s) of its upstream node (Figure 26.9a):

    yu,1 = yd,1
    yu,2 = yd,2 - R·yd,1^2
    yu,3 = yd,3 - 2R·yd,1·yd,2 + R^2·yd,1^3    (26.19)

• Case 2: Given admittances Yd1(s) and Yd2(s) corresponding to two branches, compute the admittance Yu(s) after merging them (Figure 26.9b):

    yu,1 = yd1,1 + yd2,1
    yu,2 = yd1,2 + yd2,2
    yu,3 = yd1,3 + yd2,3    (26.20)

FIGURE 26.9 Two scenarios of admittance propagation.

The third-order approximation (y1, y2, y3) of an admittance can be realized as an RC π-model (Cu, Rπ, Cd) (Figure 26.10), where

    Cu = y1 - y2^2/y3    Rπ = -y3^2/y2^3    Cd = y2^2/y3    (26.21)

FIGURE 26.10 Illustration of π-model.

26.4.7.2 Higher Order Wire Delay Model

While the Elmore delay is equal to the first-order moment of the transfer function, the accuracy of delay estimation can be remarkably improved by including higher order moments, for example, the wire delay model [33] based on the first three moments and the closed-form model [34] using the first two moments. Because van Ginneken style buffering algorithms proceed in a bottom-up manner, bottom-up moment computation is required. Figure 26.11a shows a wire e connected to a subtree rooted at node B. Assume that the first k moments m(1)BC, m(2)BC, ..., m(k)BC have already been computed for the path from B to C. We wish to compute the moments m(1)AC, m(2)AC, ..., m(k)AC so that the A-to-C delay can be derived. The techniques in Section 26.4.7.1 are used to reduce the subtree at B to a π-model (Cj, Rπ, Cf) (Figure 26.11b). Node D denotes the point on the far side of the resistor connected to B and is not an actual physical location. The RC tree can be further simplified to the network shown in Figure 26.11c: the capacitances Cj and Ce/2 at B are merged to form a capacitor with value Cn. The moments from A to B can be recursively computed by

    m(i)AB = -Re · ( Cn·m(i-1)AB + Cf·m(i-1)AD )    (26.22)

where the moments from A to D are given by

    m(i)AD = m(i)AB - Rπ·Cf·m(i-1)AD    (26.23)

and m(0)AB = m(0)AD = 1.

FIGURE 26.11 Illustration of bottom-up moment computation.
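A minimal sketch of the admittance computation of Section 26.4.7.1 (Eqs. 26.19 through 26.21): coefficient tuples are (y1, y2, y3), a grounded capacitor C starts as (C, 0, 0), and Eq. 26.21 requires y2 and y3 to be nonzero.

```python
def through_resistor(r, yd):
    """Eq. 26.19: propagate (y1, y2, y3) of the downstream node through R."""
    y1, y2, y3 = yd
    return (y1,
            y2 - r * y1 ** 2,
            y3 - 2.0 * r * y1 * y2 + r ** 2 * y1 ** 3)

def merge_admittances(ya, yb):
    """Eq. 26.20: parallel branches add coefficient-wise."""
    return tuple(a + b for a, b in zip(ya, yb))

def pi_model(y):
    """Eq. 26.21: realize (y1, y2, y3) as (Cu, Rpi, Cd); needs y2, y3 != 0."""
    y1, y2, y3 = y
    cd = y2 ** 2 / y3
    return (y1 - cd, -(y3 ** 2) / (y2 ** 3), cd)

# Sanity check: a 1 fF capacitor behind a 2 ohm resistor reduces to the
# pi-model (Cu=0, Rpi=2, Cd=1), i.e., the original circuit is recovered.
```

The sanity check reflects that a single capacitor behind a single resistor is its own exact π-model, which is a convenient unit test for any implementation of these recurrences.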
Now the moments from A to C can be computed via moment multiplication:

    m(i)AC = Σ_{j=0}^{i} m(j)AB · m(i-j)BC    (26.24)

One property of the Elmore delay that makes it attractive for timing optimization is that delays are additive. This property does not hold for higher order delay models. Consequently, a noncritical sink in a subtree may become a critical sink depending on the value of the upstream resistance [35]. Therefore, one must store the moments for all paths to downstream sinks during the bottom-up candidate solution propagation.

26.4.7.3 Accurate Gate Delay

A popular gate-delay model with decent accuracy consists of the following three steps:

1. Compute a π-model of the driving-point admittance of the RC interconnect using the techniques introduced in Section 26.4.7.1.
2. Given the π-model and the characteristics of the driver, compute an effective capacitance Ceff [31,32].
3. Based on Ceff, compute the gate delay using k-factor equations or a lookup table [36].

26.4.8 FLIP-FLOP INSERTION

Technology scaling leads to decreasing clock periods, increasing wire delay, and growing chip size. Consequently, it often takes multiple clock cycles for signals to reach their destinations along global wires. Traditional interconnect optimization techniques such as buffer insertion are inadequate in this scenario, and flip-flop/latch insertion (or interconnect pipelining) becomes a necessity. In pipelined interconnect design, flip-flops and buffers are inserted simultaneously in a given Steiner tree T = (V, E) [37,38]. The simultaneous insertion algorithm is similar to van Ginneken's dynamic programming method except that a new criterion, latency, needs to be considered. The latency from the signal source to a sink is the number of flip-flops in between.
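Before detailing the algorithms, the latency bookkeeping used in this section can be sketched as a candidate-solution update. The (c, q, λ, assignment) tuple order, the clock period value, and the flop input capacitance below are assumptions for illustration; the RAT reset, latency increment, and negative-RAT pruning follow the description elaborated below.

```python
T_PHI = 500.0  # hypothetical clock period, in the same time unit as RAT

def add_flip_flop(cand, flop_in_cap):
    """Insert a flip-flop into candidate (c, q, lam, assignment): the RAT
    resets to the clock period T_PHI and the latency grows by 1. A
    candidate whose RAT is already negative means the flop-to-flop delay
    exceeds T_PHI, so it is pruned (returns None)."""
    c, q, lam, assignment = cand
    if q < 0:
        return None
    return (flop_in_cap, T_PHI, lam + 1, assignment + ["FF"])

def merge_latency(lam_left, lam_right):
    """A merged solution takes the maximum latency of the two branches."""
    return max(lam_left, lam_right)
```

In a full implementation these updates would sit alongside the usual wire and buffer insertion steps of the van Ginneken recursion.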
Therefore, a candidate solution at node v ∈ V is characterized by a 4-tuple (cv, qv, λv, av), where cv is the downstream capacitance, qv is the required arrival time, λv is the latency, and av is the buffer assignment at v. Obviously, a small latency is preferred. The inclusion of flip-flops and latency also requires other changes in a van Ginneken style algorithm. When a flip-flop is inserted during the bottom-up candidate propagation, the RAT at the input of this flip-flop is reset to the clock period Tφ, and the latency of the corresponding candidate solution is increased by 1. For ease of presentation, clock skew and setup/hold time are neglected without loss of generality. Then the delay between two adjacent flip-flops cannot be greater than the clock period Tφ, i.e., the RAT cannot be negative. During candidate solution propagation, if a candidate solution has negative RAT, it is pruned without further propagation. When two candidate solutions from two child branches are merged, the latency of the merged solution is the maximum of the latencies of the two branch solutions.

There are two formulations for the simultaneous flip-flop and buffer insertion problem: MiLa, which finds the minimum latency that can be obtained, and GiLa, which finds a flip-flop/buffer insertion implementation that satisfies a given latency constraint. MiLa can be used for the estimation of interconnect latency at the microarchitectural level. After the microarchitecture design is completed, all interconnect must be designed so as to abide by the given latency requirements by using GiLa.

Algorithm: MiLa(Tu) / MiLa(Tu,v)
Input: Subtree rooted at node u or edge (u,v)
Output: A set of candidate solutions Γu
Global: Routing tree T and buffer library B
1. if u is a leaf, Γu = {(Cu, qu, 0, 0)} // q is the required arrival time
2. else if u has one child node v or the input is Tu,v
   2.1 Γv = MiLa(Tv)
   2.2 Γu = ∪γ∈Γv addWire((u,v), γ)
   2.3 Γb = ∅
   2.4 for each b in B
       2.4.1 Γ = ∪γ∈Γu addBuffer(γ, b)
       2.4.2 prune Γ
       2.4.3 Γb = Γb ∪ Γ
   2.5 Γu = Γu ∪ Γb
3. else if u has two child edges (u,v) and (u,z)
   3.1 Γu,v = MiLa(Tu,v), Γu,z = MiLa(Tu,z)
   3.2 Γu = Γu ∪ merge(Γu,v, Γu,z)
4. prune Γu
5. return Γu

FIGURE 26.12 MiLa algorithm.

The MiLa [38] and GiLa [38] algorithms are shown in Figures 26.12 and 26.13, respectively. In GiLa, λu for a leaf node u is the latency constraint at that node. Usually, λu at a leaf is a nonpositive number; for example, λu = -3 requires that the latency from the source to node u is 3. During the bottom-up solution propagation, λ is increased by 1 whenever a flip-flop is inserted. Therefore, λ = 0 at the source implies that the latency constraint is satisfied. If the latency at the source is greater than zero, the corresponding solution is not feasible (line 2.6.1 of Figure 26.13). If the latency at the source is less than zero, the latency constraint can be satisfied by padding extra flip-flops into the corresponding solution (line 2.6.2.1 of Figure 26.13). The padding procedure, ReFlop(Tu, k), inserts k flip-flops into the root path of Tu. The root path runs from u to either a leaf node or a branch node v, with no other branch node in between. The flip-flops previously inserted on the root path and the newly inserted k flip-flops are redistributed evenly along the path. When solutions from two branches in GiLa are merged, ReFlop is performed (lines 3.3-3.4.1 of Figure 26.13) on the solutions with smaller latency to ensure that there is at least one merged solution matching the latency of both branches.

26.5 SPEEDUP TECHNIQUES

Because of the dramatically increasing number of buffers inserted in circuits, algorithms that can efficiently insert buffers are essential for design automation tools.
In this chapter, several recently proposed speedup results are introduced and the key techniques are described.

26.5.1 RECENT SPEEDUP RESULTS

This chapter studies buffer insertion in interconnect with a set of possible buffer positions and a discrete buffer library. In 1990, van Ginneken [11] proposed an O(n^2) time dynamic programming algorithm for buffer insertion with one buffer type, where n is the number of possible buffer positions. His algorithm finds a buffer insertion solution that maximizes the slack at the source. In 1996, Lillis et al. [12] extended van Ginneken's algorithm to allow b buffer types in time O(b^2 n^2).

Algorithm: GiLa(Tu) / GiLa(Tu,v)
Input: Subtree Tu rooted at node u or edge (u,v)
Output: A set of candidate solutions Γu
Global: Routing tree T and buffer library B
1. if u is a leaf, Γu = {(Cu, qu, λu, 0)}
2. else if node u has one child node v or the input is Tu,v
   2.1 Γv = GiLa(Tv)
   2.2 Γu = ∪γ∈Γv addWire((u,v), γ)
   2.3 Γb = ∅
   2.4 for each b in B
       2.4.1 Γ = ∪γ∈Γu addBuffer(γ, b)
       2.4.2 prune Γ
       2.4.3 Γb = Γb ∪ Γ
   2.5 Γu = Γu ∪ Γb // Γu ≡ {Γx, ..., Γy}; x, y indicate latency
   2.6 if u is source
       2.6.1 if x > 0, exit: the net is not feasible
       2.6.2 if y < 0 // insert -y more flops in Γu
           2.6.2.1 Γu = ReFlop(Tu, -y)
3. else if u has two child edges (u,v) and (u,z)
   3.1 Γu,v = GiLa(Tu,v), Γu,z = GiLa(Tu,z)
   3.2 // Γu,v ≡ {Γx, ..., Γy}, Γu,z ≡ {Γm, ..., Γn}
   3.3 if y < m // insert m - y more flops in Γu,v
       3.3.1 Γu,v = ReFlop(Tu,v, m - y)
   3.4 if n < x // insert x - n more flops in Γu,z
       3.4.1 Γu,z = ReFlop(Tu,z, x - n)
   3.5 Γu = Γu ∪ merge(Γu,v, Γu,z)
4. prune Γu
5. return Γu

FIGURE 26.13 GiLa algorithm.

Recently, many efforts have been made to speed up van Ginneken's algorithm and its extensions.
Shi and Li [39] improved the time complexity of van Ginneken's algorithm to O(b^2 n log n) for two-pin nets and O(b^2 n log^2 n) for multipin nets. The speedup is achieved by four novel techniques: predictive pruning, candidate tree, fast redundancy check, and fast merging. To reduce the quadratic effect of b, Li and Shi [40] proposed an algorithm with time complexity O(b n^2). The speedup is achieved by the observation that the best candidate to be associated with any buffer must lie on the convex hull of the (Q, C) plane, together with convex pruning. To exploit the fact that in real applications most nets have a small number of pins and a large number of buffer positions, Li and Shi [41] proposed a simple O(mn) algorithm for m-pin nets. The speedup is achieved by the property explored in Ref. [40], convex pruning, a clever bookkeeping method, and an innovative linked list that allows O(1) time update for adding a wire or a candidate. In the following subsections, new pruning techniques, an efficient way to find the best candidates when adding a buffer, and implicit data representations are presented. They are the basic components of many recent speedup algorithms.

26.5.2 PREDICTIVE PRUNING

In van Ginneken's algorithm, a candidate is pruned only if there is another candidate that is superior in terms of both capacitance and slack. This pruning is based on the information at the current