Handbook of algorithms for physical design automation part 56.pdf (Sổ tay thuật toán)

Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C025 532 Finals Page 532 29-9-2008 #25 Handbook of Algorithms for Physical Design Automation 36. Elmore, W. C. The transient response of damped linear networks with particular regard to wide-band amplifiers. Journal of Applied Physics 19(1): 55–63, 1948. 37. Lin, T. M. and Mead, C. A. Signal delay in general RC-networks. IEEE Transactions Computer-Aided Design CAD-3(4): 331–349, October 1984. 38. Rubinstein, J., Penfield, P., and Horowitz, M. A. Signal delay in RC tree networks. IEEE Transactions Computer-Aided Design 2(3): 202–211, 1983. 39. Tsay, R. S. Exact zero skew. In Proceedings of the IEEE International Conference Computer-Aided Design, Santa Clara, CA, November 1991, pp. 336–339. 40. Alpert, C. J., Hu, T. C., Huang, J. H., Kahng, A. B., and Karger, D. Prim-Dijkstra tradeoffs for improved performance-driven routing tree design. IEEE Transactions Computer-Aided Design 14(7): 890–896, July 1995. (ISCAS 1993). 41. Awerbuch, B., Baratz, A., and Peleg, D. Cost-sensitive analysis of communication protocols. In Proceedings of the ACM Symposium Principles of Distributed Computing, Quebec City, Quebec, Canada, 1990, pp. 177–187. 42. Cong, J., Kahng, A. B., Robins, G., Sarrafzadeh, M., and Wong, C. K. Provably good algorithms for performance-driven global routing. In Proceedings of the IEEE International Symposium Circuits and Systems, San Diego, CA, May 1992, pp. 2240–2243. 43. Cong, J., Kahng, A. B., Robins, G., Sarrafzadeh, M., and Wong, C. K. Provably good performance-driven global routing. IEEE Transactions Computer-Aided Design 11(6): 739–752, 1992. 44. Khuller, S., Raghavachari, B., and Young, N. Balancing minimum spanning and shortest path trees. In Proceedings of the ACM/SIAM Symposium Discrete Algorithms, Austin, TX, January 1993, pp. 243–250. 45. Boese, K. D., Kahng, A. B., McCoy, B. A., and Robins, G. Fidelity and near-optimality of Elmore-based routing constructions. In Proceedings of the IEEE International Conference Computer Design, Cambridge, MA, October 1993, pp. 81–84. 46. Boese, K. D., Kahng, A. B., McCoy, B. A., and Robins, G. Rectilinear Steiner trees with minimum Elmore delay. In Proceedings of the ACM/IEEE Design Automation Conference, San Diego, CA, June 1994, pp. 381–386. 47. Boese, K. D., Kahng, A. B., and Robins, G. High-performance routing trees with identified critical sinks. In Proceedings of the ACM/IEEE Design Automation Conference, Dallas, TX, June 1993, pp. 182–187. 48. Lillis, J., Cheng, C. K., Lin, T. -T. Y., and Ho, C. -Y. New performance driven routing techniques with explicit area/delay tradeoff and simultaneous wire sizing. In Proceedings of the ACM/IEEE Design Automation Conference, Las Vegas, NV, 1996, pp. 395–400. 49. Chen, H., Cheng, C. -K., Kahng, A., Măndoiu, I. I., Wang, Q., and Yao., B. The y-architecture for on-chip interconnect: Analysis and methodology. IEEE Transactions Computer-Aided Design 24(4): 588–599, April 2005. 50. Chen, H., Cheng, C. -K., Kahng, A. B., Măndoiu, I., and Wang, Q. Estimation of wirelength reduction for λ-geometry vs. Manhattan placement and routing. In Proceedings of the ACM International Workshop on System-Level Interconnect Prediction, Monterey, CA, 2003, pp. 71–76. 51. Koh, C. -K. and Madden, P. H. Manhattan or non-Manhattan?: A study of alternative VLSI routing architectures. In Proceedings of the Great Lakes Symposium VLSI, Chicago, IL, 2000, pp. 47–52. 52. Li, Y. Y., Cheung, S. K., Leung, K. S., and Wong, C. K. Steiner tree construction in λ3 -metric. IEEE Transactions Circuits and Systems-II: Analog and Digital Signal Processing 45(5): 563–574, May 1998. 53. Nielsen, B. K., Winter, P., and Zachariasen, M. An exact algorithm for the uniformly-oriented Steiner tree problem. In Proceedings of the European Symposium on Algorithms, Lecture Notes in Computer Science 2461. Springer-Verlag, Rome, Italy, 2002, pp. 760–771. 54. Sarrafzadeh, M. and Wong, C. K. Hierarchical Steiner tree construction in uniform orientations. IEEE Transactions Computer-Aided Design 11(9): 1095–1103, September 1992. 55. Teig, S. The x architecture: Not your father’s diagonal wiring. In Proceedings of the ACM International Workshop on System-Level Interconnect Prediction, San Diego, CA, 2002, pp. 33–37. 56. Yildiz, M. C. and Madden, P. H. Preferred direction Steiner trees. In Proceedings of the Great Lakes Symposium VLSI, West Lafayette, IN, 2001, pp. 56–61. 57. Dijkstra, E. W. A note on two problems in connection with graphs. Numerische Mathematik 1: 269–271, 1959. 58. Prim, A. Shortest connecting networks and some generalizations. Bell System Technical Journal 36: 1389–1401, 1957. Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C025 Timing-Driven Interconnect Synthesis Finals Page 533 29-9-2008 #26 533 59. Cong, J., Kahng, A. B., Robins, G., Sarrafzadeh, M., and Wong, C. K. Performance-driven global routing for cell based ICs. In Proceedings of the IEEE International Conference Computer Design, Cambridge, MA, October 1991, pp. 170–173. 60. Robins, G. and Zelikovsky, A. Improved Steiner tree approximation in graphs. In Proceedings of the ACM/SIAM Symposium Discrete Algorithms, San Francisco, CA, January 2000, pp. 770–779. 61. Kahng, A. B. and Robins, G. On performance bounds for a class of rectilinear Steiner tree heuristics in arbitrary dimension. IEEE Transactions Computer-Aided Design 11(11): 1462–1465, November 1992. 62. Griffith, J., Robins, G., Salowe, J. S., and Zhang, T. Closing the gap: Near-optimal Steiner trees in polynomial time. IEEE Transactions Computer-Aided Design 13(11): 1351–1365, November 1994. 63. Kahng, A. B. and Robins, G. A new class of iterative Steiner tree heuristics with good performance. IEEE Transactions Computer-Aided Design 11(7): 893–902, July 1992. 64. Cong, J., Leung, K. S., and Zhou, D. Performance-driven interconnect design based on distributed RC delay model. In Proceedings of the ACM/IEEE Design Automation Conference, Dallas, June 1993, pp. 606–611. 65. Nastansky, L., Selkow, S. M., and Stewart, N. F. Cost-minima trees in directed acyclic graphs. Zeitschrift for Operations Research 18: 59–67, 1974. 66. de Matos, R. R. L. A Rectilinear Arborescence Problem. PhD thesis, University of Alabama, Tuscaloosa, Alabama, 1979. 67. Ho, J. M., Ko, M. T., Ma, T. H., and Sung, T. Y. Algorithms for rectilinear optimal multicast tree problem. In Proceedings of the International Symposium on Algorithms and Computation, Nagoya, Japan, June 1992, pp. 106–15. 68. Leung, K. -S. and Cong, J. Fast optimal algorithms for the minimum rectilinear Steiner arborescence problem. In Proceedings of the IEEE International Symposium Circuits and Systems, Vol. 3, Hong Kong, 1997, pp. 1568–1571. 69. Rao, S. K., Sadayappan, P., Hwang, F. K., and Shor, P. W. The rectilinear Steiner arborescence problem. Algorithmica 7(1): 277–288, 1992. 70. Trubin, V. A. Subclass of the Steiner problems on a plane with rectilinear metric. Cybernetics and Systems Analysis 21(3): 320–322, 1985. 71. Shi, W. and Su, C. The rectilinear Steiner arborescence problem is np-complete. SIAM Journal of Computation 35(3): 729–740, 2006. 72. Cordova, J. and Lee, Y. H. A heuristic algorithm for the rectilinear Steiner arborescence problem. Technical Report TR-94-025, University of Florida, Gainesville, FL, 1994. 73. Alexander, M. J. and Robins, G. New performance-driven FPGA routing algorithms. IEEE Transactions Computer-Aided Design 15(12): 1505–1517, December 1996. 74. Kou, L., Markowsky, G., and Berman, L. A fast algorithm for Steiner trees. Acta Informatica 15: 141– 145, 1981. 75. Cong, J., Kahng, A. B., and Leung, K. -S. Efficient algorithms for the minimum shortest path Steiner arborescence problem with applications to VLSI physical design. IEEE Transactions Computer-Aided Design 17(1): 24–39, January 1998. 76. Robins, G. On Optimal Interconnections. PhD thesis, Department of Computer Science, UCLA, CSDTR-920024, Los Angeles, CA, 1992. 77. Zhou, D., Tsui, F., and Gao, D. S. High performance multichip interconnection design. In Proceedings of the ACM/SIGDA Physical Design Workshop, Lake Arrowhead, CA, April 1993, pp. 32–43. 78. Sriram, M. and Kang, S. M. Performance driven MCM routing using a second order RLC tree delay model. In IEEE International Conference on Wafer Scale Integration, San Francisco, CA, January 1993, pp. 262–267. 79. Alpert, C. J., Gandham, G., Hrkic, M., Hu, J., Kahng, A. B., Lillis, J., Liu, B., Quay, S. T., Sapatnekar, S. S., and Sullivan, A. J. Buffered Steiner trees for difficult instances. IEEE Transactions Computer-Aided Design 21(1): 3–14, January 2002. 80. Ganley, J. L. Accuracy and fidelity of fast net length estimates. Integration: The VLSI Journal 23(2): 151–155, 1997. 81. Hong, X., Xue, T., Kuh, E. S., Cheng, C. K., and Huang, J. Performance-driven Steiner tree algorithms for global routing. In Proceedings of the ACM/IEEE Design Automation Conference, Dallas, TX, June 1993, pp. 177–181. 82. Hu, J. and Sapatnekar, S. S. Algorithms for non-Hanan-based optimization for VLSI interconnect under a higher order awe model. IEEE Transactions Computer-Aided Design 19(4): 446–458, April 2000. Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C025 534 Finals Page 534 29-9-2008 #27 Handbook of Algorithms for Physical Design Automation 83. Hu, J. and Sapatnekar, S. S. A timing-constrained simultaneous global routing algorithm. IEEE Transactions Computer-Aided Design 21(9): 1025–1036, September 2002. 84. Peyer, S., Zachariasen, M., and Grove, D. J. Delay-related secondary objectives for rectilinear Steiner minimum trees. Discrete and Applied Mathematics 136(2): 271–298, February 2004. 85. Wu, D., Hu, J., and Mahapatra, R. Coupling aware timing optimization and antenna avoidance in layer assignment. In Proceedings of the International Symposium on Physical Design. ACM Press, New York, 2005, pp. 20–27. 86. Hanan, M. On Steiner’s problem with rectilinear distance. SIAM Journal of Applied Mathematics 14: 255– 265, 1966. 87. Zachariasen, M. A catalog of Hanan grid problems. Networks—An International Journal 38(2): 76– 83, 2001. 88. Hou, H., Hu, J., and Sapatnekar, S. S. Non-Hanan routing. IEEE Transactions Computer-Aided Design 18(4): 436–444, April 1999. 89. Fisher, A. L. and Kung, H. T. Synchronizing large systolic arrays. In Proceedings of SPIE, Arlington, VA, May 1982, pp. 44–52. 90. Friedman, E. G. Clock distribution design in VLSI circuits—an overview. In Proceedings of the IEEE International Symposium Circuits and Systems, Chicago, IL, May 1993, pp. 1475–1478. 91. Pullela, S., Menezes, N., and Pillage, L. T. Reliable non-zero skew clock trees using wire width optimization. In Proceedings of the ACM/IEEE Design Automation Conference, San Diego, CA, 1993, pp. 165–170. 92. Zhu, Q., Dai, W. W. M., and Xi, J. G. Optimal sizing of high-speed clock networks based on distributed RC and lossy transmission line models. In Proceedings of the IEEE International Conference Computer-Aided Design, 1993, pp. 628–633. 93. Dutta, R. and Marek-Sadowska, M. Algorithm for wire sizing of power and ground networks in VLSI designs. Journal of Circuits, Systems and Computers 2: 141–157, June 1992. 94. Cong, J., and Leung, K. S. Optimal wiresizing under the distributed Elmore delay model. In Proceedings of the IEEE International Conference Computer-Aided Design, 1993, pp. 634–639. 95. Hodes, T. D., McCoy, B. A., and Robins, G. Dynamically-wiresized Elmore-based routing constructions. In Proceedings of the IEEE International Symposium Circuits and Systems, Vol. I, London, United Kingdom, May 1994, pp. 463–466. 96. Sapetnekar, S. RC interconnect optimization under the Elmore delay model. In Proceedings of the ACM/IEEE Design Automation Conference, San Diego, CA, June 1994, pp. 387–391. 97. Erhard, K. H. and Johannes, F. M. Power/ground networks in VLSI: Are general graphs better than trees? Integration: The VLSI Journal 14(1): 91–109, November 1992. 98. Erhard, K. H., Johannes, F. M., and Dachauer, R. Topology optimization techniques for power/ground networks in VLSI. In Proceedings of the European Design Automation Conference, Hamburg, Germany, September 1992, pp. 362–367. 99. Lin, S. and Wong, C. K. Process-variation-tolerant clock skew minimization. In Proceedings of the IEEE International Conference Computer-Aided Design, San Jose, CA, November 1994, pp. 284–288. 100. Chan, P. K. and Karplus, K. Computing signal delay in general RC networks by tree/link partitioning. IEEE Transactions Computer-Aided Design 9(8): 898–902, August 1990. 101. Martin, D. and Rumin, N. C. Delay prediction from resistance-capacitance models of general MOS circuits. IEEE Transactions Computer-Aided Design 12(7): 997–1003, July 1993. 102. Kahng, A. B., Liu, B., and Mandoiu, I. I. Non-tree routing for reliability and yield improvement. IEEE Transactions Computer-Aided Design 23(1): 148–156, 2004. 103. Hu, S., Li, Q., Hu, J., and Li, P. Steiner network construction for timing critical nets. In Proceedings of the ACM/IEEE Design Automation Conference, 2006, pp. 379–384. 104. Borah, M., Owens, R. M., and Irwin, M. J. An edge-based heuristic for Steiner routing. IEEE Transactions Computer-Aided Design 13: 1563–1568, 1994. 105. Qiu, W. and Shi, W. Minimum moment Steiner trees. In Proceedings of the ACM/SIAM Symposium Discrete Algorithms, 2004, pp. 488–495. 106. Saxena, P., Menezes, N., Cocchini, P., and Kirkpatrick, D. A. Repeater scaling and its impact on CAD. IEEE Transactions Computer-Aided Design 23(4): 451–463, April 2004. 107. Hrkic, M. and Lillis, J. Buffer tree synthesis with consideration of temporal locality, sink polarity requirements, solution cost, congestion and blockages. IEEE Transactions Computer-Aided Design 22(4): 481–491, April 2003. Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C026 Finals Page 535 29-9-2008 #2 26 Buffer Insertion Basics Jiang Hu, Zhuo Li, and Shiyan Hu CONTENTS 26.1 Motivation . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 26.2 Optimization of Two-Pin Nets .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 26.3 van Ginneken’s Algorithm .. . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 26.3.1 Concept of Candidate Solution .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 26.3.2 Generating Candidate Solutions .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 26.3.2.1 Wire Insertion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 26.3.2.2 Buffer Insertion.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 26.3.2.3 Branch Merging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 26.3.3 Inferiority and Pruning Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 26.3.4 Pseudocode .. . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 26.3.5 Example . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 26.4 van Ginneken Extensions . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 26.4.1 Handling Library with Multiple Buffers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 26.4.2 Library with Inverters .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 26.4.3 Polarity Constraints . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 26.4.4 Slew and Capacitance Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 26.4.5 Integration with Wire Sizing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 26.4.6 Noise Constraints with Devgan Metric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 26.4.6.1 Devgan’s Coupling Noise Metric. . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 26.4.6.2 Algorithm of Buffer Insertion with Noise Avoidance . . . . .. . . . . . . . . . . . . . 26.4.7 Higher Order Delay Modeling.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 26.4.7.1 Higher Order Point Admittance Model . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 26.4.7.2 Higher Order Wire Delay Model . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 26.4.7.3 Accurate Gate Delay . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 26.4.8 Flip-Flop Insertion . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 26.5 Speedup Techniques . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 26.5.1 Recent Speedup Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 26.5.2 Predictive Pruning .. . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 26.5.3 Convex Pruning . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 26.5.4 Efficient Way to Find Best Candidates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 26.5.5 Implicit Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . 535 536 538 538 539 539 539 539 540 540 540 542 542 542 542 543 543 544 544 546 546 547 548 549 549 550 550 551 552 553 554 555 26.1 MOTIVATION When the VLSI technology scales, gate delay and wire delay change in opposite directions. Smaller devices imply less gate-switching delay. In contrast, thinner wire size leads to increased wire resistance and greater signal propagation delay along wires. As a result, wire delay has become 535 Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C026 536 Finals Page 536 29-9-2008 #3 Handbook of Algorithms for Physical Design Automation a dominating factor for VLSI circuit performance. Further, it is becoming a limiting factor to the progress of VLSI technology. This is the well-known interconnect challenge [1–3]. Among many techniques addressing this challenge [4,5], buffer (or repeater) insertion is such an effective technique that it is an indispensable necessity for timing closure in submicron technology and beyond. Buffers can reduce wire delay by restoring signal strength, in particular, for long wires. Moreover, buffers can be applied to shield capacitive load from timing-critical paths such that the interconnect delay along critical paths are reduced. As the ratio of wire delay to gate delay increases from one technology to the next, more and more buffers are required to achieve performance goals. The buffer scaling is studied by Intel and the results are reported in Ref. [6]. One metric that reveals the scaling is critical buffer length, the minimum distance beyond which inserting an optimally placed and sized buffer makes the interconnect delay less than that of the corresponding unbuffered wire. When wire delay increases because of the technology scaling, the critical buffer length becomes shorter, i.e., the distance that a buffer can comfortably drive shrinks. According to Ref. [6], the critical buffer length decreases by 68 percent when the VLSI technology migrates from 90 to 45 nm (for two generations). Please note that the critical buffer-length scaling significantly outpaces the VLSI technology scaling, which is roughly 0.5× for every two generations. If we look at the percentage of block level nets requiring buffers, it grows from 5.8 percent in 90-nm technology to 19.6 percent in 45-nm technology [6]. Perhaps the most alarming result is the scaling of buffer count [6], which predicts that 35 percent of cells will be buffers in 45-nm technology as opposed to only 6 percent in 90-nm technology. The dramatic buffer scaling undoubtedly generates large and profound impact to VLSI circuit design. With millions of buffers required per chip, almost nobody can afford to neglect the importance of buffer insertion as compared to a decade ago when only a few thousands of buffers are needed for a chip [7]. Because of this importance, buffer insertion algorithms and methodologies need to be deeply studied on various aspects. First, a buffer insertion algorithm should deliver solutions of high quality because interconnect and circuit performance largely depend on the way that buffers are placed. Second, a buffer insertion algorithm needs to be sufficiently fast so that millions of nets can be optimized in reasonable time. Third, accurate delay models are necessary to ensure that buffer insertion solutions are reliable. Fourth, buffer insertion techniques are expected to simultaneously handle multiple objectives, such as timing, power, and signal integrity, and their trade-offs. Last but not the least, buffer insertion should interact with other layout steps, such as placement and routing, as the sheer number of buffers has already altered the landscape of circuit layout design. Many of these issues will be discussed in subsequent sections and other chapters. 26.2 OPTIMIZATION OF TWO-PIN NETS For buffer insertion, perhaps the most simple case is a two-pin net, which is a wire segment with a driver (source) at one end and a sink at the other end. The simplicity allows closed form solutions to buffer insertion in two-pin nets. If the delay of a two-pin net is to be minimized by using a single buffer type b, one needs to decide the number of buffers k and the spacing between the buffers, the source and the sink. First, let us look at a very simple case to attain an intuitive understanding of the problem. In this case, the length of the two-pin net is l and the wire resistance and capacitance per unit length are r and c, respectively. The number of buffers k has been given and is fixed. The driver resistance is the same as the buffer output resistance Rb . The load capacitance of the sink is identical to buffer input capacitance Cb . The buffer has an intrinsic delay of tb . The k buffers separates the net into k + 1 segments, with length of l = (l0 , l1 , . . . , lk )T (Figure 26.1). Then, the Elmore delay of this net can be expressed as k 2 t(l) = αli + βli + γ (26.1) i=0 Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C026 Finals Page 537 29-9-2008 #4 537 Buffer Insertion Basics l0 Driver l1 1 l2 lk 2 k 3 Sink FIGURE 26.1 Buffer insertion in a two-pin net. where α = 12 rc, β = Rb c + rCb , and γ = Rb Cb + tb . A formal problem formulation is minimize subject to t(l) g(l) = l − (26.2) k li = 0 (26.3) i=0 According to the Kuhn–Tucker condition [8], the following equation is the necessary condition for the optimal solution. l) = 0 l) + λ∇g( ∇t( (26.4) where λ is the Lagrangian multiplier. According to the above condition, it can be easily derived that li = β , λ − 2α i = 0, 1, . . . , k (26.5) Because α, β, and λ are all constants, it can be seen that the buffers need to be equally spaced to minimize the delay. This is an important conclusion that can be treated as a rule of thumb. The value of the Lagrangian multiplier λ can be found by plugging Equation 26.5 into Equation 26.3. In more general cases, the driver resistance Rd may be different from that of buffer output resistance and so is the sink capacitance CL . For such cases, the optimum number of buffers minimizing the delay is given by Ref. [9] 1 2[rcl + r(Cb − CL ) − c(Rb − Rd )]2 k = − + 1+ (26.6) 2 rc (Rb Cb + tb ) The length of each segment can be obtained through [9] k (Rb − Rd ) CL − Cb 1 l0 = l+ + k+1 r c Rb − Rd CL − Cb 1 l1 = . . . = lk−1 = l− + k+1 r c Rb − Rd k (CL − Cb ) 1 lk = l− − k+1 r c (26.7) A closed form solution to simultaneous buffer insertion/sizing and wire sizing is reported in Ref. [10]. Figure 26.2 shows an example of this simultaneous optimization. The wire is segmented into m pieces. The length li and width hi of each wire piece i are the variables to be optimized. There are k buffers inserted between these pieces. The size bi of each buffer i is also a decision variable. A buffer location is indicated by its surrounding wire pieces. For example, if the set of wire pieces between buffer i − 1 and i is Pi−1 , the distance between the two buffers is equal to j∈Pi−1 lj . There are two important conclusions [10] for the optimal solution that minimizes the delay. First, all wire pieces have the same length, i.e., li = ml , i = 1, 2, . . . , m. Second, for wire pieces Pi−1 = {pi−1,1 , pi−1,2 , . . . , pi−1,mi−1 } between buffer i − 1 and i, their widths satisfy hi−1,1 > hi−1,2 > . . . > hi−1,mi−1 and form a geometric progression. Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C026 538 Finals Page 538 29-9-2008 #5 Handbook of Algorithms for Physical Design Automation mk segments m0 segments h1 hm h2 b1 l1 bk lm l2 FIGURE 26.2 Example of simultaneous buffer insertion/sizing and wire sizing. 26.3 VAN GINNEKEN’S ALGORITHM For a general case of signal nets, which may have multiple sinks, van Ginneken’s algorithm [11] is perhaps the first systematic approach on buffer insertion. For a fixed signal routing tree and given candidate buffer locations, van Ginneken’s algorithm can find the optimal buffering solution that maximizes timing slack according to the Elmore delay model. If there are n candidate buffer locations, its computation complexity is O(n2 ). Based on van Ginneken’s algorithm, numerous extensions have been made, such as handling of multiple buffer types, trade-off with power and cost, addressing slew rate and crosstalk noise, and using accurate delay models and speedup techniques. These extensions will be covered in subsequent sections. At a high level, van Ginneken’s algorithm [11] proceeds bottom-up from the leaf nodes toward the driver along a given routing tree. A set of candidate solutions keep updated during the process, where three operations adding wire, inserting buffers, and branch merging may be performed. Meanwhile, the inferior solutions are pruned to accelerate the algorithm. After a set of candidate solutions are propagated to the source, the solution with the maximum required arrival time is selected as the final solution. For a routing tree with n buffer positions, the algorithm computes the optimal buffering solution in O(n2 ) time. A net is given as a binary routing tree T = (V , E), where V = {s0 } ∪ Vs ∪ Vn , and E ⊆ V × V . Vertex s0 is the source vertex and also the root of T , Vs is the set of sink vertices, and Vn is the set of internal vertices. In the existing literatures, s0 is also referred as driver. Denote by T (v) the subtree of T rooted at v. Each sink vertex s ∈ Vs is associated with a sink capacitance C(s) and a required arrival time (RAT). Each edge e ∈ E is associated with lumped resistance R(e) and capacitance C(e). A buffer library B containing all the possible buffer types that can be assigned to a buffer position is also given. In this section, B contains only one buffer type. Delay estimation is obtained using the Elmore delay model, which is described in Chapter 3. A buffer assignment γ is a mapping γ : Vn → B ∪ {b̄} where b̄ denotes that no buffer is inserted. The timing buffering problem is defined as follows. Timing-driven buffer insertion problem: Given a binary routing tree T = (V , E), possible buffer positions, and a buffer library B, compute a buffer assignment γ such that the RAT at driver is maximized. 26.3.1 CONCEPT OF CANDIDATE SOLUTION A buffer assignment γ is also called a candidate solution for the timing buffering problem. A partial solution, denoted by γv , refers to an incomplete solution where the buffer assignment in T (v) has been determined. The Elmore delay from v to any sink s in T (v) under γv is computed by D (s, γv ) = ( e= vi ,vj [D (vi ) + D (e)] ) where the sum is taken over all edges along the path from v to s. The slack of vertex v under γv is defined as Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C026 Finals Page 539 29-9-2008 #6 539 Buffer Insertion Basics Q(γv ) = min {RAT(s) − D(s, γv )} s∈T (v) At any vertex v, the effect of a partial solution γv to its upstream part is characterized by a (Q(γv ), C(γv )) pair, where Q is the slack at v under γv and C is the downstream capacitance viewing at v under γv . 26.3.2 GENERATING CANDIDATE SOLUTIONS van Ginneken’s algorithm proceeds bottom-up from the leaf nodes toward the driver along T . A set of candidate solutions, denoted by , are kept updated during this process. There are three operations through solution propagation, namely, wire insertion, buffer insertion, and branch merging (Figure 26.3). We are to describe them in turn. 26.3.2.1 Wire Insertion Suppose that a partial solution γv at position v propagates to an upstream position u and there is no branching point in between. If no buffer is placed at u, then only wire delay needs to be considered. Therefore, the new solution γu can be computed as Q(γu ) = Q(γv ) − D(e) (26.8) C(γu ) = C(γv ) + C(e) where e = (u, v) and D(e) = R(e) 26.3.2.2 C(e) 2 + C(γv ) . Buffer Insertion Suppose that we add a buffer b at u. Denote by R(b), K(b) the driving resistance and the intrinsic delay of buffer b, respectively. γu is then updated to γu where Q(γu ) = Q(γu ) − R(b) · C(γu ) + K(b) (26.9) C(γu ) = C(b) 26.3.2.3 Branch Merging When two branches Tl and Tr meet at a branching point v, l and r , which correspond to Tl and Tr , respectively, are to be merged. The merging process is performed as follows. For each solution γl ∈ l and each solution γr ∈ r , generate a new solution γ according to C(γ ) = C(γl ) + C(γr ) (26.10) Q(γ ) = min {Q(γl ), Q(γr )} The smaller Q is picked since the worst-case circuit performance needs to be considered. v u v v2 u T(v) (a) Wire insertion v1 T(u) (b) Buffer insertion FIGURE 26.3 Operations in van Ginneken’s algorithm. T(v2) (c) Branch merging T(v1) Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C026 540 Finals Page 540 29-9-2008 #7 Handbook of Algorithms for Physical Design Automation 26.3.3 INFERIORITY AND PRUNING IDENTIFICATION Simply propagating all solutions by the above three operations makes the solution set grow exponentially in the number of buffer positions processed. An effective and efficient pruning technique is necessary to reduce the size of the solution set. This motivates an important concept—inferior solution—in van Ginneken’s algorithm. For any two partial solutions γ1 and γ2 at the same vertex v, γ2 is inferior to γ1 if C(γ1 ) ≤ C(γ2 ) and Q(γ1 ) ≥ Q(γ2 ). Whenever a solution becomes inferior, it is pruned from the solution set. Therefore, only solutions that excel in at least one aspect of downstream capacitance and slack can survive. For an efficient pruning implementation and thus an efficient buffering algorithm, a sorted list is used to maintain the solution set. The solution set is increasingly sorted according to C, and thus Q is also increasingly sorted if does not contain any inferior solutions. By a straightforward implementation, when adding a wire, the number of candidate solutions will not change; when inserting a buffer, only one new candidate solution will be introduced. More efforts are needed to merge two branches Tl and Tr at v. For each partial solution in l , find the first solution with larger Q value in r . If such a solution does not exist, the last solution in r will be taken. Because l and r are sorted, we only need to traverse them once. Partial solutions in r are similarly treated. It is easy to see that after merging, the number of solutions is at most |l | + |r |. As such, given n buffer positions, at most n solutions can be generated at any time. Consequently, the pruning procedure at any vertex in T runs in O(n) time. 26.3.4 PSEUDOCODE In van Ginneken’s algorithm, a set of candidate solutions are propagated from sinks to driver. Along a branch, after a candidate buffer location v is processed, all solutions are propagated to its upstream buffer location u through wire insertion. A buffer is then inserted to each solution to obtain a new solution. Meanwhile, inferior solutions are pruned. At a branching point, solution sets from all branches are merged by merging process. In this way, the algorithm proceeds in the bottom-up fashion and the solution with maximum required arrival time at driver is returned. Given n buffer positions in T , van Ginneken’s algorithm can compute a buffer assignment with maximum slack at driver in O(n2 ) time, because any operation at any node can be performed in O(n) time. Refer to Figure 26.4 for the pseudocode of van Ginneken’s algorithm. 26.3.5 EXAMPLE Let us look at a simple example to illustrate the work flow of van Ginneken’s algorithm. Refer to Figure 26.5. Assume that there are three nondominated solutions at v3 whose (Q, C) pairs are (200, 10), (300, 30), and (500, 50) and there are two nondominated solutions at v2 whose (Q, C) pairs are (290, 5) and (350, 20) We first propagate them to v1 through wire insertion. Assume that R(v1 , v3 ) = 3 and C(v1 , v3 ) = 2. Solution (200, 10) at v3 becomes (200 − 3 · (2/2 + 10), 10 + 2) = (167, 12) at v1 . Similarly, the other two solutions become (207, 32) and (347, 52). Assume that R(v2 , v3 ) = 2 and C(v2 , v3 ) = 2, solutions at v2 become (278, 7) and (308, 22) at v1 . We are now to merge these solutions at v1 . Denote by l the solutions propagated from v3 and by r the solutions propagated from v2 . Before merging, partial solutions in l are (167, 12) , (207, 32) , and (347, 52) Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C026 Finals Page 541 29-9-2008 #8 541 Buffer Insertion Basics Algorithm: van Ginneken’s algorithm Input: T : routing tree, B : buffer library Output: γ which maximizes slack at driver 1. for each sink s, build a solution set {γs }, where Q (γs ) = RAT (s) and C (γs ) = C (s) 2. for each branching point/driver v t in the order given by a postorder traversal of T, let T be each of the branches T 1 , T 2 of v t and be the solution set corresponding to T , do 3. for each wire e in T , in a bottom-up order, do 4. for each γ ∈ , do 5. C (γ ) = C (γ ) + C (e) 6. Q (γ ) = Q (γ ) − D (e) 7. prune inferior solutions in 8. if the current position allows buffer insertion, then 9. for each γ ∈ , generate a new solution γ 10. set C (γ ) = C (b ) 11. set Q (γ ) = Q (γ ) − R(b ) · C (γ ) − K (b ) 12. = {γ } and prune inferior solutions 13. // merge 1 and 2 to v t 14. set v t = ∅ 15. for each γ1 ∈ 1 and γ2 ∈ 2 , generate a new solution γ 16. set C (γ ) = C (γ1 ) + C (γ2 ) 17. set Q (γ ) = min{Q (γ1 ),Q (γ2 )} 18. vt = v t {γ } and prune inferior solutions 19. return γ with the largest slack FIGURE 26.4 van Ginneken’s algorithm. and partial solutions in r are (278, 7) and (308, 22) After branch merging, the new candidate partial solutions whose Q are dictated by solutions in l are (167, 19) , (207, 39) , and (308, 74) and those dictated by solutions in r are (278, 59) and (308, 74) V2 S0 V1 S1 S3 S4 V3 S2 FIGURE 26.5 Example for performing van Ginneken’s algorithm.

Handbook of algorithms for physical design automation part 56

Nội dung