Efficient Substitution of Multiple Constant Multiplications by Shifts and Additions using Iterative Pairwise Matching

Miodrag Potkonjak

C&C Research Laboratories, NEC USA, Inc., Princeton, NJ 080

Mani B. Srivastava

AT&T Bell Laboratories, Murray Hill, NJ 07974

Anantha Chandrakasan

University of California, Dept. of EECS, Berkeley, CA 94720

ABSTRACT

Many numerically intensive applications have computations that involve a large number of multiplications of one variable with several constants. A proper optimization of this part of the computation, which we call the multiple constant multiplication (MCM) problem, often results in a significant improvement in several key design metrics.

After defining the MCM problem, we formulate it as a special case of common subexpression elimination. The algorithm for common subexpression elimination is based on an iterative pairwise matching heuristic. The flexibility of the MCM problem formulation enables the application of the iterative pairwise matching algorithm to several other important high level synthesis tasks. All applications are illustrated by a number of benchmarks.

Multiple constant multiplication is a new transformation closely related to the widely used substitution of multiplications with constants by shifts and additions. While the latter considers multiplication with only one constant at a time, the new transformation considers several multiplications of the same variable by different constants simultaneously.

Optimization of multiplication with a single constant has long been recognized as important for compilers and high level synthesis systems. It has been used for improving area [Cha93] and power [Cha92]. In ASICs, as well as in many microprocessors, additions, subtractions, and shifts are significantly less expensive than multiplications. The importance of multiplication with constants is also indicated by recent results [Cha92] showing an order of magnitude reduction in power achieved by this transformation alone.
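To make the cost comparison concrete, here is a minimal Python sketch (ours, not from the paper) of how a multiplication by a constant decomposes into one shift per nonzero bit plus additions:

```python
def times_const(x, c):
    """Multiply x by the constant c using only shifts and additions:
    one shifted copy of x per 1-bit of c, combined with popcount(c)-1 adds."""
    return sum(x << i for i in range(c.bit_length()) if (c >> i) & 1)

# 621 = 1001101101 in binary: 6 one-bits, so 5 additions and 5 nonzero shifts
assert times_const(37, 621) == 37 * 621
```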

We formulate the MCM problem in such a way that a minimum number of shifts is selected first, after which the number of additions is minimized. The formulation has several distinctive advantages. First, it enables one to treat the problem of minimizing the number of additions as one of common subexpression elimination in a restricted domain. This allows efficient and low complexity algorithms to be used. Second, the formulation enables an interesting theoretical analysis of the effectiveness of the multiple constant multiplication transformation, which leads to the surprising asymptotic result that both the number of shifts and the number of additions stay bounded and finite regardless of the size of the problem. Finally, our formulation also allows the MCM approach to be applied to a number of high level synthesis tasks that are based on common subexpression elimination.

1.0 Motivation and Problem Relevance

Computational transformations have recently attracted much attention in the high level synthesis community [Wal, Rab91, Wal91, Ku92]. Transformations have also been studied in areas such as algorithm design for numerical and DSP applications [Bla85] and compilers [Fis91]. The main technical novelty of the use of transformations in high level synthesis has been the development of powerful optimization algorithms to apply these transformations.

The exceptional ability of the human brain to recognize and manipulate regularity, symmetry, and other structural properties of computation has been used to great advantage in the design of algorithms such as the FFT and DCT [Bla85]. Software compilers, on the other hand, employ fast and relatively simple automatic techniques to apply algorithm transformations to very large programs [Fis91]. Compared to the approaches used in algorithm design and software compilers, the high level synthesis approach to transformations is better suited to transformations that require handling of numerous details, involve complex combinatorial optimization, and do not require a large and sophisticated mathematical knowledge base. This paper addresses multiple constant multiplication, which is exactly such a transformation.

31st ACM/IEEE Design Automation Conference ®

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. © 1994 ACM 0-89791-653-0/94/0006 $3.50

2.0 Related Work

Optimization of multiplications with a constant has been studied for a long time, including pioneering work by von Neumann and his coworkers [Bur47]. Later, several schemes were proposed for minimizing the number of operations after the substitution of multiplications with constants by shifts and additions [Rob60]. This work led to novel number representation schemes, such as the canonical signed digit (CSD) representation [Rei60, Hwa79], which is often used in DSP to reduce the number of shifts and additions after a multiplication with a constant is replaced by shifts and additions. The CSD representation is also sometimes used in high level synthesis [Rab91]. Algorithms for optimizing the number of shifts and additions have also been described in the software compiler research [Ber86]. The minimization of the number of shifts is also a well studied topic in DSP, in particular in the digital filter design community [Cat88].

While minimizing the number of shifts required for multiplication with a single constant is thoroughly studied, not much attention has been paid to the simultaneous optimization of the multiplication of a variable by multiple constants. Only recently Chatterjee et al. [Cha93] addressed this problem by presenting two optimization approaches - greedy and simulated annealing-based - for the minimization of the number of operations in the vector-matrix product representation of linear systems. Although those two algorithms, based on a number splitting technique, are theoretically interesting, the algorithm proposed in this paper is superior in its simplicity, effectiveness, and range of applications.

A problem closely related to constant multiplication is that of computing a constant power of a variable by using only multiplications. A lucid treatment of techniques for this problem, covering work dating as far back as two millennia, was given by Knuth [Knu81].

Figure 1: Motivational examples: (a) multiplication with two different constants; (b) after substitution of the multiplications with shifts and additions; (c) a more complex example. [Figure not reproduced.]

3.0 Problem Formulation and Examples

The examples shown in Figure 1a-c introduce the multiple constant multiplication (MCM) problem. Suppose the goal is to implement the computation in Figure 1a using only shifts and additions. The constants a = 815 and b = 621 can be represented in binary form as 1100101111 and 1001101101 respectively. Note that several of the shifts (first, third, fourth, sixth and tenth from the right) can be shared during the computation of the two different products, as shown in Figure 1b. One needs a total of only 7 shifts (no shift is needed for the right-most digit) due to the fact that both constants are multiplied with the same variable. The second important point is that many of the intermediate results of additions can also be shared. For example, a*X = 1000101101*X + 0100000010*X and b*X = 1000101101*X + 0001000000*X share the common term 1000101101*X.

Next consider the example in Figure 1c. The binary representation of all the constants is shown in Table 1.

Table 1: Binary representation of the constants for the example from Figure 1c.

    a = 815    1100101111
    b = 621    1001101101
    c = 831    1100111111
    d = 105    0001101001

Obviously, there now exist even more possibilities to share shifts and additions while multiplying the variable X with the multiple constants. Table 2 shows the number of identical shifts for all pairs of constant multiplications. If additions are shared between a and c, and between b and d, it is easy to see that one will save 9 additions. While initially 21 additions were needed, now only 12 additions are sufficient. However, note that the optimization process can be continued further: two shifts (first and sixth from the right) are present in all multiplications. So if their sum is computed first, this intermediate result can be shared among all the constant multiplications, thus saving one more addition.

If the sharing of shifts and additions is not used, the example from Figure 1c requires 21 shifts (excluding shifts by 0) and 21 additions. However, when intermediate results in forming the products are exploited, only 8 shifts and 11 additions are needed.

Table 2: Number of matching shifts between all pairs of multiplications by constants for the example in Figure 1c.

        a   b   c   d
    a   -
    b   5   -
    c   7   5   -
    d   3   4   4   -

An interesting and intuitively appealing idea is to employ canonical signed digit encoding, or some other encoding which reduces the number of ones in the binary representation of the constants, and then attempt sharing

of additions and subtractions. Table 3 shows the constants for the example of Figure 1c after one such encoding (N represents -1).

Table 3: Alternative representation of the constants for the example from Figure 1c.

    a = 815    110011000N
    b = 621    1001101101
    c = 831    110100000N
    d = 105    0001101001

Note that now only 7 shifts are needed. Table 4 shows the number of intermediate results that can be shared during the multiplications by constants. Again, it is advantageous to combine some of the intermediate results. As Table 4 suggests, one profits most if a and c are combined first, followed by combining b and d. This reduces the number of additions and subtractions to only 10.

Table 4: Number of matching shifts between all pairs of multiplications by constants, when both additions and subtractions are used.

        a   b   c   d
    a   -
    b   2   -
    c   3   2   -
    d   1   4   1   -

We can summarize the drawbacks of the bipartite matching approach (see Section 4.0) by pointing out that it is often advantageous to form intermediate constants by combining parts of more than two constants. Another important bottleneck is that bipartite matching at one level does not take into account how a particular match influences matching at the next level.

In order to preserve the advantages of using matching algorithms to solve the MCM problem while addressing the drawbacks of bipartite matching, we developed the algorithm described by the following pseudo-code.

Iterative matching for the MCM problem:

    Express all constants using binary (or CSD) representation;
    Eliminate all duplicates of the same constants;
    Eliminate all constants which have at most one nonzero digit in their binary (CSD) representation;
    Let CANDIDATES = the set of all constants in binary representation;
    Calculate matches between all elements of the set CANDIDATES;
    while there exists a match between two entries in at least 2 binary digits {
        Select the best match;
        Update the set CANDIDATES;
        Update the matches by adding matches between the new entries and those already existing in the set CANDIDATES;
    }

The first step is a simple conversion. The next two steps are preprocessing steps which in practice often significantly reduce the run time of the algorithm. Of all identical constants, only one instance is included in the set of candidates. At the end of the program, all constants which initially had the same value as some included constant are calculated using the same set of common subexpressions. The third step is based on the simple and obvious observation that only constants which have at least two nonzero digits are suitable for common subexpression elimination.
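The conversion in the first step can be sketched as follows. This illustration (ours, not the authors' code) computes the non-adjacent form, one standard canonical signed digit construction; the encoding used for Table 3 is of the same signed-digit family, though not necessarily identical digit for digit:

```python
def to_csd(n):
    """Signed-digit (non-adjacent form) encoding of n, least significant
    digit first; digits are -1, 0, or 1 and no two nonzeros are adjacent."""
    digits = []
    while n:
        if n & 1:
            d = 2 - (n & 3)   # +1 if n % 4 == 1, -1 if n % 4 == 3
            digits.append(d)
            n -= d
        else:
            digits.append(0)
        n >>= 1
    return digits

d = to_csd(831)                                 # 831 = 1100111111: 8 one-bits
assert sum(di << i for i, di in enumerate(d)) == 831
assert sum(1 for di in d if di) == 4            # only 4 nonzero signed digits
```

Fewer nonzero digits directly translate into fewer additions/subtractions before any sharing is even attempted.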

The match between two constants is equal to the number of identical nonzero digits at the same positions in their binary representations, reduced by one (because n-1 additions are needed to add n numbers). The best match is selected according to an additive objective function which combines the immediate saving and the change in the likelihood of later savings. The immediate payoff is equal to the reduction in the number of operations when a particular pair of constants is chosen. To estimate the potential future savings after a particular match is selected, we estimate the influence of selecting this match on our future ability to reduce the number of additions using matching between the remaining constants. We do this by evaluating the difference between the average of the top k (where we use k = 3, based on empirical observations) best bitwise matches in the set CANDIDATES and the average of the top k best bitwise matches in the set CANDIDATES excluding the two constants being considered for the current match. The intuition behind this measure is that this average is a good indicator of the potential for matching the remaining candidates among themselves.

We conclude this section by formally stating the multiple constant multiplication problem: Substitute all multiplications with constants by shifts and additions (and subtractions), and use common subexpressions between the various multiplications to minimize the number of additions (and subtractions). The next section describes a systematic approach for accomplishing this.

4.0 Iterative Matching Algorithm

The analysis of the MCM problem in the previous section indicates that a natural way to solve the MCM problem is to execute recursive bipartite matching. Recursive bipartite matching will match all constants in pairs at each level, so that the payoff is maximized for each single level. There are a number of very efficient algorithms for bipartite matching [Cor90]. However, this approach has several serious drawbacks, the most important of which is illustrated by the following example. Suppose that one needs to multiply a variable X with constants a, b and c, such that a = 111111111100000, b = 111110000011111, and c = 000001111111111. Obviously, bipartite matching will combine only two of these constants, resulting in a saving of 4 additions, so that 23 additions are needed after the application of the MCM transformation. However, if one first forms the numbers d = 111110000000000, e = 000001111100000, and f = 000000000011111, then by noting that a*X = d*X + e*X, b*X = d*X + f*X, and c*X = e*X + f*X, one needs only four additions each for computing d*X, e*X, and f*X, and 3 more for computing a*X, b*X, and c*X, for a total of only 15 additions.

The set CANDIDATES is updated by first removing the two constants which constituted the best match, and then adding the constant corresponding to the matched digits, as well as the differences between the two matched constants and this newly formed constant.

The algorithm works the same way whether only additions are used, or both additions and subtractions. The only difference is how matches are computed. Suppose that we have two numbers A and B such that both have digit 1 on a identical binary positions, both have digit -1 on b positions, A has digit 1 and B has digit -1 on c positions, and A has digit -1 and B has digit 1 on d positions. The number of matches is then computed as a + b + max(0, c+d-1). This matching function is based on the observation that we can always match all identical digits, and that opposite nonzero digits can also be matched, although this type of matching results in one less saved operation.
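A compact Python sketch of the whole procedure is shown below. This is our illustrative reconstruction, not the authors' C program: for brevity it works on the plain binary representation (additions only), uses only the immediate payoff when choosing the best match (the look-ahead term described above is omitted), and records every match as a common part plus a remainder:

```python
from itertools import combinations

def popcount(x):
    return bin(x).count("1")

def iterative_matching(consts):
    """Greedy iterative pairwise matching on binary representations."""
    terms = {c for c in set(consts) if popcount(c) >= 2}
    split = {}                        # term -> (common part, remainder)
    while True:
        pair = max(combinations(terms, 2),
                   key=lambda p: popcount(p[0] & p[1]), default=None)
        if pair is None or popcount(pair[0] & pair[1]) < 2:
            break
        common = pair[0] & pair[1]
        for t in pair:
            if t != common:           # rewrite t as common + remainder
                rest = t & ~common
                split[t] = (common, rest)
                terms.discard(t)
                if popcount(rest) >= 2:
                    terms.add(rest)
        terms.add(common)
    return split

def count_additions(consts, split):
    """Each distinct term is built once: leaves cost popcount-1 additions
    (summing shifted copies of x), split terms one combining addition."""
    built, adds = set(), 0
    def build(t):
        nonlocal adds
        if t in built:
            return
        built.add(t)
        if t in split:
            c, r = split[t]
            build(c); build(r)
            adds += 1
        else:
            adds += max(popcount(t) - 1, 0)
    for c in set(consts):
        build(c)
    return adds

consts = [815, 621, 831, 105]                  # the constants of Figure 1c
split = iterative_matching(consts)
naive = sum(popcount(c) - 1 for c in consts)   # 21 additions without sharing
shared = count_additions(consts, split)
print(naive, shared)                           # prints: 21 9
```

On the Figure 1c constants this simplified greedy finds a 9-addition decomposition; the counts produced by the full algorithm, which also handles signed digits and the look-ahead term, will in general differ.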

The following theorem, which we state without proof, clearly indicates the effectiveness of the MCM approach on large instances of the problem.

Asymptotic Effectiveness Theorem: An arbitrarily large instance of the multiple constant multiplication problem can always be implemented with a bounded, finite, constant number of shifts and additions, irrespective of the problem size.

An important consequence of the asymptotic effectiveness theorem is that as the size of the MCM problem increases, the MCM transformation becomes increasingly more effective.
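The intuition can be made concrete with a small counting argument (our illustration, not from the paper): for a fixed coefficient word length w there are only 2^w - 1 distinct nonzero constants, and each one can be produced with at most one addition on top of a previously produced value, so the total work saturates no matter how many multiplications an instance requests:

```python
def additions_for_all_constants(w):
    """Additions sufficient to build x*c for every nonzero w-bit c:
    each constant with more than one set bit costs exactly one addition
    on top of an already-built value, so the total is (2**w - 1) - w."""
    cost = {1 << i: 0 for i in range(w)}   # pure shifts: no additions
    adds = 0
    for v in range(1, 1 << w):
        if v not in cost:
            low = v & -v                   # lowest set bit of v
            cost[v] = cost[v - low] + 1    # v*x = (v - low)*x + (x << k)
            adds += 1
    return adds

assert additions_for_all_constants(4) == 11   # 15 constants, 4 are pure shifts
```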

5.0 Numerical Stability and Relationship with Other Transformations

While the effectiveness of transformations in improving implementation metrics is well documented, the effect of transformations on word-length requirements is a rarely addressed topic in the high level synthesis and compiler literature. The attitude toward numerical stability issues varies significantly, ranging from denial of the problem to avoidance of applying transformations. One of the few transformations which is well suited to a theoretical analysis of numerical stability is the MCM transformation. An analysis by Golub and van Loan [Gol90] indicates that if one additional binary digit is used, one can apply an arbitrary combination of common subexpressions without disturbing the correctness of the answer. This analysis also indicates that even this additional digit is statistically very unlikely to be needed. We experimentally verified this claim on a 126 tap FIR filter which is part of a PCM system developed by NEC, Japan. Figures 3a and 3b show the transfer function of the filter corresponding to double precision floating-point arithmetic and to finite precision fixed-point arithmetic with the same word-length after applying the MCM transformation, respectively.

Figure 2: Applying retiming to enable the MCM transformation: (a) before retiming; (b) after retiming. [Figure not reproduced.]

Figure 3: Simulation results for the NEC FIR filter before (a) and after (b) the MCM transformation. [Plots not reproduced.]

It is also well known that the application of isolated transformations is often not sufficient to achieve the desired results. The successive or simultaneous application of several transformations is often much more effective, due to the ability of some transformations to increase the effectiveness of others. On the majority of the designs which we examined, retiming was most often required to effectively enable the MCM transformation. Figure 2 shows typical examples where retiming makes the MCM transformation much more effective. Performing retiming such that the payoff from applying the MCM transformation is maximized is an involved combinatorial problem. However, on all examples that we considered, it was sufficient to adopt a simple retiming approach where the delays were just moved from the edges where they were preventing the application of the MCM transformation.

Furthermore, note that even this retiming is actually not necessary: the shifts and additions in the MCM transformation can be shared across delays, by taking into account that some of the intermediate results will be utilized in future iterations of the ASIC computation, which is always done on semi-infinite streams of data.

6.0 Experimental Results

The iterative matching algorithm is very efficient and compact - it required slightly more than 1000 lines of C code. Table 5 shows the set of benchmark examples on which it was applied. The benchmark examples are: a 126 tap NEC FIR filter (N FIR), an NEC digital to analog converter (DAC), 100 and 123 tap FIR filters (M FIR1 and M FIR2), another FIR filter (FIR), 3 GE linear controllers (Con4, Con5 and MAT) [Cha93], a linear power controller (Power), an 8th order direct form IIR filter (8IIR), and a two-dimensional 10x10 FIR video filter (2D FIR).

Table 5: The application of the MCM iterative matching algorithm on 11 examples (I - initial; M - after MCM).

    Design   I: >>  I: +/-   M: >>  M: +/-   I/M: >>  I/M: +/-
    N FIR     160    202       17    138       9.41     1.46
    DAC       349    416       98    277       3.56     1.50
    M FIR1    177    231       19    158       9.32     1.46
    M FIR2    138    170       15    125       9.25     1.36
    FIR       123    144       13    101       9.46     1.43
    Con4      212    183       16     86      13.2      4.59
    Con5      383    358       25    137      15.3      2.61
    MAT        83     74       14     49       5.93     1.51
    Power     136    120       58     99       2.34     1.21
    8IIR      184    200       15    102      12.3      1.96
    2D FIR    804    807      141    377       5.70     1.56

The performance of the iterative matching algorithm on these examples is detailed in Table 5. The average (median) reduction in the number of shifts is by a factor of 8.71 (9.32); the average (median) reduction in the number of additions is by a factor of 1.71 (1.50). Using the MCM algorithm to reduce the number of operations also helps in reducing the implementation area and the power consumption. For example, in the case of NEC's 126-tap low-pass FIR filter, the estimates obtained by using the HYPER high-level synthesis system [Rab91] show reductions by factors of 2.76 and 2.55 in area and power, respectively. The iterative matching algorithm took less than 2 seconds for even the largest example (the 2D FIR filter).

7.0 Extensions to Other High Level Synthesis Tasks

At the heart of the MCM approach is the recursive application of common subexpression elimination to reduce the number of operations needed for a given computation. This relationship with common subexpression elimination allows the MCM approach to be easily modified for a large number of important high level synthesis optimizing transformation tasks.

A direct application of the MCM methodology and software to a new high level synthesis task is the optimization of multiplication-free linear transforms. The general form of a multiplication-free linear transform is Y = A * X, where Y and X are n-dimensional vectors and A is an n x n matrix whose entries take only the values 1, -1, and 0. We introduce the application of the MCM transform to the optimization of multiplication-free computations using the Hadamard matrix transform. Figure 4 shows the Hadamard matrix of size 8 x 8.

Figure 4: Hadamard matrix of size 8 x 8.

     1  1  1  1  1  1  1  1
     1 -1  1 -1  1 -1  1 -1
     1  1 -1 -1  1  1 -1 -1
     1 -1 -1  1  1 -1 -1  1
     1  1  1  1 -1 -1 -1 -1
     1 -1  1 -1 -1  1 -1  1
     1  1 -1 -1 -1 -1  1  1
     1 -1 -1  1 -1  1  1 -1

The analogy with the MCM problem is apparent. The matrix A corresponds to the binary representations of the constants in the MCM problem, and the elements of the vector X correspond to the variable shifted by various amounts in the MCM problem. While the direct computation of the Hadamard transform requires 56 additions, the MCM approach reduces this number to 24.

Figure 5: Encoding matrix for the (16,11) second-order Reed-Muller code.

    1 0 0 0 0 0 0 0 0 0 0
    1 1 0 0 0 0 0 0 0 0 0
    1 0 1 0 0 0 0 0 0 0 0
    1 1 1 0 0 1 0 0 0 0 0
    1 0 0 1 0 0 0 0 0 0 0
    1 1 0 1 0 0 1 0 0 0 0
    1 0 1 1 0 0 0 0 1 0 0
    1 1 1 1 0 1 1 0 1 0 0
    1 0 0 0 1 0 0 0 0 0 0
    1 1 0 0 1 0 0 1 0 0 0
    1 0 1 0 1 0 0 0 0 1 0
    1 1 1 0 1 1 0 1 0 1 0
    1 0 0 1 1 0 0 0 0 0 1
    1 1 0 1 1 0 1 1 0 0 1
    1 0 1 1 1 0 0 0 1 1 1
    1 1 1 1 1 1 1 1 1 1 1

The encoding and decoding algorithms of many error correcting codes can be represented as a vector-matrix product of the form c = d * G, where c is the vector of k encoded bits, d is the vector of m information bits, and G is an m x k matrix which describes the encoding algorithm. Figure 5 shows the matrix G for the second-order (16,11) Reed-Muller code [Rhe]. The analogy with the MCM transform is apparent. If the columns of the matrix G are interpreted as the elements of the set of constants, the minimization of the additions needed to produce the set of constants (65535, 21845, 13107, 3855, 255, 4369, 1285, 85, 771, 51, 15) is equivalent to minimizing the number of additions needed for encoding using the Reed-Muller code.

Table 6 shows the set of error-correction benchmarks [Rhe] on which we applied the MCM approach. The average and median reductions in the number of additions are by factors of 1.55 and 1.56.

Table 6: Benchmark examples for linear codes (I - initial additions; MCM - additions after MCM).

    Example        I    MCM   I/MCM
    Goppa          18    11    1.64
    Hadamard       21    16    1.31
    Reed-Muller    61    43    1.41
    Hamming       105    66    1.59
    Fire           44    29    1.52
    BCH           183    99    1.85

The algorithms for the minimization of the number of shifts and additions were, with slight modifications, also applied to several widely used general linear transforms, shown in Table 7. Again, a very significant reduction in the number of operations is achieved.

Table 7: Benchmark examples for linear transforms (DC - discrete cosine transform; HV - IEEE Human Vision sensitivity transform; I - initial; M - after MCM).

    Ex    # of bits   I: >>   M: >>   I: +   M: +   I/M: >>  I/M: +
    DC         8       308      72     300     94     4.2      3.2
    DC        12       376      74     368    100     5.1      3.7
    DC        16       529     107     521    129     4.9      4.0
    DC        24       797     190     721    195     4.2      3.7
    HV         8       122      91     119     95     1.3      1.2
    HV        12       261     153     255    161     1.7      1.6
    HV        16       367     211     361    226     1.7      1.7
    HV        24       586     313     580    334     1.9      1.7

8.0 Conclusion

We formulated the multiple constant multiplication (MCM) problem and proposed an iterative pairwise matching algorithm for solving it. The relationship of the MCM transformation to other transformations was studied. A simple generalization of the problem was used to significantly enlarge the application range of the proposed approach. On a large set of industrial examples the MCM approach yielded significant improvements in the number of operations.

Acknowledgments

The authors would like to thank M. Cheong for help in developing the final version of the MCM iterative pairwise matching program.

9.0 References

[Ber86] R. Bernstein: "Multiplication by Integer Constants", Software - Practice and Experience, Vol. 16, No. 7, pp. 641-652, 1986.

[Bla85] R.E. Blahut: "Fast Algorithms for Digital Signal Processing", Addison-Wesley, 1985.

[Cat88] F. Catthoor, et al.: "SAMURAI: a General and Efficient Simulated Annealing Schedule with Fully Adaptive Annealing Parameters", Integration, Vol. 6, 1988.

[Cha92] A. Chandrakasan, et al.: "HYPER-LP: A System for Power Minimization Using Architectural Transformations", IEEE ICCAD-92, pp. 300-303, 1992.

[Cha93] A. Chatterjee, R.K. Roy, M.A. d'Abreu: "Greedy hardware optimization for linear digital systems using number splitting and repeated factorization", IEEE Trans. on VLSI Systems, Vol. 1, No. 4, pp. 423-431, 1993.

[Cor90] T.H. Cormen, C.E. Leiserson, R.L. Rivest: "Introduction to Algorithms", The MIT Press, Cambridge, MA, 1990.

[Fis91] C.N. Fischer, R.J. LeBlanc, Jr.: "Crafting a Compiler", Benjamin/Cummings, Menlo Park, CA, 1991.

[Gol90] G.H. Golub, C. van Loan: "Matrix Computations", The Johns Hopkins University Press, 1990.

[Knu81] D.E. Knuth: "The Art of Computer Programming, Volume 2: Seminumerical Algorithms", 2nd edition, Addison-Wesley, Reading, MA, 1981.

[Ku92] D. Ku, G. De Micheli: "Constrained Synthesis and Optimization of Digital Circuits from Behavioral Specifications", Kluwer Academic Publishers, 1992.

[Rab91] J. Rabaey, C. Chu, P. Hoang, M. Potkonjak: "Fast Prototyping of Datapath-Intensive Architectures", IEEE Design and Test, Vol. 8, No. 2, pp. 40-51, 1991.

[Rhe] M.Y. Rhee: "Error-Correcting Coding Theory", McGraw-Hill, New York, NY, 1989.

[Rei60] G.W. Reitwiesner: "Binary Arithmetic", in Advances in Computers, Vol. 1, Academic Press, New York, NY, pp. 261-265, 1960.

[Wal] R.A. Walker, D.E. Thomas: "Behavioral Transformation for Algorithmic Level IC Design", IEEE Transactions on CAD, Vol. 8, No. 10, pp. 1115-1127, 1989.

[Wal91] R.A. Walker, R. Camposano: "A Survey of High-Level Synthesis Systems", Kluwer Academic Publishers, Boston, MA, 1991.

