# Low Power High Speed SQRT Carry Select Adder

Partha Mitra<sup>1</sup>, Debarshi Datta<sup>2</sup>

<sup>1,2</sup>(Electronics and Communication, Brainware Group of Institutions, India)

**Abstract :** The design of high-speed, low-power data path logic is one of the most challenging areas of research in VLSI system design. The adder is the main building block of a DSP processor, but digital adders suffer from carry propagation delay. To alleviate this problem, Carry Select Adders (CSLA) are used in computational units. The CSLA is one of the fastest adders, yet there is scope to reduce the power consumption of the regular CSLA with a simple gate-level modification. This paper proposes a modified 40-bit square-root CSLA (SQRT CSLA) architecture. Both the regular and modified 40-bit CSLA are designed in TSMC 0.13-µm CMOS process technology, and the results are compared with those for TSMC 0.18-µm CMOS process technology. The proposed design has reduced area and power compared with the regular SQRT CSLA, with a slight increase in delay. The result analysis shows that the proposed CSLA performs better than the conventional CSLA in terms of power-delay product.

## I. Introduction

Due to the rapid growth of portable electronic devices, low-power arithmetic circuits have become very important in the VLSI industry. The Multiplier-Accumulator (MAC) unit is the main building block of a DSP processor. The Full Adder, as part of the MAC unit, can significantly affect the efficiency of the whole system. Hence, reducing the power consumption of the Full Adder circuit is necessary for low-power applications. Carry Select Adders are used in high-speed applications because they reduce propagation delay.

The basic operation of the Carry Select Adder (CSLA) is parallel computation. The CSLA generates multiple carries and partial sums [1], and the final sum and carry are selected by multiplexers (mux). Because the CSLA structure uses multiple pairs of Ripple Carry Adders (RCA), it is not area efficient. In this paper, we propose a modified CSLA architecture.

The proposed method uses a Binary to Excess-1 Converter (BEC) instead of the RCA with Cin = 1 in the regular CSLA. The main advantage of the BEC logic is that it uses fewer logic gates than the n-bit Full Adder structure, so the modified CSLA architecture has lower area and power consumption [2]-[4]. The details of the BEC logic are discussed in Section III.

This paper is organized as follows. Section II presents the delay and area evaluation methodology of the basic adder blocks. Section III describes the structure and function of the BEC logic. The SQRT CSLA has been chosen for comparison with the proposed design as it has a more balanced delay and requires lower power [5]-[6]. The delay and area evaluation methodologies of the regular and modified SQRT CSLA are presented in Sections IV and V, respectively. Section VI reviews the results obtained from the simulations, and Section VII concludes this work.

## II. Delay And Area Evaluation Methodology Of The Basic Adder Blocks

An XOR gate built from basic AND, OR, and Inverter (AOI) gates is shown in Fig. 1. The gates between the dotted lines operate in parallel, and the numeral on each gate indicates the delay contributed by that gate. For the delay and area evaluation, every gate is assumed to have 1 unit of delay and 1 unit of area. The maximum delay of a logic block is found by adding up the gate delays along its longest path, and its area is the total number of AOI gates it contains. Based on this approach, the CSLA building blocks (2:1 mux, Half Adder (HA), and Full Adder (FA)) are evaluated and listed in Table I.



Fig. 1. Delay evaluation of an XOR gate.

| Adder blocks | Delay | Area |
|--------------|-------|------|
| XOR          | 3     | 5    |
| 2:1 Mux      | 3     | 4    |
| Half Adder   | 3     | 6    |
| Full Adder   | 6     | 13   |

TABLE I : DELAY AND AREA COUNT OF THE BASIC BLOCKS OF CSLA
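The unit-gate counting behind Table I can be sketched in a few lines of Python. The AOI decompositions used here (for example, an XOR as 2 inverters, 2 AND gates, and 1 OR gate) are assumptions chosen because they reproduce the table; the names `Block`, `XOR`, `MUX`, `HA`, and `FA` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    delay: int  # gate levels on the longest path (1 unit per AND/OR/NOT gate)
    area: int   # total number of AND/OR/NOT gates


# Assumed XOR: (a AND NOT b) OR (NOT a AND b) -> path NOT, AND, OR; 2 NOT + 2 AND + 1 OR
XOR = Block(delay=3, area=2 + 2 + 1)

# Assumed 2:1 mux: (a AND NOT s) OR (b AND s) -> path NOT, AND, OR; 1 NOT + 2 AND + 1 OR
MUX = Block(delay=3, area=1 + 2 + 1)

# Half adder: sum = XOR, carry = one AND gate in parallel with the XOR
HA = Block(delay=max(XOR.delay, 1), area=XOR.area + 1)

# Full adder: sum through two XORs in series; carry = XOR, AND, OR (3 + 1 + 1 levels)
FA = Block(delay=max(2 * XOR.delay, XOR.delay + 1 + 1), area=2 * XOR.area + 2 + 1)

for name, blk in [("XOR", XOR), ("2:1 Mux", MUX), ("Half Adder", HA), ("Full Adder", FA)]:
    print(f"{name:10s} delay={blk.delay} area={blk.area}")
```

Under these assumptions the script reports the same delay and area pairs as Table I.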

## III. BEC Logic Gate

The proposed method uses BEC logic. Each group of the regular CSLA consists of two Ripple Carry Adders (RCA), one with initial carry Cin = 0 and the other with Cin = 1. The BEC is used instead of the RCA with Cin = 1 in order to reduce the area and power consumption of the regular CSLA. To replace an n-bit RCA, an (n+1)-bit BEC is required. The structure of a 4-bit BEC is shown in Fig. 2, and Table II shows its function.



Fig. 2. 4-b BEC Logic Gates.

| B[3:0] | X[3:0] |
|--------|--------|
| 0000   | 0001   |
| 0001   | 0010   |
| 0010   | 0011   |
| ...    | ...    |
| 1110   | 1111   |
| 1111   | 0000   |

TABLE II : FUNCTION OF THE 4-BIT BEC



Fig. 3 shows how the 4-bit BEC and an 8:4 multiplexer perform the basic function of the CSLA. One input of the multiplexer is the direct input (B3, B2, B1, and B0), and the other is the BEC output. This produces the two possible partial results in parallel, and the mux selects either the BEC output or the direct inputs according to the control signal Cin. The Boolean expressions of the 4-bit BEC are shown below (functional symbols: ~ NOT, & AND, ^ XOR).

$X0 = \sim B0$

$X1 = B0 \wedge B1$

$X2 = B2 \wedge (B0 \,\&\, B1)$

$X3 = B3 \wedge (B0 \,\&\, B1 \,\&\, B2)$
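As a quick sanity check, the four Boolean expressions can be evaluated in Python and compared against the increment-by-one behavior listed in Table II. This is only a behavioral sketch; the function names `bec4` and `bec_mux` are hypothetical and do not come from the paper.

```python
def bec4(b: int) -> int:
    """4-bit Binary to Excess-1 converter built from the Boolean expressions above."""
    b0, b1, b2, b3 = ((b >> i) & 1 for i in range(4))
    x0 = b0 ^ 1                 # ~B0 on a single bit
    x1 = b0 ^ b1
    x2 = b2 ^ (b0 & b1)
    x3 = b3 ^ (b0 & b1 & b2)
    return x0 | (x1 << 1) | (x2 << 2) | (x3 << 3)


def bec_mux(b: int, cin: int) -> int:
    """8:4 mux: pass B through when Cin = 0, select the BEC output when Cin = 1."""
    return bec4(b) if cin else b


# The BEC output equals B + 1, wrapping from 1111 to 0000 as in Table II.
for b in range(16):
    assert bec4(b) == (b + 1) % 16
    print(f"{b:04b} -> {bec4(b):04b}")
```

The loop covers all 16 input values, so it checks every row of the table, including the rows that Table II elides.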

## IV. Delay And Area Evaluation Methodology Of Regular 16-Bit SQRT CSLA

The 16-bit regular SQRT CSLA structure is shown in Fig. 4. It has five groups of RCAs of different sizes. Fig. 5 shows the delay and area evaluation of each group. The numerals within [] specify the delay values, e.g., sum2 requires 10 gate delays. The steps leading to the evaluation are as follows.

Group2 [see Fig. 5(a)] requires two sets of 2-bit RCA. Based on the delay values in Table I, the selection input c1 [time (t) = 7] of the 6:3 mux arrives earlier than s3 [t = 8] and later than s2 [t = 6]. Thus, sum3 [t = 11] is the sum of the s3 arrival time and the mux delay [t = 3], and sum2 [t = 10] is the sum of the c1 arrival time and the mux delay.



Fig. 4. Regular 16-bit SQRT CSLA



Fig. 5. Delay and area evaluation of regular SQRT CSLA: (a) group2, (b) group3, (c) group4 and (d) group5. F is Full Adder.

Except for group2, the arrival time of the mux selection input is always greater than the arrival time of the data outputs from the RCAs. Thus, the delays of group3 to group5 are determined, respectively, as follows:

- $\{c6, sum[6:4]\} = c3[t = 10] + mux$
- $\{c10, sum[10:7]\} = c6[t = 13] + mux$
- $\{cout, sum[15:11]\} = c10[t = 16] + mux$

In group2, one set of 2-bit RCA has 2 FA for Cin = 1, and the other set has 1 FA and 1 HA for Cin = 0. Using the area counts in Table I, the total gate count of group2 is calculated as follows:

- Gate count = 57 (FA + HA + Mux)
- FA = 39 (3 \* 13)
- HA = 6 (1 \* 6)
- Mux = 12 (3 \* 4)

Similarly, the maximum delay and area of the other groups in the regular SQRT CSLA are evaluated and listed in Table III.

| Group  | Delay | Area |
|--------|-------|------|
| Group2 | 11    | 57   |
| Group3 | 13    | 87   |
| Group4 | 16    | 117  |
| Group5 | 19    | 147  |

TABLE III : DELAY AND AREA COUNT OF REGULAR CSLA
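The entries of Table III can be reproduced with a short Python sketch. It assumes that an n-bit group contains an RCA of (n - 1) FA plus 1 HA for Cin = 0, an RCA of n FA for Cin = 1, and (n + 1) 2:1 muxes; the text states this composition only for group2, so extending it to the wider groups is an inference that happens to match the table. The arrival times c1 = 7 and s3 = 8 are taken from the text, and the function name is hypothetical.

```python
FA_AREA, HA_AREA, MUX_AREA = 13, 6, 4   # unit-gate areas from Table I
MUX_DELAY = 3                           # unit-gate delay of a 2:1 mux from Table I


def regular_group_area(n: int) -> int:
    """Area of an n-bit group of the regular CSLA (assumed composition, see lead-in)."""
    full_adders = (n - 1) + n            # Cin = 0 RCA plus Cin = 1 RCA
    return full_adders * FA_AREA + HA_AREA + (n + 1) * MUX_AREA


# Group k of the 16-bit adder happens to be k bits wide for k = 2..5.
areas = {f"Group{k}": regular_group_area(k) for k in (2, 3, 4, 5)}

# Group2: the later of the select input c1 and the data input s3 passes through the mux.
c1, s3 = 7, 8
delays = {"Group2": max(c1, s3) + MUX_DELAY}

# From group3 on, the select input arrives last, so each group adds one mux delay.
carry = c1 + MUX_DELAY                   # c3 arrives at t = 10
for k in (3, 4, 5):
    carry += MUX_DELAY                   # c6 = 13, c10 = 16, cout = 19
    delays[f"Group{k}"] = carry

for group in areas:
    print(group, delays[group], areas[group])
```

With these assumptions the printed delay and area values agree with Table III row by row.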

## V. Delay and Area Evaluation Methodology of Proposed 16-bit SQRT CSLA

The modified 16-bit SQRT CSLA is shown in Fig. 6, in which the RCA with Cin = 1 is replaced by BEC logic. The structure is again divided into five groups. Fig. 7 provides the delay and area estimation of each group.
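A behavioral Python model may help illustrate how the BEC replaces the second RCA. It is a functional sketch only, not a gate-level or timing model. The group widths 2, 2, 3, 4, 5 are inferred from the bit ranges quoted in this paper (sum[6:4], sum[10:7], sum[15:11]), and the names `GROUP_WIDTHS`, `bec`, and `sqrt_csla_bec` are hypothetical.

```python
import random

GROUP_WIDTHS = [2, 2, 3, 4, 5]   # LSB group first, 16 bits in total (assumed from the text)


def bec(value: int, width: int) -> int:
    """Binary to Excess-1 converter: add one and wrap at `width` bits."""
    return (value + 1) & ((1 << width) - 1)


def sqrt_csla_bec(a: int, b: int, cin: int = 0) -> tuple[int, int]:
    """Return (16-bit sum, carry out) using one RCA plus a BEC per group."""
    result, carry, shift = 0, cin, 0
    for index, n in enumerate(GROUP_WIDTHS):
        mask = (1 << n) - 1
        a_g, b_g = (a >> shift) & mask, (b >> shift) & mask
        if index == 0:
            total = a_g + b_g + carry        # group1: plain RCA with the real carry-in
        else:
            rca0 = a_g + b_g                 # {carry, sum} of the RCA with Cin = 0
            rca1 = bec(rca0, n + 1)          # (n+1)-bit BEC gives the Cin = 1 result
            total = rca1 if carry else rca0  # mux selected by the incoming carry
        result |= (total & mask) << shift
        carry = total >> n
        shift += n
    return result, carry


# Spot-check against ordinary integer addition.
random.seed(0)
for _ in range(10_000):
    a, b = random.randrange(1 << 16), random.randrange(1 << 16)
    s, cout = sqrt_csla_bec(a, b)
    assert (cout << 16) | s == a + b
print("behavioral model matches a + b on all samples")
```

Because the Cin = 0 result of an n-bit group never exceeds 2^(n+1) - 2, the (n+1)-bit BEC never wraps in this use, which is why an (n+1)-bit BEC can stand in for the n-bit RCA with Cin = 1.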



Fig. 7. Delay and area evaluation of modified SQRT CSLA: (a) group2, (b) group3, (c) group4, and (d) group5. H is Half Adder.

The delay evaluation procedures are as follows:

- a) Group2 [see Fig. 7(a)] has one 2-bit RCA, which has 1 FA and 1 HA for Cin = 0. A 3-bit BEC is used in place of the other 2-bit RCA with Cin = 1; the 3-bit BEC adds one to the output of the 2-bit RCA. Based on the delay values in Table I, the selection input c1 [time (t) = 7] of the 6:3 mux arrives earlier than s3 [t = 9] and c3 [t = 10] and later than s2 [t = 4]. Thus, sum3 depends on s3 and the mux delay, and the final c3 (output from the mux) depends on the partial c3 (input to the mux) and the mux delay. The sum2 output depends on c1 and the mux delay.
- b) For the remaining groups, the arrival time of the mux selection input is always greater than the arrival time of the data inputs from the BECs. Hence, the delay of the remaining groups depends on the arrival time of the mux selection input and the mux delay.
- c) The area count of group2 is calculated as follows:
  - Gate count = 43 (FA + HA + Mux + BEC)
  - FA = 13 (1 \* 13)
  - HA = 6 (1 \* 6)
  - AND = 1
  - NOT = 1
  - XOR = 10 (2 \* 5)
  - Mux = 12 (3 \* 4)
- d) Similarly, the maximum delay and the area of the other groups of the modified SQRT CSLA are evaluated and listed in Table IV.

| Group  | Delay | Area |
|--------|-------|------|
| Group2 | 13    | 43   |
| Group3 | 16    | 61   |
| Group4 | 19    | 84   |
| Group5 | 22    | 107  |

TABLE IV : DELAY AND AREA COUNT OF MODIFIED SQRT CSLA

Comparing Tables III and IV, the proposed modified SQRT CSLA saves 113 gate-area units relative to the regular SQRT CSLA (295 versus 408, summed over group2 to group5), with an increase of only 11 gate delays (70 versus 59, summed over the same groups).
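The totals quoted in this comparison follow directly from the two tables. The short Python check below simply sums the per-group entries of Tables III and IV; the dictionary names are illustrative.

```python
# (delay, area) per group, copied from Table III (regular) and Table IV (modified)
regular = {"Group2": (11, 57), "Group3": (13, 87), "Group4": (16, 117), "Group5": (19, 147)}
modified = {"Group2": (13, 43), "Group3": (16, 61), "Group4": (19, 84), "Group5": (22, 107)}


def totals(table):
    """Sum the delay and area columns over all groups."""
    delay = sum(d for d, _ in table.values())
    area = sum(a for _, a in table.values())
    return delay, area


reg_delay, reg_area = totals(regular)
mod_delay, mod_area = totals(modified)

area_saved = reg_area - mod_area       # 408 - 295
extra_delay = mod_delay - reg_delay    # 70 - 59
print(f"area saved: {area_saved} gates, extra delay: {extra_delay} gate delays")
```

Note that the delay figure is a sum of per-group delays rather than the critical-path delay of the whole adder.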

## VI. Simulation Results

The proposed 40-bit SQRT CSLA has been implemented in TSMC 0.13-µm CMOS process technology and compared with an implementation in TSMC 0.18-µm CMOS process technology.

TABLE V : THE COMPARISON OF THE REGULAR AND MODIFIED 40-BIT SQRT CSLA FOR TSMC 0.18-µm CMOS PROCESS TECHNOLOGY

| Type of Adder | Supply Voltage (V) | Delay (ns) | Switching Power (µW) | Power-Delay Product (10<sup>-15</sup> J) |
|---------------|--------------------|------------|----------------------|------------------------------------------|
| Regular CSLA  | 1.8                | 6.324      | 1415.4               | 8823.6                                   |
| Modified CSLA | 1.8                | 6.657      | 1226.4               | 8164.1                                   |

Table V shows that the modified 40-bit SQRT CSLA reduces the power-delay product (PDP) by 7.5%.

TABLE VI : THE COMPARISON OF THE REGULAR AND MODIFIED 40-BIT SQRT CSLA FOR TSMC 0.13-µm CMOS PROCESS TECHNOLOGY

| Type of Adder | Supply Voltage (V) | Delay (ns) | Switching Power (µW) | Power-Delay Product (10<sup>-15</sup> J) |
|---------------|--------------------|------------|----------------------|------------------------------------------|
| Regular CSLA  | 1.5                | 5.986      | 1283.7               | 7684.2                                   |
| Modified CSLA | 1.5                | 6.316      | 1057.5               | 6488.8                                   |

Table VI shows that the modified 40-bit SQRT CSLA reduces the PDP by 15.6%. Hence, comparing Table V and Table VI shows that the modified 40-bit CSLA performs better than the regular CSLA in both TSMC 0.13-µm and TSMC 0.18-µm CMOS process technologies, with the larger PDP reduction in the 0.13-µm process.
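The quoted PDP reductions can be recomputed from the PDP columns of Tables V and VI. The Python sketch below uses only the tabulated PDP values (in units of 10^-15 J); the dictionary and function names are illustrative.

```python
# (regular PDP, modified PDP) in fJ, copied from Tables V and VI
pdp = {
    "TSMC 0.18-um": (8823.6, 8164.1),
    "TSMC 0.13-um": (7684.2, 6488.8),
}


def reduction_percent(regular: float, modified: float) -> float:
    """Relative reduction of the modified design with respect to the regular one."""
    return 100.0 * (regular - modified) / regular


reductions = {node: round(reduction_percent(r, m), 1) for node, (r, m) in pdp.items()}
for node, value in reductions.items():
    print(f"{node}: PDP reduced by {value}%")
```

Rounded to one decimal place, the results are 7.5% for the 0.18-µm process and 15.6% for the 0.13-µm process, matching the figures stated in the text.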

## VII. Conclusion

In this paper, a modified 40-bit SQRT CSLA has been proposed for data path circuits (such as the MAC unit) in low-power DSP applications. Tables V and VI show that the modified CSLA reduces power compared with the regular CSLA, with a slight increase in delay. The reduction in the number of gates offers a great advantage in terms of area and power. The compared results also show that the modified SQRT CSLA has a lower power-delay product (PDP). Hence, the proposed CSLA architecture is better in terms of PDP, which leads to better utilization of the DSP processor.

## VIII. Acknowledgment

The authors would like to thank Advanced VLSI Design Laboratory, IIT Kharagpur for their cooperation and support.

## References

- [1] O. J. Bedrij, "Carry-select adder," IRE Trans. Electron. Comput., pp. 340-344, 1962.
- [2] B. Ramkumar, H. M. Kittur, and P. M. Kannan, "ASIC implementation of modified faster carry save adder," Eur. J. Sci. Res., vol. 42, no. 1, pp. 53-58, 2010.
- [3] T. Y. Chang and M. J. Hsiao, "Carry-select adder using single ripple carry adder," Electron. Lett., vol. 34, no. 22, pp. 2101-2103, Oct. 1998.
- [4] Y. Kim and L.-S. Kim, "64-bit carry-select adder with reduced area," Electron. Lett., vol. 37, no. 10, pp. 614-615, May 2001.
- [5] J. M. Rabaey, Digital Integrated Circuits: A Design Perspective. Upper Saddle River, NJ: Prentice-Hall, 2001.
- [6] Y. He, C. H. Chang, and J. Gu, "An area efficient 64-bit square root carry-select adder for low power applications," in Proc. IEEE Int. Symp. Circuits Syst., 2005, vol. 4, pp. 4082-4085.
- [7] Cadence, "Encounter user guide," Version 6.2.4, March 2008.

Partha Mitra received his B.E. in Electrical Engineering and M.Tech in Electronics and Communication Engineering. His research interests include low-power circuits and systems and digital signal processing. Debarshi Datta obtained his B.Tech and M.Tech in Electronics and Communication Engineering. His research interests include low-power circuits and systems and digital signal processing.