# Power Efficient Domino Memory Design

Janani.S

Department of ECE Sri Ramakrishna Engineering College Coimbatore, India

**Abstract:** Processors are the main part of most electronic systems in the digitized world, and the memory inside the processor accounts for a large share of its power consumption. This memory is organized as a register file, whose power dissipation is caused mainly by the local and global bit line circuits. Domino logic is the main building block of the local and global bit lines. Domino logic circuits suffer from increased power dissipation because of leakage current in the evaluation network and current contention caused by keeper upsizing. To avoid these problems, current comparison domino logic is used. The performance of current comparison based domino logic can be improved further by adding clock gating, which avoids unwanted switching at times when the clock signal is not needed and thereby reduces power dissipation. By designing the local and global bit lines with this type of domino logic, the performance of the register file can be increased while its power dissipation is reduced, so that microprocessors and DSPs achieve better efficiency.

Keywords: Domino logic, bit lines, register files, clock gating, current comparison.

### I. Introduction

A domino logic gate consists of an n-type dynamic logic block followed by a static inverter, which allows dynamic gates to be cascaded. It works in two phases, the precharge phase and the evaluation phase. During the precharge phase, the output of the n-type dynamic gate is charged up to VDD and the output of the inverter is set to 0. During the evaluation phase, the dynamic gate conditionally discharges and the output of the inverter makes a conditional transition from $0\rightarrow 1$. The static inverter has the additional advantage that the fan-out of the gate is driven by a low impedance output, which increases noise immunity. It also reduces the capacitance of the dynamic output node. Since each dynamic gate is followed by a static inverter, only non-inverting logic can be implemented, which is a major limiting factor. Apart from this disadvantage, very high speed can be achieved. However, implementing wide fan-in logic gates in this dynamic logic style has certain disadvantages, such as high capacitance on the dynamic node, leakage current during the evaluation phase and high current contention due to keeper upsizing. To avoid these disadvantages many logic styles were proposed, and now a new domino logic is proposed which has low leakage without dramatic speed degradation for wide fan-in gates. This technique uses the comparison of currents to reduce contention current, and consequently power consumption and delay are reduced.
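The two-phase behaviour described above can be illustrated with a small behavioural sketch. This is only a logic-level model written for illustration, not the circuit used in this work: the class name `DominoOrGate` and its `step` method are hypothetical, and the keeper, leakage and timing are not modelled.

```python
class DominoOrGate:
    """Logic-level model of an n-type domino OR gate (illustrative only)."""

    def __init__(self):
        # Assumption: the gate starts in the precharged state.
        self.dynamic_node = 1

    def step(self, clk, inputs):
        if clk == 0:
            # Precharge phase: the dynamic node is charged to VDD (logic 1).
            self.dynamic_node = 1
        elif any(inputs):
            # Evaluation phase: any high input opens a path to ground.
            self.dynamic_node = 0
        # Otherwise the node simply holds its previous value.
        # The static inverter drives the gate output.
        return 1 - self.dynamic_node


gate = DominoOrGate()
trace = [
    gate.step(0, [0, 0, 0, 0]),  # precharge
    gate.step(1, [0, 1, 0, 0]),  # evaluate with one input high
    gate.step(1, [0, 0, 0, 0]),  # input falls, node cannot recharge
    gate.step(0, [0, 0, 0, 0]),  # next precharge
]
print(trace)  # [0, 1, 1, 0]
```

The third step shows why the output can only make a $0\rightarrow 1$ transition during evaluation: once the dynamic node has discharged, it stays low until the next precharge phase.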

A register file consists of an array of SRAM-based registers with write ports and read ports. These ports are mainly realized with multiplexer and demultiplexer circuits, which are typically implemented with OR and NOT gates. Wide fan-in OR gates are therefore among the most important building blocks for high-performance modules. However, in wide fan-in dynamic gates, and especially in wide fan-in OR gates, robustness and performance degrade significantly with increasing leakage current. The larger switching capacitance caused by the use of more transistors also increases energy consumption significantly. As a result, the power dissipation of memory structures such as register files increases significantly. The proposed 8×8 register file memory design with 2 read ports and 1 write port reduces power dissipation by using current comparison based domino logic in the Local and Global bit lines (LBL & GBL) of the register file architecture. The LBL and GBL structures contain wide fan-in OR gates, which consume considerable power. Implementing them with current comparison based domino logic, which reduces power dissipation in wide fan-in OR gates, lowers the power consumed by the register file architecture. Clock gating logic is also added to the current comparison based domino logic for a further reduction in power dissipation.

### II. Literature Review

J.M Rabaey et al. proposed the basic domino logic model, called the standard footless domino logic (SFLD) [15]. This is the conventional and most popular dynamic logic. A PMOS keeper transistor is employed to prevent undesired charging and discharging of the dynamic node due to leakage currents and charge sharing in the pull down network during the evaluation phase, which improves robustness. However, keeper upsizing increases the current contention between the keeper transistor and the evaluation network, increasing the power consumption and evaluation delay of standard domino circuits.

M.H. Anis, M.W. Allam and M.I. Elmasry proposed a logic named High Speed Domino (HS domino) [2]. A delayed clock is used to reduce the current drawn through the pMOS keeper and the nMOS pull down network. This allows a large pMOS keeper to be used without performance degradation or leakage current. However, the area and power overhead of the clock delay circuit remains.

A. Alvandpour, R. Krishnamurthy, K. Soumyanath and S. Y. Borkar proposed conditional keeper domino logic (CKD) [3], which uses a small and a large keeper transistor. Its disadvantages are the increased delay and power dissipation caused by keeper upsizing.

Y. Lih, N. Tzartzanis and W. W. Walker proposed a leakage current replica keeper (LCR) [9] for dynamic circuits to improve the scaling of dynamic gates. The overhead is one FET per gate plus a portion of the replica circuit. For equal noise margins more legs are possible, and the gate is faster for the same number of legs. A fairly large safety factor is needed to account for random on-die process variation, especially FET Vt variation.

H. Mahmoodi and K. Roy proposed the Diode-footed domino (DFD) [10]. A diode footer is placed in series with the evaluation network, which increases robustness and noise immunity. The main drawbacks are that, although the leakage current through the evaluation pull down network is reduced, the current through the footer is increased, and the dynamic node does not discharge as fast as in the previous techniques.

H. Suzuki, C. H. Kim and K. Roy proposed the Diode partitioned Domino (DPD) [16]. As the clock frequency and physical address space of 64-bit microprocessors continue to grow, one major critical path is the access to the on-die cache memory, which includes a tag comparator, a tag SRAM and a data SRAM. To improve the delay of the tag comparator, a diode-partitioned (DP) domino circuit is proposed. DP domino reduces the parasitic capacitance and enables a smaller keeper in high fan-in gates. The diode circuit is further improved by an enhanced diode that boosts the gate voltage of the nMOS diode. Its power dissipation is nevertheless slightly higher.

Ali Peravi and Mohammed Asayei proposed the robust low leakage Controlled Keeper by Current Comparison Domino (CKCCD) [13], an advanced domino logic methodology with better performance and increased robustness compared to the other domino designs.

A. Alvandpur, K. Krishnamoorthy and Ganesh Sowmyanath proposed a 130 nm 6 GHz 256-word 32-bit leakage tolerant register file RAM design [8], which operates in the high frequency range with standard footless domino logic based Local and Global bit lines. A. Agarwal, S. Mathew et al. proposed a 32 nm 8.3 GHz 64-entry 32-bit variation tolerant near threshold voltage register file [1] that implements the Local and Global bit lines with standard footless domino logic. This design suffers from increased power dissipation and reduced noise immunity. Ali Peravi and Mohammed Asayei proposed a 64-word 32-bit register file architecture [14] with CKCCD logic circuits for the Local and Global bit lines.

### III. Concept Of CCD Logic & Clock Gated CCD Logic

In wide fan-in gates the capacitance of the dynamic node is large, so speed decreases dramatically. In addition, the noise immunity of the gate is reduced by the many parallel leaky paths. Although upsizing the keeper transistor can improve noise robustness, power consumption and delay increase because of the larger contention. These problems can be solved if the pull down network, which implements the logic function, is separated from the keeper transistor by a comparison stage in which the current of the pull up network is compared with the worst case leakage current.

Normally in domino logic the pull up network uses only one pmos transistor instead of 'n' transistors (n = 4, 8, 16 etc.), and the logic is implemented in the pull down network. Reducing the number of transistors decreases the capacitance. Fewer transistors mean less switching (charging and discharging) of the capacitance, so dynamic power consumption decreases, since the main source of dynamic power dissipation is the charging and discharging of capacitance. Leakage current arises from unwanted current flow between the source and the drain of a transistor. It can be reduced by the stacking effect: when two or more transistors in series are off, the leakage current through them is reduced.

This is where the keeper transistor comes in. The keeper supplies a small amount of current from the power-supply network to the dynamic node of a gate so that the charge stored on the dynamic node is preserved when needed. The keeper is upsized for better robustness, but contention then arises during the evaluation phase when the pull down network is on. The problem can be avoided by temporarily disabling the keeper at the times when the dynamic gate switches. The CCD is shown in Fig 1.

In the proposed circuit the current of the pull up network is mirrored by transistor M2 and compared with a reference current, which replicates the leakage current of the pull up network. The logic function is implemented with an nmos network. The source and body terminals of the pmos transistors are tied together so that the body effect is eliminated. Transistor M1 is in diode configuration, which means that its drain and gate are tied together. This reduces the leakage current through the stacking effect when all inputs of the OR gate are at the low level or the gate is in standby mode. Adding M1 thus reduces the sub threshold leakage of the evaluation network.

CCD logic has two phases in the active mode. During the precharge phase the clock input is held low. In this phase the leakage current is reduced by M1, since it is in diode configuration, so that the minimum voltage is Vgs = Vds = Vth. The stacking effect reduces the leakage current further. During the evaluation phase the clock input is pulled high. When at least one input is on, a conduction path to ground exists through the nMOS transistors of the pull down network. Since M1 is on, the current in the pull up network is high, and this current is mirrored by M2. The dynamic node is therefore discharged, which turns the keeper transistor off and reduces the current contention problem. In this way the power dissipation of CCD logic is reduced compared to the other methods.

Clock gating is a popular technique used in many synchronous circuits for reducing dynamic power dissipation. Clock gating saves power by adding logic to a circuit to prune the clock tree. Pruning the clock disables portions of the circuitry so that the flip-flops in them do not have to switch states. The clock is gated by an AND operation with a control signal, referred to as the clock gate signal. When the latch is not required to switch state, the clock gate signal is turned off and the clock is not allowed to charge or discharge Cg, which saves clock power. Because the capacitance of the AND gate itself is much smaller than Cg, there is a net saving in power dissipation. Fig 2 shows the idea of clock gated CCD logic. Only when both the enable and the clock signal are high is the clock signal routed to the transistors of the CCD logic, and the corresponding operations are performed.
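The effect of AND-based clock gating on switching activity can be sketched as follows. This is a minimal illustration, not the simulated circuit: the function names, the enable pattern and the numbers passed to `dynamic_power` (1 pF load, 1.8 V supply, 100 MHz clock) are assumptions chosen only for the example. The power estimate uses the standard relation $P = \alpha C V_{DD}^2 f$ for dynamic power.

```python
def gated_clock(clk, enable):
    """AND-based clock gate: the clock passes only while enable is high."""
    return clk & enable


def count_transitions(signal):
    """Number of 0->1 and 1->0 edges in a sampled waveform."""
    return sum(1 for a, b in zip(signal, signal[1:]) if a != b)


def dynamic_power(alpha, c_load, vdd, freq):
    """Standard dynamic power estimate P = alpha * C * VDD^2 * f (watts)."""
    return alpha * c_load * vdd ** 2 * freq


clk = [0, 1] * 8                      # 8 clock cycles, two samples per cycle
enable = [1] * 4 + [0] * 8 + [1] * 4  # hypothetical: block idle for 4 cycles
gclk = [gated_clock(c, e) for c, e in zip(clk, enable)]

free_running = count_transitions(clk)
gated = count_transitions(gclk)
print(free_running, gated)  # 15 7

# Hypothetical load: 1 pF clock capacitance, 1.8 V supply, 100 MHz clock.
p_free = dynamic_power(1.0, 1e-12, 1.8, 100e6)
p_gated = p_free * gated / free_running  # activity scales with delivered edges
print(p_gated < p_free)  # True
```

In this sketch the enable signal changes only while the clock is low, so the gated clock has no glitches. A practical gate would need to guarantee this, for example with a latch on the enable path.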

### IV. Clock Gated Register File Memory Design

A register file is an array of registers. Modern integrated circuit-based register files are usually implemented as fast static RAMs with multiple ports. Such RAMs are distinguished by having dedicated read and write ports, whereas ordinary multi-ported SRAMs usually read and write through the same ports. In the register file bit cell, a read port is inserted on each side of the storage cell to provide symmetric loading during cell write for optimal stability. The need for the complement of the input data is removed by using an extra NMOS pass transistor. In Fig 3, P1 denotes port 1 and P0 denotes port 0. WS denotes write select, RS denotes read select and Din denotes the data input line.

The Local and Global bit lines (LBL & GBL) are used for accessing the data at a particular memory location. Each local bit line is selectively coupled to an associated global bit line. While a selected memory element is being read, the switching device associated with the LBL is switched open so that the LBL is electrically isolated from its associated GBL, and a read voltage is then applied across the selected memory element. The applied read voltage causes current to flow through the selected memory element. The local bit line voltage depends on the memory state of the selected memory element; it is amplified by the gain stage and conducted along the GBL associated with the LBL. The amplified current, or another related signal on the GBL, determines the stored memory state of the selected memory element.

The bit lines consume a major portion, nearly 70%, of the dynamic power in register files and become the dominant factor in their energy breakdown. The power dissipation of the bit lines increases linearly with the number of registers and the number of ports. In addition, leakage power becomes a significant source of power consumption as the technology scales down, reaching up to 50% in the 90 nm technology. Thus, reducing the bit line power consumption can reduce the overall power consumption of register files and consequently the total power of current microprocessors.

To improve performance and robustness, several circuits such as LCR, conditional keeper domino (CKD) and Controlled Keeper by Current Comparison domino logic (CKCCD) based LBL & GBL have been employed. Each of them suffers from one disadvantage or another, such as reduced robustness. To overcome these disadvantages, current comparison based domino logic is proposed, and the LBL and GBL architecture is designed using it. Fig 4 and Fig 5 show the CCD based LBL and GBL architecture. Each read port needs an LBL, which forms a dynamic 8-way AND-OR. During the read cycle, data from the storage cell is read by two transistors per word (M1 and M2) on each LBL. The GBL circuit is a dynamic 8-input OR gate.
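The logic function of the read path described above can be sketched at the behavioural level. This is an illustrative model only: the function names are hypothetical, the stored data is made up, and the assumption that the seven other GBL inputs are idle is introduced just to exercise the 8-input OR.

```python
def local_bit_line(read_select, stored_bits):
    """Dynamic 8-way AND-OR: high if a selected word stores a 1."""
    assert len(read_select) == len(stored_bits) == 8
    return int(any(rs and bit for rs, bit in zip(read_select, stored_bits)))


def global_bit_line(lbl_outputs):
    """Dynamic 8-input OR over the local bit line outputs."""
    assert len(lbl_outputs) == 8
    return int(any(lbl_outputs))


def read_bit(read_select, column_bits):
    lbl = local_bit_line(read_select, column_bits)
    # Assumption: the other seven GBL inputs are idle (logic 0) here.
    return global_bit_line([lbl] + [0] * 7)


column = [1, 0, 1, 1, 0, 0, 1, 0]         # one bit position of words 0..7 (made-up data)
select_word_2 = [0, 0, 1, 0, 0, 0, 0, 0]  # one-hot read select
select_word_1 = [0, 1, 0, 0, 0, 0, 0, 0]

print(read_bit(select_word_2, column))  # 1
print(read_bit(select_word_1, column))  # 0
```

Each AND term corresponds to the two series transistors per word (read select and stored data) on the LBL, which is why a wide fan-in OR structure appears on every read port.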









Fig 3 Register File Bit Cell





Fig 5 CCD Based GBL

### V. Simulation Results

The Current Comparison Domino Logic based OR gate is designed for wide fan-in of 4, 8 and 16 inputs, and its power dissipation, area and delay are measured. Fig 6 shows the power dissipation result of the 4 input CCD logic.

| Log                                                 |               |
|-----------------------------------------------------|---------------|
| X_MAMREF_SCH1.N\$45                                 | 4.2092        |
| X_MAMREF_SCH1.N\$55                                 | 4.6000        |
| X_MAMREF_SCH1.N\$80                                 | 3.6860        |
| TOTAL POWER DISSIPATION:                            | 1.6877U WATTS |
| Eldo NEWTON: VNTOL=1.000000e-06 RELTOL=1.000000e-03 |               |

Fig 6 Power Dissipation of 4 input CCD logic

The power dissipation value is found to be 1.6877 micro watts. The Fig 7 shows the power dissipation of 8 input CCD logic.

| Log                      |                 |
|--------------------------|-----------------|
| A_CCDOIF_DCHI.N#/        | 5.0000          |
| X_CCD8IP_SCH1.N\$8       | 243.7458M       |
| TOTAL POWER DISSIPATION: | 842.2409U WATTS |

Fig 7 Power Dissipation of 8 input CCD logic

The power dissipation is found to be 842.2409 micro watts. The Fig 8 shows the power dissipation value of 16 input CCD logic.

| Log                                                 |                 |
|-----------------------------------------------------|-----------------|
| X_CCDNEW16_SCH1.N\$4                                | 10.3806         |
| TOTAL POWER DISSIPATION:                            | 880.7707U WATTS |
| Eldo NEWTON: VNTOL=1.000000e-06 RELTOL=1.000000e-03 |                 |

Fig 8 Power Dissipation of 16 input CCD logic

The power dissipation result is 880.770 microwatts.

The clock gated CCD OR gate logic is simulated with wide fan-in of 4, 8 and 16 inputs, and its power dissipation, area and delay are also measured. Fig 9 shows the power dissipation result of the 4 input clock gated CCD logic.



Fig 9 Power Dissipation of 4 input Clock Gated CCD logic

The power dissipation value is found to be 41.115 nanowatts. The Fig 10 shows the power dissipation result of 8 input Clock gated CCD logic.



The power dissipation value is found to be 66.43 µwatts. The Fig 11 shows the power dissipation result of 16 input Clock gated CCD logic.

| Log                            |                 |
|--------------------------------|-----------------|
| A_CONTRACTOR SCHEME //         | 1.5055M         |
| X_CCD16INPUTNEWDONE_SCH1.N\$80 | 3.8279M         |
| X_CCD16INPUTNEWDONE_SCH1.N\$92 | 0.0000          |
| TOTAL POWER DISSIPATION:       | 645.0550U WATTS |

Fig 11 Power Dissipation of 16 input Clock Gated CCD logic

The delay, noise and area values are also calculated in order to find the Figure of Merit (FOM). The FOM is a quantity used to characterize the performance of a device, system or method. The Unity Noise Gain (UNG) is also taken into account in calculating the FOM. UNG is the input noise amplitude that causes the same noise voltage to occur at the output. The formula for calculating the FOM is given below:

$$\mathrm{FOM} = \frac{UNG}{P_{norm} \times D_{norm} \times \sqrt{D_{norm}} \times A_{norm}}$$
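As a worked check, the FOM expression above can be evaluated directly. The sketch below is illustrative: the function name `figure_of_merit` is hypothetical, and the inputs are the 4 input and 8 input Clock Gated CCD values reported in Table 5 (UNG, normalised power, normalised delay and normalised area).

```python
import math


def figure_of_merit(ung, p_norm, d_norm, a_norm):
    """FOM = UNG / (P_norm * D_norm * sqrt(D_norm) * A_norm)."""
    return ung / (p_norm * d_norm * math.sqrt(d_norm) * a_norm)


# Inputs taken from the Clock Gated CCD columns of Table 5.
fom_4 = figure_of_merit(ung=1.001, p_norm=0.24, d_norm=2.38, a_norm=1.11)
fom_8 = figure_of_merit(ung=1.116, p_norm=0.07, d_norm=3.23, a_norm=1.15)

# The plain CCD reference has every normalised quantity equal to 1.
fom_ref = figure_of_merit(1, 1, 1, 1)

print(round(fom_4, 2), round(fom_8, 2), fom_ref)
```

With these inputs the expression gives approximately 1.02 and 2.39, which agrees with the tabulated FOM values for 4 and 8 inputs to within rounding.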

#### Table 1 Power dissipation comparison between CCD logic and Clock gated CCD logic

| No.of.inputs | Power dissipation of CCD logic | Power dissipation of Clock gated CCD logic |
|--------------|--------------------------------|--------------------------------------------|
| 4 input      | 1.6877 µwatts                  | 41.11 nwatts                               |
| 8 input      | 842.24 µwatts                  | 66.43 µwatts                               |
| 16 input     | 880.77 µwatts                  | 645.05 µwatts                              |

From Table 1 it is clear that the power dissipation of Clock gated CCD logic is lower than that of normal CCD logic. Table 2 shows the comparison of the input and output noise of Clock gated CCD logic.

#### Table 2 Input and Output noise comparison of Clock Gated CCD logic

| No.of.inputs | Noise at the input of Clock Gated CCD logic | Noise at the output of Clock Gated CCD logic |
|--------------|---------------------------------------------|----------------------------------------------|
| 4 input      | 40.118 volts                                | 39.550 volts                                 |
| 8 input      | 59.198 volts                                | 48.821 volts                                 |
| 16 input     | 47.572 volts                                | 43.626 volts                                 |

From the table it is inferred that the output noise voltage reduces for the Clock Gated CCD logic while compared to its input. The table 3 shows the delay comparison between the CCD logic and the Clock Gated CCD logic.

#### Table 3 Delay comparison of the CCD logic and the Clock Gated CCD logic

| No.of.inputs | Delay of CCD logic | Delay of Clock Gated CCD logic |
|--------------|--------------------|--------------------------------|
| 4 input      | 8.377 nanoseconds  | 20.00 nanoseconds              |
| 8 input      | 8.410 nanoseconds  | 27.26 nanoseconds              |
| 16 input     | 8.4202 nanoseconds | 20.00 nanoseconds              |

From the table, the delay of the Clock gated CCD logic is higher than that of CCD logic, by a factor of roughly 2.4 to 3.2. Table 4 shows the area comparison between CCD and Clock gated CCD logic.

#### Table 4 Area comparison of the CCD logic and the Clock Gated CCD logic

| No.of.inputs | Area of CCD logic | Area of Clock Gated CCD logic |
|--------------|-------------------|-------------------------------|
| 4 input      | 410 µmetre        | 456 µmetre                    |
| 8 input      | 528 µmetre        | 611 µmetre                    |
| 16 input     | 672 µmetre        | 986 µmetre                    |

The area of the Clock gated CCD logic is higher than that of CCD logic, by about 11% to 47% depending on the fan-in. Table 5 shows the overall performance comparison of CCD and Clock Gated CCD logic.

#### Table 5 Performance comparison of CCD and Clock Gated CCD logic

| MEASURED<br>PARAMETERS | CCD<br>n = 4 | CCD<br>n = 8 | CCD<br>n = 16 | Clock Gated CCD<br>n = 4 | Clock Gated CCD<br>n = 8 | Clock Gated CCD<br>n = 16 |
|------------------------|--------------|--------------|---------------|--------------------------|--------------------------|---------------------------|
| UNG                    | 1            | 1            | 1             | 1.001                    | 1.116                    | 1.18                      |
| NORMALISED<br>POWER    | 1            | 1            | 1             | 0.24                     | 0.07                     | 0.73                      |
| NORMALISED<br>AREA     | 1            | 1            | 1             | 1.11                     | 1.15                     | 1.46                      |
| NORMALISED<br>DELAY    | 1            | 1            | 1             | 2.38                     | 3.23                     | 2.38                      |
| $\sqrt{D_{norm}}$      | 1            | 1            | 1             | 1.54                     | 1.79                     | 1.54                      |
| FOM                    | 1            | 1            | 1             | 1.02                     | 2.38                     | 3.06                      |
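The normalised delay and area rows can be reproduced from the absolute values in Tables 3 and 4 by dividing each Clock Gated CCD value by the corresponding CCD value. The sketch below is illustrative; the dictionary and function names are hypothetical, and the numbers are copied from those two tables.

```python
# Absolute values from Table 3 (delay in ns) and Table 4 (area).
ccd_delay_ns = {4: 8.377, 8: 8.410, 16: 8.4202}
gated_delay_ns = {4: 20.00, 8: 27.26, 16: 20.00}
ccd_area = {4: 410, 8: 528, 16: 672}
gated_area = {4: 456, 8: 611, 16: 986}


def normalise(value, reference):
    """Express a measurement relative to the plain CCD reference."""
    return value / reference


norm_delay = {n: normalise(gated_delay_ns[n], ccd_delay_ns[n]) for n in ccd_delay_ns}
norm_area = {n: normalise(gated_area[n], ccd_area[n]) for n in ccd_area}

for n in (4, 8, 16):
    print(n, round(norm_delay[n], 2), round(norm_area[n], 2))
```

The computed ratios agree with the normalised delay and area entries of Table 5 to within about 0.01, the small differences being consistent with truncation rather than rounding in the table.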

From the table it is clear that the FOM of Clock Gated CCD logic is higher than that of normal CCD logic, so the performance of the former is better than that of the latter.

The Local Bit Line and Global Bit Line are designed using Standard Footless Domino Logic and Current Comparison Based Domino Logic. Fig 12 shows the power dissipation result of the CCD based LBL. Fig 13 shows the power dissipation result of the CCD based GBL.

| Log                      |                 |
|--------------------------|-----------------|
| v_urwment.ude>           | 0.0000          |
| X_NEWLBL1.N\$7           | 0.0000          |
| X_NEWLBL1.N\$8           | 0.0000          |
| TOTAL POWER DISSIPATION: | 123.1423N WATTS |

Fig 12 Power Dissipation of CCD based LBL

The power dissipation of CCD based LBL is 123.14 nano watts. The power dissipation of CCD based GBL is 42.74 nano watts.

| Log                      |                |
|--------------------------|----------------|
| A_copopulation           | 4101054511     |
| X_CCDGBL1.N\$33          | 894.2533U      |
| X_CCDGBL1.N\$41          | 794.8968U      |
| TOTAL POWER DISSIPATION: | 42.7402N WATTS |

Fig 13 Power Dissipation of CCD based GBL

The Table 6 shows the power dissipation comparison between SFLD based LBL and GBL and CCD based LBL and GBL.

#### Table 6 Power Dissipation analysis of SFLD and CCD based LBL and GBL

| Logic | Power Dissipation of LBL | Power Dissipation of GBL |
|-------|--------------------------|--------------------------|
| SFLD  | 2.107 µwatts             | 2.86 µwatts              |
| CCD   | 123.14 nwatts            | 42.74 nwatts             |

From the table it is clear that the power dissipated by the CCD based LBL and GBL is lower than that of the SFLD based LBL and GBL. The power dissipation of the SFLD based register file memory design is found to be 108.656 milliwatts. Fig 14 shows the power dissipation of the CCD based register file memory design.

| Log                                                 |                |
|-----------------------------------------------------|----------------|
| TOTAL POWER DISSIPATION:                            | 46.7129M WATTS |
| Eldo NEWTON: VNTOL=1.000000e-06 RELTOL=1.000000e-03 |                |
| Compute from 0.000000 Nano to 1.000000E+03 Nano     |                |

Fig 14 Power Dissipation of CCD based Register File Memory Design

The power dissipation value is found to be 46.712 milliwatts, which is lower than that of the SFLD based design. Fig 15 shows the power dissipation of the Clock Gated CCD based register file memory design.



Fig 15 Power Dissipation of Clock Gated CCD based Register File Memory Design

The power dissipation value is 294.373 microwatts, which is far lower than that of the CCD based design as well as the SFLD based design. Table 7 shows the power dissipation comparison between the three register file memory designs.

#### Table 7 Power Dissipation Comparison of different Register File Memory Designs

| Logic                                             | Power dissipation |
|---------------------------------------------------|-------------------|
| SFLD based Register file memory design            | 108.656 mWatts    |
| CCD based Register file memory design             | 46.712 mWatts     |
| Clock Gated CCD based Register file memory design | 294.66 µwatts     |

Thus the Clock Gated CCD logic based memory design is found to have the lowest power dissipation of the three designs. Fig 16 shows the comparison chart between the CCD and Clock gated CCD based Register File Memory Designs in terms of normalized power dissipation. The power dissipation is normalized with respect to the SFLD based memory. Fig 17 shows the chart depicting the number of components in the SFLD based memory, the CCD based memory and the Clock Gated CCD based memory. Area is denoted by the number of components.
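The normalization with respect to the SFLD based memory can be sketched from the Table 7 values. This is an illustrative calculation only; the variable names are hypothetical and the numbers are copied from Table 7, converted to watts.

```python
# Register file power dissipation from Table 7, in watts.
power_w = {
    "SFLD": 108.656e-3,
    "CCD": 46.712e-3,
    "Clock Gated CCD": 294.66e-6,
}

# Normalise each design to the SFLD based register file.
reference = power_w["SFLD"]
normalised = {name: p / reference for name, p in power_w.items()}

# Relative saving with respect to SFLD, in percent.
saving_pct = {name: 100.0 * (1.0 - ratio) for name, ratio in normalised.items()}

for name in power_w:
    print(f"{name}: normalised {normalised[name]:.4f}, saving {saving_pct[name]:.1f}%")
```

On these figures the CCD based design dissipates roughly 43% of the SFLD power, while the Clock Gated CCD based design dissipates well under 1% of it.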



Fig 16 Normalized Power dissipation Comparison Chart



Fig 17 Number of Components comparison chart

From the chart it is evident that the Clock Gating CCD based register File has higher number of components than the other two models. So it has high area overhead.

### VI. Conclusion

The clock gating concept is added to the normal Current Comparison Domino logic. The power dissipation, area and Figure of Merit are calculated and compared for the two. The power dissipation of the 4, 8 and 16 input Clock gated CCD logic is found to be 41.11 nW, 66.430 µW and 645.055 µW, which is much lower than that of CCD logic, whose power dissipation is 1.68 µW, 842.24 µW and 880.77 µW for the corresponding fan-in of 4, 8 and 16. The register file memory is designed with LBL and GBL architectures using SFLD, CCD and Clock Gated CCD logic, and the power dissipation of these designs is 108.656 mW, 46.712 mW and 294.66 µW respectively. It is therefore evident that the register file memory designed with Clock gated CCD based LBL and GBL has very low power dissipation compared with the other two designs. The area of the clock gated CCD logic for wide fan-in of 4, 8 and 16 is 456 µm, 611 µm and 986 µm, which is higher than the area of CCD logic for 4, 8 and 16 inputs, namely 410 µm, 528 µm and 672 µm. The FOM of CCD logic is normalized to 1 for 4, 8 and 16 inputs, and the FOM of Clock gated CCD logic is 1.02, 2.38 and 3.06 for the wide fan-in of 4, 8 and 16 inputs. It is therefore evident that the performance of the clock gated CCD based design is better than that of the SFLD and CCD based designs.

The register files of microprocessors, digital signal processors, cache memories etc. suffer from high power dissipation due to wide fan-in logic gates, and domino logic circuits were developed to reduce it. Current comparison based domino logic proves to be power efficient, and incorporating the clock gating technique into CCD logic reduces the power dissipation much further. By designing the local bit line and global bit line circuits of the register file with CCD logic and Clock gated CCD logic, the power dissipation of the register file memory architecture is reduced compared with that of the basic SFLD based design. Comparing CCD and Clock Gated CCD logic, the latter shows better performance, with reduced power dissipation and an increased Figure of Merit. Thus, in TSMC 180 nm technology, the register file memory design with Clock gated CCD logic based LBL and GBL circuits can operate reliably at low power.

### References

- [1]. Agarwal. A, Hsu. S et al., "A 32-nm 8.3 GHz 64-entry x 32-bit variation tolerant near threshold register file", Proceedings of Symposium on VLSI Circuits (VLSIC), Digest of Technical Papers, 2010, pp. 105-106.
- [2]. Anis. M, Allam. M. W, Elmasry. M, "Energy Efficient noise tolerant technique for scaled down CMOS and MTCMOS technologies", IEEE Trans. VLSI Syst., vol. 10, no.2, pp. 71-78, Apr 2002.
- [3]. Alvandpur. A, Krishnamoorthy. K et al., "A sub 130-nm conditional keeper technique", IEEE Journal of Solid State Circuits 37 (2002) 633-638.
- [4]. Bowman. K, Duval. S. G et al., "Impact of die-to-die and within die parameter fluctuations on maximum clock frequency distribution for gigascale integration", IEEE Journal of Solid State Circuits, vol. 37, no.2, pp.183-190, Feb 2002.
- [5]. David Jeyasingh. R. G, Bhatt. N and Amrutur. B, "Adaptive Keeper design for dynamic logic circuits using rate Sensing technique", IEEE Transactions on Very Large Scale Integration (VLSI) Syst., vol. 19, no.2, pp.295-304, Feb 2011.
- [6]. Guan. X, Fei. Y, "Register File Partitioning and Compiler Support for reducing embedded processor power consumption", IEEE Transactions on Very Large Scale Integration (VLSI) systems 18 (2010), 1248-1252.
- [7]. Kim. C. H, Roy. K, "A process variation compensating technique with On-Die leakage current sensor for nanometer scale dynamic circuits", IEEE Transactions on Very Large Scale Integration (VLSI) Syst., vol 14, no.6, pp.646-649, June 2006.
- [8]. Krishnamoorthy. K et al., "A 130-nm 6-GHz 256-word 32-bit leakage tolerant register file RAM", IEEE Journal of Solid State Circuits 37 (2002) 624-632.
- [9]. Lih. Y, Tzartzanis. N and Walker. W. W, "A leakage tolerant replica keeper for dynamic circuits", IEEE Journal of Solid State Circuits, vol.42, no.1, pp.71-78, April 2007.
- [10]. Mahmoodi. H, Roy. K, "Diode Footed Domino: A leakage tolerant high fan-in dynamic circuit design style", IEEE Trans. Circuits Syst. I, Reg. Papers, vol.51, no.3, pp.495-503, March 2004.
- [11]. Muller. M, Simon. S et al., "Low Power Synthesizable Register Files for modern efficient processors", Integration, the VLSI journal, vol 39, 2006.
- [12]. Mostafa. H, Anis. M and Elmasry. M, "Novel time yield improvement circuits for wide fan-in gates", IEEE Trans. Circuits Syst. I, Reg. Papers, vol.58, no.10, pp.1785-1797, August 2011.
- [13]. Mohammed. M, Peravi. A, "Robust low leakage controlled keeper by current comparison domino for wide fan-in gate", Integration, the VLSI journal, vol.45, no.1, pp.22-32, 2012.
- [14]. Mohammed. M, Peravi. A, "Low power wide gates for modern efficient processors", Integration, the VLSI journal (2013).
- [15]. Rabaey. J. M, Chandrakasan. A, Nikolic. B, "Digital Integrated Circuits- A Design Perspective", 2nd ed. Upper Saddle River, NJ: Prentice Hall, 2000.
- [16]. Suzuki. H, Kim. C. H and Roy. K, "Fast tag comparator using diode partitioned domino for 64-bit microprocessors", IEEE Trans. Circuits Syst., vol.54, no.2, pp.322-328, Feb 2007.
- [17]. Wang. L, Krishnamoorthy. K and Soumyanath. K, "An energy efficient leakage tolerant dynamic circuit design style", in Proc. Int. ASIC/SOC Conf., 2003.