#### Microelectronic Engineering 88 (2011) 2781-2784


# Low power and high performance dynamic CMOS XOR/XNOR gate design

Jinhui Wang<sup>a</sup>,\*, Na Gong<sup>b</sup>, Ligang Hou<sup>a</sup>, Xiaohong Peng<sup>a</sup>, Shuqin Geng<sup>a</sup>, Wuchen Wu<sup>a</sup>

<sup>a</sup> VLSI and System Lab, Beijing University of Technology, Beijing, China

<sup>b</sup> Department of Computer Science and Engineering, SUNY at Buffalo, Buffalo, NY, USA

#### ARTICLE INFO

Article history: Available online 17 February 2011

Keywords: Dynamic XOR/XNOR gate; Leakage power; Variation

## ABSTRACT

A hybrid network technique is proposed for dynamic CMOS XOR/XNOR gates to reduce power consumption, save layout area, and avoid signal skew. Compared to the standard N type dynamic gate with similar delay time, the leakage power, dynamic power, and layout area of the novel XOR/XNOR gate are reduced by up to 51%, 13%, and 24%, respectively. The dependence of the standby leakage characteristics of three dynamic CMOS XOR/XNOR gates on the combination of input and clock signal states is also analyzed thoroughly. Finally, their robustness to noise and to process and temperature variations is discussed. © 2011 Elsevier B.V. All rights reserved.

#### 1. Introduction

CMOS XOR/XNOR gates are fundamental units in various circuits, especially circuits that perform arithmetic operations in high-speed microprocessors, such as adders, multipliers, and comparators [1–3]. They are part of the critical path and therefore influence the overall performance of the entire system [4].

Due to the complex PMOS pull-up network and complementary inputs, traditional CMOS XOR and XNOR gates are characterized by high power consumption and low speed. Therefore, several new XOR and XNOR gates have been proposed based on different logic styles [4–8]. One design in [5] adopts pass-transistor logic to reduce power consumption, but it has a non-full voltage swing at the output and limited driving capability. The other design in [5] employs a static CMOS inverter to improve the driving capability, but it introduces signal skew and results in extra layout area and power consumption. Although the design in [6] is shown to be power-efficient, its delay is too large for use in high-speed circuits. More importantly, all of the gates in [5,6] fail to work properly when the supply voltage is scaled down with the development of VLSI technology. A 10-transistor design based on transmission gates is presented in [7] to function at lower supply voltages, but it leads to large power consumption because of its three static inverters. The design in [8] achieves full-swing operation using only six transistors and thus saves layout area. However, when the two inputs are "00" or "11", switching the two cross-coupled transistors increases short-circuit current and power consumption [4]. Two new XOR and XNOR gates are proposed in [4] to realize a power-area-delay trade-off, but they fail to consider the influence of process and temperature variations. Moreover, all of the above designs are independent of a clock signal and are therefore unsuitable for the pipeline and SRAM bit-line circuits of modern high-performance microprocessors, where a clock signal is required to control the running speed of the gates and to set the gates in the standby or evaluation state for gating techniques.

E-mail address: wangjinhui888@yahoo.com.cn (J. Wang).

Thus, dynamic CMOS XOR/XNOR gates (DXG) running with a high-frequency clock signal have been extensively applied in critical path design in microprocessors [1–4], owing to their superior speed, small area, full voltage swing, and satisfactory driving capability. However, dynamic CMOS gates typically consume higher dynamic switching power. In addition, with the rapid development of technology, the scaling of threshold voltage ( $V_{th}$ ) and gate oxide thickness ( $t_{ox}$ ) results in an exponential increase of sub-threshold leakage and gate leakage power [15–17]. Moreover, the two standard DXGs, the N type (DXGN) and P type (DXGP) structures, implement A  $\oplus$  B/A  $\odot$  B with an inverter, which inevitably introduces signal skew and propagation delay.

A novel DXG with a hybrid network (DXGH) is therefore proposed in this paper to achieve power-, speed-, and area-efficient operation without input signal skew. In Section 2, the proposed structure is discussed in detail. In Section 3, the simulation results regarding leakage current, power, layout area, and noise margins are analyzed. The performance of the different DXGs under process parameter and temperature variations is shown in Section 4, and the conclusion is presented in Section 5.

#### 2. Proposed structure

Fig. 1 shows the structures of DXGN, DXGP and DXGH. As shown in Fig. 1a and b, DXGN and DXGP generate the XOR and XNOR output signals using the pull-down NMOS network (PDN) and the pull-up PMOS network (PUN), respectively. DXGP has superior leakage power characteristics compared to DXGN. This is because the barrier height for holes tunneling from the valence band (HVB) is higher than that for electrons tunneling from the conduction band (ECB), and therefore PMOS transistors produce much less gate leakage current [10], as shown in Table 1. However, the evaluation speed of DXGN is higher than that of DXGP because electrons have higher mobility than holes. So DXGN consumes more power but achieves higher speed than DXGP [9]. There is thus a tradeoff between power consumption and delay time in dynamic XOR and XNOR gates. DXGH with a hybrid network, shown in Fig. 1c, is therefore proposed to balance this tradeoff. The hybrid network, based on the mixed application of PMOS and NMOS transistors in the PDN, is applied instead of the traditional PDN and PUN. It consists of N1, P2, N3, and P4 and operates as follows. When the clock is set low, Pc1 is turned on and the dynamic node is charged high by the precharge transistor Pc1. The output node is discharged to ground by Nr1. The evaluation phase begins when the clock is set high and Pc1 is cut off. If an input combination that discharges the evaluation node, (A = 0, B = 1) or (A = 1, B = 0), is applied, the circuit evaluates and the dynamic node is discharged to ground, thereby generating low XNOR\_Out and high XOR\_Out. Otherwise, if (A = 0, B = 0) or (A = 1, B = 1) is applied, the high state of the dynamic node is preserved until the following precharge phase, so XNOR\_Out stays high and XOR\_Out stays low. Because the inputs of DXGH do not include complementary signals, both the static inverter and the input signal skew are eliminated, which effectively improves the power and layout area characteristics.

<sup>\*</sup> Corresponding author. Tel.: +86 15001166864.

0167-9317/\$ - see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.mee.2011.02.068
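The precharge and evaluation behaviour described for DXGH in this section can be sketched at the logic level as follows. This is a minimal behavioural model only: the function name `dxgh_cycle` is hypothetical, and the model ignores all electrical effects (timing, leakage, charge sharing) that the HSPICE simulations capture.

```python
def dxgh_cycle(a: int, b: int) -> dict:
    """Logic-level model of one precharge/evaluate cycle of DXGH (illustrative only)."""
    # Precharge phase (CLK = 0): Pc1 is on and charges the dynamic node high.
    dynamic_node = 1

    # Evaluation phase (CLK = 1): Pc1 is off. Per the text, the hybrid network
    # discharges the dynamic node only for (A=0, B=1) or (A=1, B=0).
    if a != b:
        dynamic_node = 0

    # Per the text, a discharged dynamic node gives low XNOR_Out and high XOR_Out;
    # a preserved high dynamic node gives high XNOR_Out and low XOR_Out.
    return {"XNOR_Out": dynamic_node, "XOR_Out": 1 - dynamic_node}


if __name__ == "__main__":
    print("A B XOR_Out XNOR_Out")
    for a in (0, 1):
        for b in (0, 1):
            out = dxgh_cycle(a, b)
            print(a, b, out["XOR_Out"], out["XNOR_Out"])
```

The model takes only A and B as inputs, which mirrors the point made above that DXGH needs no complementary input signals.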

#### 3. Simulation results

DXGN, DXGP, and DXGH are each simulated with HSPICE using 45 nm BSIM4 models [11]. Dynamic operation is evaluated at a worst-case temperature of 110 °C, and the static (standby) state is evaluated at a room temperature of 25 °C. Each dynamic gate drives a capacitive load of 8 fF and is tuned to operate at a 1 GHz clock frequency, so that the leakage power, dynamic power, layout area, and noise margins of the different gates can be compared effectively. The area of a gate is taken as the total transistor width of the circuit [12]. The threshold voltages of the NMOS and PMOS transistors and the supply voltage are set to 0.22, -0.22, and 0.8 V, respectively. For a fair comparison, the three DXGs are sized to have a similar worst-case propagation delay.

#### Table 1

Normalized leakage current of the devices at 25 °C.

| 45 nm Technology                                             | NMOS                 | PMOS             |
|--------------------------------------------------------------|----------------------|------------------|
| A: I<sub>leak</sub> (I<sub>sub</sub>, I<sub>gate</sub>)      | 37.12 (19.59, 17.53) | 16.56 (15.56, 1) |
| B: I<sub>gate</sub>                                          | 46.79                | 1.56             |

Transistor: width = 1 µm, length = 45 nm. $I_{leak}$: total leakage current. $I_{gate}$: gate leakage current. $I_{sub}$: sub-threshold leakage current. $V_t = 0.22$ V. $V_{dd} = 0.8$ V. A: $V_{gs} = 0$ and $|V_{ds}| = V_{dd}$. B: $|V_{gs}| = |V_{gd}| = V_{dd}$. Currents are normalized to the gate leakage current produced by the PMOS transistor in state A.

The simulated results for the four input vectors and two clock states at 25 °C are listed in Table 2. The optimal (lowest) leakage current states of DXGN, DXGP and DXGH are (*CLK* = 0, input vector = (0, 0)), (*CLK* = 1, input vector = (0, 1)) and (*CLK* = 0, input vector = (0, 0)), respectively, which can be explained as follows. As listed in Table 1, the leakage current of the off-NMOS transistor (37.12) is lower than that of the on-NMOS transistor (46.79), whereas the leakage current of the off-PMOS transistor (16.56) is much higher than that of the on-PMOS transistor (1.56). In DXGH, when the clock signal is gated low with the (0, 0) input vector, the PMOS transistors (P2 and P4 in the hybrid network and the clock transistor Pc1) are turned on and the NMOS transistors (N1 and N3 in the hybrid network and the clock transistor Nc1) are cut off, thereby minimizing the total leakage current. The same holds for DXGN. For DXGP, the optimal state is determined by the stack effect. There is a key difference between the state dependence of $I_{sub}$ and $I_{gate}$: $I_{sub}$ primarily depends on the number of on and off transistors in a stack, while $I_{gate}$ also depends strongly on the position of the on and off transistors. In the PUN of DXGP, any one of the four vectors turns on two transistors, but only vector (0, 1) places all off-transistors at the bottom of the stack, which suppresses $I_{gate}$ and the total leakage current [13].
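The optimal states quoted above can be cross-checked directly against the values in Table 2. The following sketch transcribes the table and searches for the minimum-leakage combination per gate; the names `LEAKAGE_A` and `optimal_state` are illustrative and not part of the original work.

```python
# Input vectors in the row order of Table 2.
VECTORS = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Leakage current in amperes at 25 °C, transcribed from Table 2:
# LEAKAGE_A[gate][clk] lists the values for the four input vectors.
LEAKAGE_A = {
    "DXGN": {0: [1.54e-7, 2.18e-7, 2.18e-7, 1.87e-7],
             1: [2.67e-7, 2.70e-7, 2.70e-7, 2.90e-7]},
    "DXGP": {0: [4.14e-7, 8.59e-7, 7.27e-7, 4.37e-7],
             1: [4.11e-7, 1.66e-7, 1.96e-7, 4.33e-7]},
    "DXGH": {0: [7.51e-8, 1.22e-7, 1.29e-7, 1.02e-7],
             1: [1.76e-5, 2.86e-7, 8.64e-5, 1.99e-7]},
}


def optimal_state(gate: str):
    """Return (clk, input_vector, current) with the lowest leakage for a gate."""
    candidates = [
        (current, clk, vector)
        for clk, row in LEAKAGE_A[gate].items()
        for vector, current in zip(VECTORS, row)
    ]
    current, clk, vector = min(candidates)
    return clk, vector, current


if __name__ == "__main__":
    for gate in LEAKAGE_A:
        clk, vector, current = optimal_state(gate)
        print(f"{gate}: CLK = {clk}, input vector = {vector}, I_leak = {current:.2e} A")

    # Ratio of the best DXGH state to the best DXGN state.
    saving = 1 - optimal_state("DXGH")[2] / optimal_state("DXGN")[2]
    print(f"DXGH vs. DXGN at their optimal states: {saving:.0%} lower leakage current")
```

At a fixed supply voltage, the ratio of the two optimal-state currents equals the ratio of the corresponding leakage powers, so the last line offers a quick consistency check against the leakage power reduction reported for DXGH.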

To further investigate the performance of the DXGs, the noise margins of the three gates are quantified and listed in Table 3. The noise signal is assumed to be a 2.5 GHz square wave with an 80% duty cycle. The maximum tolerable noise amplitude is defined as the signal amplitude at the inputs that induces a 10%-V<sub>dd</sub> drop in the voltage at the output of the DXG. As listed in Table 3, because DXGH lacks the protection of an input buffer (the static inverter), its noise margin is slightly lower (by 0.03 V) than that of DXGN, but DXGH is much more noise-immune than DXGP. Note that the noise margin of DXGH is still larger than 0.5 V (>1/2V<sub>dd</sub>). In practice, this noise immunity is sufficient to withstand most noise.
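The noise margin comparison can be restated numerically from Table 3 and the 0.8 V supply voltage given in Section 3. The snippet below is only an arithmetic check; the variable names are illustrative.

```python
VDD_V = 0.8  # supply voltage from Section 3

# Noise margins in volts, transcribed from Table 3.
NOISE_MARGIN_V = {"DXGN": 0.68, "DXGP": 0.2, "DXGH": 0.65}

# Loss of DXGH relative to DXGN (rounded to suppress floating-point noise).
margin_loss_v = round(NOISE_MARGIN_V["DXGN"] - NOISE_MARGIN_V["DXGH"], 2)

if __name__ == "__main__":
    print(f"DXGH loss vs. DXGN: {margin_loss_v} V")
    for gate, margin in NOISE_MARGIN_V.items():
        above_half = margin > VDD_V / 2
        print(f"{gate}: {margin} V = {margin / VDD_V:.0%} of Vdd, above Vdd/2: {above_half}")
```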

Fig. 1. (a) DXGN (b) DXGP (c) DXGH.

#### Table 2

Leakage current (A) of three DXGs in the different input vector and clock states at 25 °C.

| Input vector | DXGN, CLK = 0  | DXGN, CLK = 1 | DXGP, CLK = 0 | DXGP, CLK = 1  | DXGH, CLK = 0  | DXGH, CLK = 1 |
|--------------|----------------|---------------|---------------|----------------|----------------|---------------|
| (0,0)        | <b>1.54e-7</b> | 2.67e-7       | 4.14e-7       | 4.11e-7        | <b>7.51e-8</b> | 1.76e-5       |
| (0,1)        | 2.18e-7        | 2.70e-7       | 8.59e-7       | <b>1.66e-7</b> | 1.22e-7        | 2.86e-7       |
| (1,0)        | 2.18e-7        | 2.70e-7       | 7.27e-7       | 1.96e-7        | 1.29e-7        | 8.64e-5       |
| (1,1)        | 1.87e-7        | 2.90e-7       | 4.37e-7       | 4.33e-7        | 1.02e-7        | 1.99e-7       |

#### Table 3

Noise margins of three XOR/XNOR gates.

| DXG           | DXGN   | DXGP  | DXGH   |
|---------------|--------|-------|--------|
| Noise margins | 0.68 V | 0.2 V | 0.65 V |

Fig. 2. Comparison of the normalized active power, lowest leakage power and layout area of three DXGs.

Fig. 2 shows the comparison of the normalized dynamic power, the leakage power at the optimal leakage current state, and the layout area of the three gates. The dynamic power, leakage power, and layout area of DXGH are reduced by up to 13%, 51%, and 24%, respectively, compared to DXGN with similar delay time. This is because, on the one hand, eliminating the static inverter results in skew-free input signals, less power consumption, and smaller layout area; on the other hand, the hybrid network trades off power consumption against speed, which achieves power- and speed-efficient operation. Fig. 2 also shows that DXGP has considerable power overhead compared to DXGN and DXGH. This is because the PMOS transistors in the PUN have poor speed characteristics and must be sized larger to achieve comparable speed, which increases power consumption.

#### 4. Process parameter and temperature variations

As the CMOS process advances, scaling has resulted in a significant increase in the variations of the process parameters, including gate length (L<sub>gate</sub>), channel doping concentration (N<sub>ch</sub>), and gate oxide thickness (t<sub>ox</sub>). All of these process variations have a significant effect on the leakage current. Operating temperature also influences the performance of the DXGs significantly. Changes in the operating temperature occur due to power dissipation in the form of heat. On-chip thermal variations have a significant bearing on the mobility of electrons and holes, as well as on the threshold voltage of the devices, which results in variation of the leakage current and dynamic power. Therefore, to evaluate the impact of process and temperature variations on the leakage current and dynamic power characteristics of the DXGs, multiple-parameter Monte Carlo analysis [14] is applied in this paper. In the simulation, L<sub>gate</sub>, N<sub>ch</sub> and t<sub>ox</sub> are all assumed to have Gaussian (normal) statistical distributions with a three-sigma  $(3\sigma)$  fluctuation of 10% [18]. In addition, temperature is assumed to have a uniform distribution and to vary around the nominal value (75 °C) by ±50 °C. In total, 1000 Monte Carlo simulations are performed.
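The sampling scheme described above can be sketched as follows. This is not the authors' HSPICE setup: it only illustrates how the parameter sets might be drawn, assuming that the "3σ fluctuation of 10%" means 3σ equals 10% of the nominal value. The nominal N<sub>ch</sub> and t<sub>ox</sub> values are hypothetical placeholders normalized to 1.0, and the names `draw_sample` and `NOMINAL` are illustrative.

```python
import random

N_SAMPLES = 1000      # number of Monte Carlo runs stated in the text
REL_3SIGMA = 0.10     # assumption: 3*sigma = 10% of the nominal value
T_NOMINAL_C = 75.0    # nominal temperature in °C
T_SPAN_C = 50.0       # uniform variation of ±50 °C around nominal

# Gate length is the 45 nm of the technology used; N_ch and t_ox are
# hypothetical normalized placeholders, not values from the paper.
NOMINAL = {"L_gate": 45e-9, "N_ch": 1.0, "t_ox": 1.0}


def draw_sample(rng: random.Random) -> dict:
    """Draw one set of process parameters and an operating temperature."""
    sample = {
        name: rng.gauss(nominal, nominal * REL_3SIGMA / 3.0)
        for name, nominal in NOMINAL.items()
    }
    sample["temp_C"] = rng.uniform(T_NOMINAL_C - T_SPAN_C, T_NOMINAL_C + T_SPAN_C)
    return sample


rng = random.Random(0)  # fixed seed so the sketch is reproducible
samples = [draw_sample(rng) for _ in range(N_SAMPLES)]

if __name__ == "__main__":
    temps = [s["temp_C"] for s in samples]
    print(f"{len(samples)} samples, temperature range "
          f"{min(temps):.1f} to {max(temps):.1f} °C")
```

Each sample would then parameterize one circuit simulation, and the resulting leakage current and dynamic power values form the distributions analyzed in this section.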

Fig. 3 shows the leakage current and dynamic power distribution curves of DXGN, DXGP, and DXGH under process and temperature variations. The leakage current distribution curves of DXGH vs. DXGN and DXGH vs. DXGP cross at 0.54 μA and 1 μA, respectively. In the first case, 53% of the DXGH samples are lower than 0.54 μA and 52% of the DXGN samples are higher than 0.54 μA. In the second case, 71% of the DXGH samples are lower than 1 μA and 85% of the DXGP samples are higher than 1 μA. The dynamic power distribution curves of DXGH vs. DXGN and DXGH vs. DXGP cross at 52 μW and 109 μW, respectively. Here, 87% of the DXGH samples are lower than 52 μW, 51% of the DXGH samples are lower than 109 μW, and 100% of the DXGP samples are higher than 109 μW. These results indicate that DXGH is preferable for reducing leakage current and dynamic power in the majority of the samples even under process and temperature variations, which is consistent with the analysis at the nominal design corner.

Fig. 3. Leakage current and dynamic power distribution curves of the DXGN, DXGP, and DXGH under process parameter and temperature variations.

#### Table 4

Average (A) and standard deviation (SD) of leakage current and dynamic power of different DXGs.

| DXG  | Leakage current (μA), A/SD | Dynamic power (μW), A/SD |
|------|----------------------------|--------------------------|
| DXGN | 2.2/1.6                    | 54/11                    |
| DXGH | 1.0/0.8                    | 46/4.8                   |
| DXGP | 3.3/2.2                    | 162/16                   |

To further investigate the impact of process and temperature variations on the leakage current and dynamic power of the DXGs, we compare the average value (A) and standard deviation (SD) of the leakage current and the dynamic power of the different DXGs. As listed in Table 4, DXGH reduces the average leakage current and dynamic power by up to 54% and 14%, respectively, compared to DXGN. This is similar to the leakage and dynamic power reductions obtained at the nominal design corner (Fig. 2). Table 4 also shows that all of the SD values of DXGH are lower than those of DXGN and DXGP; DXGH can therefore effectively improve the robustness of the DXG to process and temperature variations.
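The percentage reductions and the SD comparison follow directly from the Table 4 entries. A short check is sketched below; the names `TABLE4` and `reduction_pct` are illustrative, and the values are transcribed from the table.

```python
# (average, standard deviation) pairs transcribed from Table 4.
TABLE4 = {
    "DXGN": {"leak_uA": (2.2, 1.6), "dyn_uW": (54.0, 11.0)},
    "DXGH": {"leak_uA": (1.0, 0.8), "dyn_uW": (46.0, 4.8)},
    "DXGP": {"leak_uA": (3.3, 2.2), "dyn_uW": (162.0, 16.0)},
}


def reduction_pct(metric: str, reference: str = "DXGN", proposed: str = "DXGH") -> float:
    """Percentage reduction of the average value relative to the reference gate."""
    ref_avg = TABLE4[reference][metric][0]
    new_avg = TABLE4[proposed][metric][0]
    return 100.0 * (ref_avg - new_avg) / ref_avg


def has_lowest_sd(gate: str, metric: str) -> bool:
    """True if the gate has the smallest standard deviation for the metric."""
    return TABLE4[gate][metric][1] == min(v[metric][1] for v in TABLE4.values())


if __name__ == "__main__":
    print(f"Average leakage current reduction: {reduction_pct('leak_uA'):.1f}%")
    print(f"Average dynamic power reduction:   {reduction_pct('dyn_uW'):.1f}%")
    for metric in ("leak_uA", "dyn_uW"):
        print(f"DXGH has the lowest SD for {metric}: {has_lowest_sd('DXGH', metric)}")
```

The computed reductions are slightly above 54% and 14%, which matches the quoted figures when the results are truncated to whole percentages.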

#### 5. Conclusion

Standard dynamic CMOS XOR/XNOR gates are extensively employed in modern high-performance microprocessors because of their high speed and clock-controlled evaluation, but they suffer from high power consumption and input signal skew. In this paper, a novel dynamic XOR/XNOR gate based on a hybrid network is proposed that decreases the leakage power, the dynamic power, and the layout area by up to 51%, 13%, and 24%, respectively, compared to the standard N type dynamic CMOS XOR/XNOR gate with similar delay time. Moreover, the novel dynamic CMOS XOR/XNOR gate shows superior robustness under process and temperature variations.

#### Acknowledgment

This work is supported by the Startup Foundation for Doctors of BJUT (No. X0002013201103, No. X0002014201101, and No. X0002012200802) and the National Natural Science Foundation of China (No. 60976028).

#### References

- [1] B. Stackhouse, S. Bhimji, C. Bostak, D. Bradley, B. Cherkauer, J. Desai, E. Francom, M. Gowan, P. Gronowski, D. Krueger, C. Morganti, IEEE Journal of Solid-State Circuits 44 (2009) 18–31.
- [2] H. Kaul, M.A. Anders, S.K. Mathew, S.K. Hsu, A. Agarwal, R.K. Krishnamurthy, S. Borkar, IEEE Journal of Solid-State Circuits 44 (2009) 107–114.
- [3] Y.-H. Shu, S. Tenqchen, M.-C. Sun, W.-S. Feng, IEEE Transactions on Circuits and Systems 53 (2006) 138–142.
- [4] S. Goel et al., IEEE Transactions on Circuits and Systems 53 (2006) 867–877.
- [5] J.M. Wang, S.C. Fang, W.S. Feng, IEEE Journal of Solid-State Circuits 29 (1994) 780–786.
- [6] H.T. Bui, A.K. Al-Sheraidah, Y. Wang, "New 4-transistor XOR and XNOR designs", 2nd IEEE Asia Pacific Conference on ASICs, Shilla Cheju (2000) 25–28.
- [7] A.M. Shams, T.K. Darwish, M.A. Bayoumi, IEEE Transactions on Very Large Scale Integration (VLSI) Systems 10 (2002) 20–29.
- [8] D. Radhakrishnan, IEE Circuits Devices Syst. 148 (2001).
- [9] J. Wang et al., 10th International Conference on Ultimate Integration of Silicon (2009) 225–228.
- [10] Y.-C. Yeo et al., IEEE Electron Device Letters 21 (2000) 540–542.
- [11] PTM, http://www.eas.asu.edu/~ptm.
- [12] G. Yang et al., "Low power and high performance circuit techniques for high fan-in dynamic gates", 5th International Symposium on Quality Electronic Design, San Jose (2004) 421–424.
- [13] D. Lee, D. Blaauw, D. Sylvester, IEEE Transactions on Very Large Scale Integration (VLSI) Systems 12 (2004) 155–166.
- [14] J. Wang et al., Journal of Semiconductors 30 (2009) 125010-1–125010-5.
- [15] M.T. Bohr, IEEE Transactions on Nanotechnology 1 (2002) 56–62.
- [16] N. Gong et al., Microelectronics Journal 39 (2008) 1149–1155.
- [17] V. Kursun, E.G. Friedman, IEEE Transactions on Very Large Scale Integration (VLSI) Systems 12 (2004) 485–496.
- [18] Semiconductor Industry Association, International Technology Roadmap for Semiconductors (2009 Edition) [Online]. Available: http://www.itrs.net/Links/2009ITRS/Home2009.htm.