# Power and Delay Estimation for Dynamic OR Gates with Header and Footer Transistor Based on Wavelet Neural Networks

Jinhui Wang, Wuchen Wu, Zuo Lei, Ligang Hou, Xiaohong Peng, Daming Gao VLSI & System Lab Beijing University of Technology Beijing 100022, China wangjinhui888@yahoo.com.cn

Na Gong College of Electronic and Informational Engineering Hebei University Baoding 071002, China gongna\_china@yahoo.com.cn

Abstract-A system based on wavelet neural networks is proposed for estimating the leakage power, the active power, and the delay of domino OR gates with a sleep transistor in 45 nm technology. Building on a study of how the power gating technique (PGT) affects power and delay, the proposed model estimates the nonlinear variation of the active power, the leakage power, and the delay of dynamic OR gates with different numbers of inputs, with fast convergence and high precision. The trends of the estimated curves are discussed. Finally, the footer and header sleep transistor techniques are compared.

#### I. INTRODUCTION

Dynamic CMOS circuits are widely used in modern high-performance microprocessors because of their superior speed and area compared with static CMOS circuits [1]-[3]. In particular, wide dynamic OR gates and similar structures are commonly employed in register file and cache array bit-line design, and their performance determines the performance of the whole system [4], [5]. However, as technology is aggressively scaled down, dynamic OR gates typically consume more leakage power than static CMOS gates. The 2005 International Technology Roadmap for Semiconductors (ITRS) [6] predicted that in the sub-65 nm generations, leakage may account for as much as 50 percent of total power consumption.

Therefore, an effective way to suppress leakage power is needed. The power gating technique (PGT) [7], [8] is one of the most popular techniques for this purpose. However, the additional parasitic capacitance introduced by the sleep transistor increases the active power of the OR gates, and the reduced supply swing increases their delay. Hence, before PGT is applied, the leakage power reduction and the delay and active power penalties should be estimated and traded off, which helps judge whether applying PGT to dynamic gates meets the power and delay design constraints. In an EDA design flow in particular, such an estimate reduces iterations and saves designers a great deal of time. Estimating the power and delay of dynamic gates is challenging, however, because of nonlinear effects. Neural networks offer an appealing way to estimate the nonlinear variation of power and delay without explicit models. Owing to their fast convergence, effective classification, and high accuracy, wavelet neural networks (WNN) have been widely and successfully used in control and information processing systems [9], [10]. In this paper, a novel WNN-based approach is proposed for estimating the power and delay of OR gates with a sleep transistor in 45 nm technology, and its accuracy is validated by simulation.

#### II. PERFORMANCE ESTIMATION FOR DYNAMIC POWER GATING CIRCUITS

#### A. Dynamic Power Gating Circuits

The standard dynamic OR gate is shown in Fig. 1(a). In the precharge phase, the clock is low, P1 is on, and the dynamic node is charged to Vdd through this precharge transistor. The evaluation phase begins when the clock goes high and P1 turns off. If an input combination that discharges the dynamic node is applied (for an OR gate, at least one high input), the circuit evaluates and the dynamic node is discharged to ground. Otherwise, the keeper P2 holds the dynamic node high until the next precharge phase.
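To make the two-phase operation concrete, here is a minimal logic-level sketch. It assumes, as in a domino stage, that a static inverter follows the dynamic node so that the stage output is the OR of the inputs. The function names `dynamic_node_after_phase` and `domino_or` are hypothetical, and the model ignores timing, charge sharing, and leakage.

```python
from typing import Sequence


def dynamic_node_after_phase(clock: int, inputs: Sequence[int], node: int) -> int:
    """Logic level of the dynamic node at the end of one clock phase.

    clock == 0 (precharge): P1 is on and charges the node to Vdd (logic 1).
    clock == 1 (evaluation): P1 is off; any high input opens a pull-down path
    that discharges the node, otherwise keeper P2 holds the previous level.
    """
    if clock == 0:
        return 1
    if any(inputs):
        return 0
    return node


def domino_or(inputs: Sequence[int]) -> int:
    """One precharge/evaluate cycle followed by an (assumed) output inverter."""
    node = dynamic_node_after_phase(0, inputs, node=0)     # precharge
    node = dynamic_node_after_phase(1, inputs, node=node)  # evaluate
    return 1 - node                                        # static inverter
```

For example, `domino_or([0, 0, 1, 0])` returns 1 and `domino_or([0, 0, 0, 0])` returns 0; the dynamic node by itself computes the complement (NOR).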

Power gating for leakage is realized with a current switch, either an nMOSFET footer or an nMOSFET header, as shown in Fig. 1(b) and (c). In the idle state, the sleep signal applied to the gate of the current switch turns it off, disconnecting ground or Vdd from the logic block. While the circuit is active, the sleep signal turns the switch on, connecting the logic block to the power rail.

Fig. 1(b) shows a power gating circuit with a footer. When the circuit is active (footer on, sleep = 1), the virtual ground voltage is close to ground; its exact value is determined by the footer size and the current flowing through the footer. Once the circuit is idle (footer off, sleep = 0), the virtual ground slowly rises until it reaches a steady-state potential close to Vdd. This steady-state potential and the time needed to reach it are determined by the current flowing through the logic block and the footer.

The power gating domino OR gate with a header is shown in Fig. 1(c). In active mode, the sleep signal is set high, turning on Nsleep, and the virtual Vdd is close to Vdd; its exact value is determined by the header size and the current flowing through the header. Once the sleep signal is set low, the virtual Vdd slowly falls until it reaches a steady-state potential. This potential and the time needed to reach it are determined by the current flowing through the logic block and the header [7]. Thus, in idle mode, the leakage current path of the circuit is cut off by the sleep transistor, and the leakage power is greatly suppressed.

Fig.1 Dynamic OR gate: (a) standard, (b) with footer, (c) with header
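The text states only that the steady-state potential and the settling time depend on the currents through the logic block and the sleep transistor. As a hedged illustration, the sketch below assumes a first-order (single time constant) approach to that steady state. `virtual_rail_voltage`, `v_steady`, and `tau` are hypothetical stand-ins for the current-dependent quantities, and the numbers are placeholders rather than values extracted from the 45 nm circuits.

```python
import math


def virtual_rail_voltage(t: float, v_start: float, v_steady: float, tau: float) -> float:
    """Virtual rail voltage t seconds after the sleep transistor turns off.

    Assumed first-order model: the rail moves exponentially from v_start
    toward v_steady with time constant tau (both set by leakage currents).
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    return v_steady + (v_start - v_steady) * math.exp(-t / tau)


# Footer: virtual ground rises from about 0 V toward a level near Vdd.
# Header: virtual Vdd falls from about Vdd toward a lower steady-state level.
# All numbers below are hypothetical.
VDD = 1.0
TIMES_NS = (0, 50, 500)
footer_vgnd = [virtual_rail_voltage(t * 1e-9, 0.0, 0.9 * VDD, 50e-9) for t in TIMES_NS]
header_vvdd = [virtual_rail_voltage(t * 1e-9, VDD, 0.2 * VDD, 50e-9) for t in TIMES_NS]
```

Under this assumption the rail covers about 63% of the way to its steady state after one time constant, which is why idle periods much shorter than `tau` see only part of the leakage reduction.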

However, because the sleep transistor (either a footer or a header) introduces additional parasitic capacitance, the active power and the delay of the dynamic OR gate increase, and the size of the increase is determined by the size of the sleep transistor.

In addition, the impact of PGT on the performance of a dynamic OR gate, including the leakage power reduction and the active power and delay penalties, depends on the number of transistors in the pull-down network, that is, on the number of inputs. On the one hand, most of the active and leakage power is consumed by the transistors in the pull-down network, so the number of these transistors strongly influences how effectively PGT lowers power. On the other hand, as fan-in increases, the additional pull-down transistors provide more current paths from the dynamic node to ground. These extra paths reduce the evaluation delay and compensate for the speed loss that PGT induces.

Thus, a precise estimation system that captures the relation between the number of inputs and the leakage power reduction, active power increase, and delay increase can help decide whether to apply PGT to dynamic OR gates under the given design constraints. In the next subsection, we present a model based on wavelet neural networks.

#### B. Implementing Wavelet Neural Networks

A WNN is built on wavelet analysis and has a structure similar to that of a feed-forward neural network. A three-layer WNN uses wavelet functions as its hidden-layer neurons, so the wavelet space serves as the feature space for pattern recognition. This multi-layer architecture with wavelet activations is intended to converge quickly to the minimum of the training error. The WNN uses a wavelet basis function rather than a sigmoid function, which distinguishes it from a general back-propagation neural network.

The network mapping can be expressed as:

$$f(x) = \sum_{o=1}^{h} \sum_{j=1}^{m} \omega_{o}\,\frac{1}{\sqrt{|a_{o}|}}\,\psi_{jo}\!\left(\frac{\sum_{i=1}^{n} x_{ij} - b_{o}}{a_{o}}\right) \tag{1}$$

where $\omega_{o}$ ($o = 1, 2, \ldots, h$) are the output weights applied to the hidden-layer neuron outputs and $\psi_{jo}$ are the wavelet basis functions. The network has three sets of parameters to be trained: the output weights $\omega$, the dilation factors $a$, and the translation factors $b$.

In this paper, the training set consists of labeled samples $D = \{(y_i, x_i) \mid i = 1, 2, \ldots, N\}$, and the Morlet wavelet is used as the activation function of the hidden layer:

$$\psi(x) = \cos(1.75x)\,e^{-x^{2}/2}$$
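As a minimal sketch of the forward mapping, the code below evaluates the network output using the Morlet activation $\cos(1.75x)e^{-x^2/2}$ and the hidden-unit form of (8), $h_j = |a_j|^{-1/2}\psi((\sum_i x_{ij} - b_j)/a_j)$. It assumes a single output and collapses the double index of (1) into one hidden-neuron index $j$, with every neuron receiving the sum of the inputs; the names `morlet` and `wnn_forward` are illustrative.

```python
import math
from typing import Sequence


def morlet(x: float) -> float:
    """Morlet mother wavelet used as the hidden-layer activation."""
    return math.cos(1.75 * x) * math.exp(-x * x / 2.0)


def wnn_forward(x: Sequence[float], w: Sequence[float],
                a: Sequence[float], b: Sequence[float]) -> float:
    """Single-output WNN: y = sum_j w_j * |a_j|^(-1/2) * psi((sum_i x_i - b_j) / a_j)."""
    s = sum(x)  # each hidden neuron sees the summed inputs, as in (1) and (8)
    y = 0.0
    for w_j, a_j, b_j in zip(w, a, b):
        h_j = morlet((s - b_j) / a_j) / math.sqrt(abs(a_j))  # dilated, translated wavelet
        y += w_j * h_j                                       # weighted by output weight
    return y
```

With one neuron centered on the input (`b_j` equal to the input sum), the wavelet argument is zero and the output reduces to `w_j / sqrt(|a_j|)`; for instance `wnn_forward([0.2, 0.3], [2.0], [4.0], [0.5])` gives 1.0.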

The error performance function is given by:

$$J = \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{m} \left(y_{j,i}^{d} - y_{j,i}\right)^{2} \tag{2}$$

where $N$ is the total number of training patterns, and $y_{j,i}^{d}$ and $y_{j,i}$ are the desired and actual outputs, respectively.
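A direct transcription of (2), assuming the desired and actual outputs are given as $N$ rows of output values each; `error_j` is an illustrative name.

```python
from typing import Sequence


def error_j(desired: Sequence[Sequence[float]], actual: Sequence[Sequence[float]]) -> float:
    """Mean over the N patterns of the summed squared output errors, as in (2)."""
    n = len(desired)
    total = 0.0
    for d_row, y_row in zip(desired, actual):
        total += sum((d - y) ** 2 for d, y in zip(d_row, y_row))  # inner sum over outputs
    return total / n  # average over patterns
```

For two patterns with two outputs each, `error_j([[1.0, 2.0], [0.0, 0.0]], [[1.0, 1.0], [0.0, 2.0]])` sums squared errors of 1 and 4 and averages them to 2.5.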

The WNN is trained as follows:

1) Create the initial population according to the initialization strategy: the output weights $\omega$, the dilation factors $a$, and the translation factors $b$ are initialized with values in (0, 1).

2) Evaluate the fitness function (2).

3) To minimize the fitness function (2), update the weights and the coefficients $a$ and $b$ with the following formulas:

$$\omega_{j}(k) = \omega_{j}(k-1) + \eta\left(y^{d}(k) - y(k)\right)h_{j} + \alpha\left(\omega_{j}(k-1) - \omega_{j}(k-2)\right) \tag{3}$$

$$\Delta a_{j} = a_{j}\left(y^{d}(k) - y(k)\right)\omega_{j} h_{j}\,\frac{\left(\sum_{i=1}^{n} x_{ij} - b_{j}\right)^{2}}{a_{j}^{3}} \tag{4}$$

$$a_{j}(k) = a_{j}(k-1) + \eta\,\Delta a_{j} + \alpha\left(a_{j}(k-1) - a_{j}(k-2)\right) \tag{5}$$

$$\Delta b_{j}(k) = \left(y^{d}(k) - y(k)\right)\omega_{j}\,\frac{\sum_{i=1}^{n} x_{ij} - b_{j}}{a_{j}^{2}} \tag{6}$$

$$b_{j}(k) = b_{j}(k-1) + \eta\,\Delta b_{j} + \alpha\left(b_{j}(k-1) - b_{j}(k-2)\right) \tag{7}$$

$$h_{j} = \frac{1}{\sqrt{|a_{j}|}}\,\psi_{j}\!\left(\frac{\sum_{i=1}^{n} x_{ij} - b_{j}}{a_{j}}\right), \quad j = 1, 2, \ldots, m \tag{8}$$


where $\eta = 0.01$ is the learning rate, $\alpha = 0.05$ is the momentum factor, and $j$ indexes the $m$ hidden-layer neurons.

4) Repeat steps 2) and 3) until the stopping condition is satisfied; the parameters obtained at that point define the trained network.
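The four training steps can be sketched as online gradient descent with momentum for a single-output network. This is a hedged illustration, not the authors' MATLAB code: the updates keep the structure of (3), (5), and (7) (learning rate η, momentum α, parameter values at steps k-1 and k-2), but the increments for $a$ and $b$ use the exact partial derivatives of the Morlet hidden unit $h_j = |a_j|^{-1/2}\psi((u - b_j)/a_j)$ in place of the closed forms (4) and (6). The scalar input `u` (the 1-1-1 case), the function names, and the toy data are assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple


def morlet(z: float) -> float:
    """Morlet wavelet cos(1.75z) * exp(-z^2 / 2)."""
    return math.cos(1.75 * z) * math.exp(-z * z / 2.0)


def morlet_dz(z: float) -> float:
    """Derivative of the Morlet wavelet with respect to its argument."""
    return -math.exp(-z * z / 2.0) * (1.75 * math.sin(1.75 * z) + z * math.cos(1.75 * z))


def hidden(u: float, a_j: float, b_j: float) -> Tuple[float, float, float]:
    """h_j as in (8) for a scalar input u, plus dh_j/da_j and dh_j/db_j."""
    s = 1.0 / math.sqrt(abs(a_j))
    z = (u - b_j) / a_j
    psi, dpsi = morlet(z), morlet_dz(z)
    dh_da = s * (-psi / (2.0 * a_j) - dpsi * z / a_j)  # scale-factor term + argument term
    dh_db = -s * dpsi / a_j
    return s * psi, dh_da, dh_db


def predict(u: float, w: List[float], a: List[float], b: List[float]) -> float:
    return sum(w_j * hidden(u, a_j, b_j)[0] for w_j, a_j, b_j in zip(w, a, b))


def train_wnn(samples: Sequence[Tuple[float, float]], m: int = 1, eta: float = 0.01,
              alpha: float = 0.05, target: float = 1e-4, max_iter: int = 2500,
              seed: int = 0):
    rng = random.Random(seed)
    # Step 1: output weights, dilations, and translations initialized in (0, 1).
    w = [rng.uniform(0.0, 1.0) for _ in range(m)]
    a = [rng.uniform(0.0, 1.0) for _ in range(m)]
    b = [rng.uniform(0.0, 1.0) for _ in range(m)]
    w_old, a_old, b_old = w[:], a[:], b[:]  # values at step k-2 for the momentum term

    def cost() -> float:
        """Step 2: J from (2) with a single output."""
        return sum((yd - predict(u, w, a, b)) ** 2 for u, yd in samples) / len(samples)

    history = [cost()]
    for _ in range(max_iter):  # Step 4: repeat until J <= target or max_iter is reached
        if history[-1] <= target:
            break
        for u, yd in samples:  # Step 3: one momentum update per training sample
            g = [hidden(u, a_j, b_j) for a_j, b_j in zip(a, b)]
            e = yd - sum(w_j * g_j[0] for w_j, g_j in zip(w, g))
            w_new = [w[j] + eta * e * g[j][0] + alpha * (w[j] - w_old[j]) for j in range(m)]
            a_new = [a[j] + eta * e * w[j] * g[j][1] + alpha * (a[j] - a_old[j]) for j in range(m)]
            b_new = [b[j] + eta * e * w[j] * g[j][2] + alpha * (b[j] - b_old[j]) for j in range(m)]
            w_old, a_old, b_old = w, a, b
            w, a, b = w_new, a_new, b_new
        history.append(cost())
    return w, a, b, history


# Hypothetical toy data: targets from a known one-neuron WNN on normalized inputs.
samples = [(k / 10, 0.5 * morlet((k / 10 - 0.5) / 0.5) / math.sqrt(0.5)) for k in range(1, 10)]
w, a, b, history = train_wnn(samples)
```

In this toy run the cost in `history[-1]` ends below the initial cost `history[0]`; how close it gets to the 0.0001 target within 2500 iterations depends on the data and the initialization.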


#### III. RESULTS AND ANALYSIS

As described in Section II, the input of the WNN is the number of inputs of the power gating dynamic OR gate, and the outputs are the leakage power reduction percentage and the active power and delay increase percentages relative to the standard dynamic OR gate.

Table I Testing Error (%)

| Sleep transistor | Metric | In = 2 | In = 4 | In = 8 | In = 16 | In = 32 | In = 48 | In = 64 | In = 96 |
|------------------|--------|--------|--------|--------|---------|---------|---------|---------|---------|
| Footer           | LR     | 1.68   | 1.23   | 0.33   | 0.23    | 0.16    | 0.21    | 0.20    | 0.17    |
| Footer           | AI     | 0.62   | 2.08   | 7.98   | 0.74    | 0.41    | 0.48    | 0.36    | 0.07    |
| Footer           | DI     | 1.05   | 2.59   | 0.59   | 3.29    | 4.06    | 2.37    | 0.43    | 0.28    |
| Header           | LR     | 0.50   | 0.32   | 0.22   | 0.11    | 0.14    | 0.04    | 0.86    | 0.67    |
| Header           | AI     | 7.45   | 3.40   | 0.01   | 2.17    | 0.08    | 0.09    | 1.36    | 2.68    |
| Header           | DI     | 10.0   | 2.85   | 1.03   | 0.30    | 1.40    | 1.44    | 4.00    | 1.84    |

In: the number of transistors in the pull-down network. LR: the percentage of the leakage power reduction; AI: the percentage of the active power increase; DI: the percentage of the delay increase.



Fig.2 The estimating curve and testing curve of the dynamic OR gate with footer



Fig.3 The estimating curve and testing curve of the dynamic OR gate with header

The training and testing data are collected from HSPICE simulations based on the 45 nm BSIM4 model [11]. Each gate drives an 8 fF capacitive load and is tuned to operate at a 1 GHz clock frequency. When the leakage power is simulated, all dynamic OR gates are set to the CHIH state (clock = 1, $In_1 = In_2 = \cdots = In_n = 1$), which places every gate in its lowest-leakage state [12]-[14]. To test the validity of the model, the testing data are taken from 2-, 4-, 6-, 8-, 16-, 32-, 48-, 64-, and 96-input dynamic OR gates, because these gate sizes are commonly used in practice. Based on the previous discussion, the wavelet network uses a 1-1-1 architecture (one input unit, one hidden unit, and one output unit). The software, including the leakage power, active power, and delay sampling procedures, the power and delay analysis, and the WNN training, is written in MATLAB. Training stops when the objective function (2) is less than or equal to the given precision of 0.0001; the network converges quickly, within 2500 learning iterations.

The testing errors are listed in Table I. All of the errors, for leakage power reduction, active power increase, and delay increase, are at most 10%, and only the delay estimate of the 2-input gate with a header reaches that bound. The estimation system therefore achieves high accuracy, which suggests that it can be embedded in EDA tools as a power and delay estimation module.
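The accuracy claim can be checked directly against Table I. The sketch below transcribes the table (values in %) and reports the largest and mean testing errors; the dictionary layout and the function names are illustrative.

```python
from statistics import mean

INPUTS = [2, 4, 8, 16, 32, 48, 64, 96]  # number of inputs (columns of Table I)

# Testing errors (%) from Table I, keyed by (sleep transistor, metric).
TABLE_I = {
    ("Footer", "LR"): [1.68, 1.23, 0.33, 0.23, 0.16, 0.21, 0.20, 0.17],
    ("Footer", "AI"): [0.62, 2.08, 7.98, 0.74, 0.41, 0.48, 0.36, 0.07],
    ("Footer", "DI"): [1.05, 2.59, 0.59, 3.29, 4.06, 2.37, 0.43, 0.28],
    ("Header", "LR"): [0.50, 0.32, 0.22, 0.11, 0.14, 0.04, 0.86, 0.67],
    ("Header", "AI"): [7.45, 3.40, 0.01, 2.17, 0.08, 0.09, 1.36, 2.68],
    ("Header", "DI"): [10.0, 2.85, 1.03, 0.30, 1.40, 1.44, 4.00, 1.84],
}


def worst_case(table):
    """Largest testing error with its (sleep transistor, metric, inputs) label."""
    return max(
        (err, sleep, metric, n)
        for (sleep, metric), errs in table.items()
        for n, err in zip(INPUTS, errs)
    )


def mean_error(table):
    """Mean of all 48 testing errors."""
    return mean(e for errs in table.values() for e in errs)
```

`worst_case(TABLE_I)` returns `(10.0, 'Header', 'DI', 2)`, and the mean error over all entries is about 1.55%; only three of the 48 entries exceed 5%.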

Fig. 2 shows the estimated and testing curves of the dynamic OR gate with a footer. The three testing curves generally fit the three estimated curves, with small estimation errors. The leakage power reduction ranges from 65% to 95%, which indicates that PGT reduces the leakage power of dynamic OR gates by at least 65%.

The active power increase ranges from 10% to -15% (a negative value denotes a reduction). This is because, in the power gating dynamic OR gate, the virtual Vdd and virtual ground reduce the supply swing (Vswing) from Vdd - Gnd to Vdd - Vtn - Gnd, where Vtn is the threshold voltage of Nsleep. The active power Pactive can be expressed as (9) [15].

$$P_{active} = \alpha f C_L \left(V_{dd} - V_{tn} - \mathrm{Gnd}\right)^{2} = \alpha f C_L V_{swing}^{2} \tag{9}$$

where $\alpha$ and $f$ are the activity factor and the clock frequency of the dynamic node of the gate, respectively (this $\alpha$ is unrelated to the momentum factor in (3)), $C_L$ is the capacitive load at the dynamic node, and $V_{swing}$ is the dynamic node voltage swing. With PGT, the active power consumed in charging and discharging the dynamic node depends quadratically on the dynamic node voltage swing. Hence, the reduced swing lowers the active power, which compensates for the extra active power consumed by the sleep transistor. When the gate has fewer than 8 inputs, this compensation is small. When it has more than 8 inputs, the power saved through the reduced swing exceeds the active power that the sleep transistor consumes, so the total active power is reduced.
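A worked instance of (9) with hypothetical numbers: an activity factor of 0.5, the 1 GHz clock used in the simulations, and an assumed dynamic-node capacitance of 8 fF; Vdd = 1.0 V and Vtn = 0.2 V are placeholders, not values from the 45 nm model. The ratio isolates the swing term only and ignores the sleep transistor's added capacitance, so it overstates the net saving compared with the simulated curves.

```python
def active_power(alpha: float, f: float, c_load: float, v_swing: float) -> float:
    """Eq. (9): switching power of the dynamic node, alpha * f * C_L * Vswing^2."""
    return alpha * f * c_load * v_swing ** 2


# Hypothetical operating point (placeholders, not extracted from the paper's models).
ALPHA, F, C_L = 0.5, 1e9, 8e-15
VDD, VTN = 1.0, 0.2

p_standard = active_power(ALPHA, F, C_L, VDD)     # full swing, no sleep transistor
p_gated = active_power(ALPHA, F, C_L, VDD - VTN)  # swing reduced by Vtn
swing_ratio = p_gated / p_standard                # ((Vdd - Vtn) / Vdd)^2
```

Here `p_standard` is 4 µW and `swing_ratio` is 0.64, showing the quadratic dependence: a 20% smaller swing removes 36% of the swing-related power before the sleep transistor's own capacitance is accounted for.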

The delay increase likewise ranges from 10% to -15%, and 32 inputs is the boundary between delay increase and delay reduction. This is because the delay is determined by the relative contributions of the capacitance of P1 + P2 (Fig. 1(b)) and the capacitance of the pull-down network, and applying PGT brings these two capacitances into better balance. For gates with more than 32 inputs, the capacitance of P1 + P2 matches that of the pull-down network together with Nsleep (Fig. 1(b)), which speeds up the circuit.

As can be seen from Fig. 3, the three estimated curves of the dynamic OR gate with a header also fit the three testing curves very well. The leakage power reduction ranges from 65% to 90%, which again means that PGT reduces the leakage power of dynamic OR gates by at least 65%. Because the active power saved through the reduced swing exceeds the active power consumed by the sleep transistor (for the same reason as in the footer case, see (9)), the active power increase ranges from 0 to -15%. The delay increase ranges from 20% to 80%; as (10) [15] shows, the reduced Vdd causes a significant speed loss.

$$v \propto \frac{V_{dd}^{0.3}\left(1 - \frac{V_{t}}{V_{dd}}\right)^{1.3}}{t_{ox}^{0.5}} \tag{10}$$

where $v$ is the speed of the transistor, $V_t$ is the threshold voltage, $t_{ox}$ is the gate-oxide thickness, and the other parameters have their usual meanings.
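Because (10) is a proportionality, it is most useful for relative comparisons of the same device, where the thickness term in the denominator cancels. The sketch below computes the speed ratio when the effective supply drops; the 1.0 V to 0.8 V drop and Vt = 0.3 V are hypothetical values chosen only to illustrate the trend, and `relative_speed` is an illustrative name.

```python
def relative_speed(vdd: float, vdd_ref: float, vt: float) -> float:
    """Speed at supply vdd relative to supply vdd_ref, from proportionality (10).

    The oxide-thickness factor cancels because the same device is compared.
    """
    if not (0 < vt < vdd and vt < vdd_ref):
        raise ValueError("need 0 < Vt < both supply voltages")

    def speed(v: float) -> float:
        return v ** 0.3 * (1.0 - vt / v) ** 1.3

    return speed(vdd) / speed(vdd_ref)


# Hypothetical: the header lowers the effective supply from 1.0 V to 0.8 V, Vt = 0.3 V.
ratio = relative_speed(0.8, 1.0, 0.3)
delay_factor = 1.0 / ratio  # delay scales inversely with speed
```

With these placeholder values the speed falls to about 0.81 of its original value, roughly a 24% delay increase; the guard rejects supplies at or below Vt, where (10) no longer applies.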

In summary, compared with the power gating dynamic OR gate with a header, the power gating dynamic OR gate with a footer suppresses leakage power just as effectively (by at least 65%) while incurring a much smaller delay penalty (at most 10%, versus 20% to 80% for the header), at the cost of a less favorable active power change for small fan-in. Therefore, in practice, the footer sleep transistor technique is the more attractive choice.

#### IV. CONCLUSION

In this paper, a fast and precise model based on wavelet neural networks is proposed for estimating the leakage power, the active power, and the delay, to help designers judge whether to apply PGT to dynamic OR gates under power and delay design constraints, thereby saving considerable design time. Verification against simulation results shows that the model achieves an accuracy of at least 90% (testing errors of at most 10%), so it is well suited to VLSI design. The trends of the estimated curves are also analyzed. Applying PGT to dynamic OR gates reduces the leakage power by more than 65% compared with the standard dynamic OR gate. Compared with the power gating dynamic OR gate with a header, the power gating dynamic OR gate with a footer suppresses leakage power just as effectively while incurring a much smaller delay penalty. In addition, with functional extension, this model could be used to estimate the impact of other optimization techniques on different logic gates, which can further reduce design time.

#### References

- [1] S. Rusu and G. Singer, "The First IA-64 Microprocessor," IEEE Journal of Solid-State Circuits, Vol. 35, No. 11, pp. 1539-1544, November 2000.
- [2] P. E. Gronowski et al., "High-Performance Microprocessor Design," IEEE Journal of Solid-State Circuits, Vol. 33, No. 5, pp. 676-686, May 1998.
- [3] K. J. Nowka and T. Galambos, "Circuit Design Techniques for a Gigahertz Integer Microprocessor," Proceedings of the IEEE International Conference on Computer Design: VLSI in Computers and Processors, pp. 11-16, October 1998.
- [4] B. Chatterjee, M. Sachdev, and R. Krishnamurthy, "Designing leakage tolerant, low power wide-OR dominos for sub-130 nm CMOS technologies," Microelectronics Journal, Vol. 36, pp. 801-809, June 2005.
- [5] N. Gong, B. Guo, J. Lou, and J. Wang, "Analysis and Optimization of Leakage Current Characteristics in Sub-65nm Dual Vt Footed Domino Circuits," Microelectronics Journal, Vol. 39, pp. 1149-1155, September 2008.
- [6] International Technology Roadmap for Semiconductors, 2005, http://public.itrs.net/
- [7] H.-O. Kim and Y. Shin, "Analysis and Optimization of Gate Leakage Current of Power Gating Circuits," Proceedings of the Asia and South Pacific Conference on Design Automation, pp. 565-569, June 2006.
- [8] Z. Liu and V. Kursun, "Charge Recycling MTCMOS for Low Energy Active/Sleep Mode Transitions," IEEE International Symposium on Circuits and Systems, pp. 1389-1392, May 2007.
- [9] F. Aminian, M. Aminian, and H. W. Collins Jr., "Analog Fault Diagnosis of Actual Circuits Using Neural Networks," IEEE Transactions on Instrumentation and Measurement, Vol. 51, pp. 544-550, March 2002.
- [10] A. Oonsivilai and M. E. El-Hawary, "Wavelet neural network based short term load forecasting of electric power system commercial load," Proceedings of the 1999 IEEE Canadian Conference on Electrical and Computer Engineering, pp. 1223-1228, May 1999.
- [11] Predictive Technology Model (PTM), http://www.eas.asu.edu/~ptm
- [12] J. T. Kao and A. P. Chandrakasan, "Dual-threshold voltage techniques for low power digital circuits," IEEE Journal of Solid-State Circuits, Vol. 35, pp. 1009-1018, July 2000.
- [13] J. Wang, N. Gong, L. Hou, W. Wu, and L. Dong, "Charge Self-compensation Technology Research for Low Power and High Performance Domino Circuits," Chinese Journal of Semiconductors, Vol. 29, pp. 1412-1416, July 2008.
- [14] B. Z. Guo, N. Gong, and J. H. Wang, "Designing Leakage-Tolerant and Noise-Immune Enhanced Low Power Wide OR Dominos in Sub-70nm CMOS Technologies," Chinese Journal of Semiconductors, Vol. 27, pp. 804-811, May 2006.
- [15] Y. Taur and T. H. Ning, Fundamentals of Modern VLSI Devices, Cambridge University Press, 1998.