# Variation Aware Sleep Vector Selection in Dual V<sub>t</sub> Dynamic OR Circuits for Low Leakage Register File Design

Na Gong, Member, IEEE, Jinhui Wang, Member, IEEE, and Ramalingam Sridhar, Senior Member, IEEE

Abstract—The dual threshold voltage ($V_t$) technique is widely applied in dynamic OR circuits to achieve low leakage in register file (RF) design, but its effectiveness depends strongly on the sleep vector selected for the standby mode. As technology scales into the deep nanometer era, sleep vector selection in dual $V_t$ dynamic OR (DV-OR) circuits becomes challenging because of PVT (process, supply voltage, and temperature) variations. In this paper, we analyze the relationship among PVT variations, leakage characteristics, and sleep vectors in DV-OR circuits. We further perform a comprehensive study of sleep vector selection and explore its design space in DV-OR circuits. Finally, we generalize our analysis to multiple $V_t$ dynamic OR circuits and provide sleep vector selection guidelines for achieving low leakage, robust register files in modern processors.

Index Terms—Bit line, dual $V_t$, dynamic OR circuit, leakage current, PVT variations, register files (RFs), sleep vector.

## I. INTRODUCTION

In modern processors, register files (RFs) are usually on the critical path, and their access speed is crucial to fast processor operation [1]–[10]. The bit line structure is one of the most important peripheral circuits in RFs [1]. Although different types of bit line structures have been developed [11], dynamic OR circuit based local bit lines (LBL) and global bit lines (GBL) are the most popular because of their high access speed [1]–[10]. Fig. 1 shows a typical RF read path. Each *p*-input dynamic LBL is followed by a two-way merge and a *q*-input dynamic GBL, thereby building an *m*-entry RF ($m = p \times q \times 2$).

However, as technology scales into the deep nanometer regime, the dynamic OR circuit based bit line structure suffers from large leakage current, including subthreshold ($I_{sub}$) and gate ($I_{gate}$) leakage, which accounts for a large portion of the total power

Manuscript received July 03, 2013; revised November 12, 2013; accepted December 11, 2013. Date of publication February 03, 2014; date of current version June 24, 2014. This work is supported in part by ND EPSCoR under Grant FAR0021960, the National Natural Science Foundation of China under Grant 61204040, the Beijing Municipal Natural Science Foundation under Grant 4123092, and the Ph.D. Programs Foundation of Ministry of Education of China under 20121103120018. This paper was recommended by Associate Editor M. Alioto.

J. Wang is with the Beijing University of Technology, Beijing, 100124 China (e-mail: wangjinhui@bjut.edu.cn).

R. Sridhar is with the University at Buffalo (SUNY), Buffalo, NY 14260 USA (e-mail: rsridhar@buffalo.edu).

Digital Object Identifier 10.1109/TCSI.2014.2298280

consumption of RF [5]–[9]. This is because, in sleep (standby) mode, the entire bit line structure with multiple pull-down paths produces large leakage current, and in active mode, all but the selected bit lines generate significant leakage. Therefore, to reduce RF power consumption, it is paramount to suppress the leakage current of dynamic OR circuit based bit lines.

The dual $V_t$ technique [12] has long been considered an effective method for suppressing the leakage current of dynamic OR circuits while maintaining performance. A key design concern for this technique is finding an appropriate sleep vector for the inputs and the clock signal (CLK), which greatly affects the achievable leakage reduction [12]. However, sleep vector selection is complex and involves multiple key factors. In particular, as CMOS technology scales into the deep nanometer era, increasing PVT variations pose a major challenge to it.

This paper investigates sleep vector selection in dual $V_t$ dynamic OR (DV-OR) circuits, taking into account key factors including design parameters, environmental parameters, the operating characteristics of circuits, different application cases, and manufacturing technologies. We presented the basic idea of sleep vector selection, with preliminary results, in [13]. In this paper, we extend that work and make the following additional contributions.

- The sizing of a DV-OR circuit plays an important role in its performance and power characteristics. Using a typical dynamic circuit sizing methodology, this paper discusses the impact of sizing on sleep vector selection in DV-OR circuits (Section III).
- The temperature and supply voltage dependency of leakage currents in devices is analyzed and the expression of gate leakage variation with supply voltage is derived. In addition, this paper discusses the effect of combined temperature and supply voltage variations on sleep vector selection in depth (Section V).
- The impact of technology scaling on sleep vector selection is analyzed with 32 nm and 16 nm process technologies (Section VI.D).
- This paper further verifies the benefits and effectiveness of the proposed sleep vector selection guidelines through the implementation of 2R1W 64-entry 32-bit and 64-bit RFs (Section VI.E).
- As a significant extension, this work explores the design space for sleep vector selection and discusses the relationship between different key factors and the selected sleep

1549-8328 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

N. Gong is with the North Dakota State University, ND 58102 USA (e-mail: na.gong@ndsu.edu).



Fig. 1. Typical read path design of an $m$-entry $\times$ $n$-bit RF, where $m = p \times q \times 2$.

vector in DV-OR circuits. In addition, our analysis is generalized to dynamic OR circuits with multiple  $V_t$  devices (details can be found in Section VII).

The rest of this paper is organized as follows. Section II presents relevant background and prior work. We discuss the impact of sizing in Section III. In Sections IV and V, we analyze the influence of PVT variations on sleep vector selection in DV-OR circuits. A comprehensive analysis of sleep vector selection is performed in Section VI. Section VII explores the sleep vector design space in DV-OR circuits and generalizes the analysis to multiple V<sub>t</sub> dynamic OR (MV-OR) circuits. Finally, Section VIII concludes the paper.

## II. BACKGROUND AND PRIOR WORK

### A. Dynamic OR Circuit Based Bit Lines in RFs

Table I presents the application of dynamic OR circuits to RFs in industrial designs and publications. As shown, 8-input dynamic OR circuits (p = 8) are typically adopted in the LBL to achieve high performance operation [10], [14]; for the GBL, the fan-in number q is typically no greater than 8. There are two reasons for this. First, q is determined by the number of entries m (see Fig. 1). Since m adversely influences the access time of the RF, modern processors usually adopt small RFs (m < 200) with effective management mechanisms (such as early register release [15]) to support a large number of in-flight instructions. For example, Intel's 32 nm Sandy Bridge architecture uses a 144-entry FP RF and a 160-entry integer RF to improve its out-of-order execution [16]. Second, if q is large, the GBL is usually split into multiple parts to achieve high frequency operation [4], [6], [10]. For instance, three GBLs with 2, 3, and 4 inputs are used in the 65 nm Intel Pentium® 4 processor [4]. Therefore, dynamic OR circuits with a fan-in of at most eight ($N \leq 8$) are typically used in practical RF design.
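The relation $m = p \times q \times 2$ from Fig. 1 gives a quick way to see when the GBL must be split. The sketch below is illustrative only: `gbl_fan_in` and `split_gbl` are hypothetical helpers, and the even-split policy is an assumption, not the partitioning used by any design in Table I.

```python
import math


def gbl_fan_in(m: int, p: int = 8) -> int:
    """Total GBL fan-in q implied by m = p * q * 2 (two-way merge after the LBLs)."""
    if m % (2 * p) != 0:
        raise ValueError("m must be a multiple of 2 * p for this simple model")
    return m // (2 * p)


def split_gbl(q: int, max_fan_in: int = 8) -> list[int]:
    """Split q GBL inputs into the fewest near-equal GBLs with fan-in <= max_fan_in.

    Hypothetical even-split policy for illustration; real designs choose splits for timing.
    """
    segments = math.ceil(q / max_fan_in)
    base, extra = divmod(q, segments)
    return [base + 1] * extra + [base] * (segments - extra)


if __name__ == "__main__":
    for entries in (64, 128, 144, 160):
        q = gbl_fan_in(entries)
        print(f"m={entries}: q={q}, GBL split={split_gbl(q)}")
```

For a 144-entry RF with p = 8, q = 9 exceeds eight, so the sketch splits it into GBLs with 5 and 4 inputs, which is consistent with the 4- and 5-input GBLs listed for ISSCC'11 [6] in Table I.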

### B. Dual V<sub>t</sub> Dynamic OR (DV-OR) Circuits

There have been many efforts to reduce the leakage current of dynamic circuits, such as voltage scaling [17], body biasing [18], variable $V_t$ keepers [19], and dynamic/output node discharging [21], [22]. Each technique involves a trade-off among power efficiency, performance, and implementation cost. In addition to these techniques, the dual $V_t$ technique [12] has proven highly effective in suppressing $I_{sub}$ of dynamic OR circuits. Since the critical signal transitions that determine the delay of a dynamic circuit occur along the evaluation path [12], low $V_t$ transistors are assigned to

TABLE I

Dynamic OR Circuit Based Bit Lines in State-of-the-Art RFs

| Design                    | Process     | Entries | Bits | p | q                    |
|---------------------------|-------------|---------|------|---|----------------------|
| Intel Pentium® 4 [1]      | 90 nm bulk  | 128     | 32   | 8 | 8                    |
| AMD Athlon™ [2]           | 90 nm bulk  | 88      | 7    | 8 | 5                    |
| ISSCC'05 [3]              | 90 nm bulk  | 128     | 82   | 8 | 8                    |
| ISSCC'06 [4]              | 65 nm bulk  | 144     | 32   | 8 | 2, 3, 4 <sup>1</sup> |
| ISSCC'11 [6]              | 45 nm SOI   | 144     | 78   | 8 | 4, 5 <sup>1</sup>    |
| ISSCC'10 [7]              | 45 nm HK+MG | 64      | 41   | 8 | N/A <sup>2</sup>     |
| ISSCC'10 [8]              | 32 nm HK+MG | 64      | 32   | 8 | 4                    |
| VLSI'10 [9]               | 32 nm HK+MG | 64      | 32   | 8 | 4                    |
| JSSC'12 [10]              | 32 nm HK+MG | 160     | 164  | 8 | 5 <sup>1</sup>       |

<sup>1</sup> There are multiple GBLs. <sup>2</sup> GBL adopts static logic.



Fig. 2. Standard N-input DV-OR circuit.

evaluation path to maintain performance, and high $V_t$ transistors are employed in the precharge path to achieve low leakage current, as shown in Fig. 2. However, this technique requires careful selection of the sleep vector, because the sleep vector determines the achievable leakage reduction. Once the sleep vector is determined, it can be applied easily in modern processors. This is because data held in a register usually survives for only a very short period; after that, the register is *dead*, holding invalid data until the register is committed [23]. Hence, the value stored in a *dead* register can be set to the selected input vector without any access time penalty [23]. At the same time, CLK can be held at its selected sleep state by clock gating, which is also typically present in modern processors [24]. The overhead of the clock gating circuitry is negligible because it can be combined with existing processor structures. Consequently, sleep vector selection is a critical design issue in DV-OR circuits.

### C. Sleep Vector Selection in DV-OR Circuits

As shown in Fig. 2, the sleep vector of an *N*-input DV-OR circuit consists of the states of the *N* inputs (connected to the pull-down network (PDN)) and the clock signal (CLK). Owing to the parallel PDN structure, sleep vector selection in DV-OR circuits differs from that in general circuits, where it is an NP-complete problem [25]: in a DV-OR circuit, if an input is set to 0 (or 1) to minimize the leakage current on its own pull-down path, then the other $(N-1)$ inputs should be assigned the same state. Therefore, combined with the CLK state, there are four possible sleep vectors for DV-OR circuits: high CLK with high inputs (CHIH), high CLK with low inputs (CHIL), low CLK with high inputs (CLIH), and low CLK with low inputs (CLIL).
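The counting argument above can be made concrete: an unrestricted sleep vector for CLK plus N inputs has $2^{N+1}$ assignments, but forcing all N inputs to a common value leaves only four candidates. A minimal sketch; the tuple layout (CLK first, then the inputs) and the helper names are assumptions for illustration.

```python
from itertools import product


def all_assignments(n_inputs: int) -> int:
    """Number of unrestricted sleep vectors: CLK plus N independent inputs."""
    return 2 ** (n_inputs + 1)


def uniform_sleep_vectors(n_inputs: int) -> dict[str, tuple[int, ...]]:
    """The four candidate sleep vectors (CLK, in_1, ..., in_N) with all inputs equal."""
    names = {(1, 1): "CHIH", (1, 0): "CHIL", (0, 1): "CLIH", (0, 0): "CLIL"}
    vectors = {}
    for clk, level in product((1, 0), repeat=2):
        vectors[names[(clk, level)]] = (clk,) + (level,) * n_inputs
    return vectors


if __name__ == "__main__":
    n = 8  # DV-OR8
    print(all_assignments(n), "unrestricted vectors ->", len(uniform_sleep_vectors(n)), "candidates")
    for name, vec in uniform_sleep_vectors(n).items():
        print(name, vec)
```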

TABLE II

Leakage Current of a 65 nm DV-OR8 Circuit

| Sleep vector | $I_{sub}$ @ 25°C (nA) | $I_{gate}$ @ 25°C (nA) | $I_{total}$ @ 25°C (nA) | $I_{sub}$ @ 110°C (nA) | $I_{gate}$ @ 110°C (nA) | $I_{total}$ @ 110°C (nA) |
|--------------|-------|-------|-------|-------|-------|-------|
| CHIH         | 0.970 | 137.9 | 138.9 | 8.65  | 145.7 | 154.3 |
| CHIL         | 77.32 | 73.26 | 150.6 | 280.0 | 76.00 | 356.0 |
| CLIL         | 28.30 | 60.40 | 88.70 | 104.8 | 61.60 | 166.4 |

Among these four vectors, the low CLK of the CLIH and CLIL vectors turns on the precharger, so the gate output is low (see Fig. 2). In a domino chain, the output of a dynamic gate drives the inputs of the following dynamic gates; with CLK low, these inputs are therefore low, which conflicts with the high inputs required by CLIH. Hence, the CLIH vector is not suitable for cascaded structures and cannot be applied effectively in practice [26]. Thus, previous work has focused on three potential sleep vectors: CHIH, CHIL, and CLIL.

The CHIH vector selected in [12], [27] minimizes $I_{\rm sub}$ because all of $I_{\rm sub}$ is produced by high $V_{\rm t}$ transistors. However, with the CHIH vector, the PDN and footer generate large forward $I_{\rm gate}$. Therefore, in DV-OR circuits with CHIH, $I_{\rm gate}$ dominates the total leakage current.

To suppress $I_{gate}$, the CHIL vector was adopted in [28] and [29], but it still suffers from large forward $I_{gate}$ generated by the ON footer. To minimize $I_{gate}$, our previous work [30] proposed the CLIL vector, which results in small reverse $I_{gate}$ produced by the PDN and footer. Moreover, due to the stronger stack effect (the footer is also OFF), $I_{sub}$ with CLIL is smaller than that with CHIL. Therefore, both $I_{sub}$ and $I_{gate}$ are larger with CHIL than with CLIL, so CHIL cannot minimize the total leakage current. In addition, since $I_{sub}$ with CLIL is produced by low $V_t$ transistors, it is larger than $I_{sub}$ with CHIH. Consequently, the two candidate sleep vectors are CHIH and CLIL: CHIH results in minimum $I_{sub}$, and CLIL leads to minimum $I_{gate}$. As an example, Table II lists the leakage current of a DV-OR8 circuit in 65 nm predictive bulk technology with the three vectors at the typical process corner, illustrating their different leakage behaviors.
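As a quick check of this argument, the snippet below reads the Table II values and reports which vector minimizes each leakage component. It only re-tabulates the published numbers; the dictionary layout and the `best_vector` helper are illustrative.

```python
# Leakage of the 65 nm DV-OR8 circuit in nA, copied from Table II:
# {temperature_C: {vector: (I_sub, I_gate, I_total)}}
TABLE_II = {
    25: {"CHIH": (0.970, 137.9, 138.9), "CHIL": (77.32, 73.26, 150.6), "CLIL": (28.30, 60.40, 88.70)},
    110: {"CHIH": (8.65, 145.7, 154.3), "CHIL": (280.0, 76.00, 356.0), "CLIL": (104.8, 61.60, 166.4)},
}
COMPONENTS = ("I_sub", "I_gate", "I_total")


def best_vector(temp_c: int, component: str) -> str:
    """Sleep vector with the smallest value of the given leakage component."""
    idx = COMPONENTS.index(component)
    rows = TABLE_II[temp_c]
    return min(rows, key=lambda vector: rows[vector][idx])


if __name__ == "__main__":
    for temp in TABLE_II:
        print(temp, {c: best_vector(temp, c) for c in COMPONENTS})
```

With these data, CHIH minimizes $I_{sub}$ and CLIL minimizes $I_{gate}$ at both temperatures, while the vector with the lowest $I_{total}$ changes from CLIL at 25°C to CHIH at 110°C, so neither candidate is best in every condition.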

Table III summarizes existing work in the literature. As shown, only [26] and [30] considered the impact of variations when determining the sleep vector. In [26], the authors selected CLIL because of its superior robustness to process variation. Our recent research [30] showed that the two candidate sleep vectors differ in robustness under variations: the CHIH vector is more robust to temperature and supply voltage variations, while the CLIL vector is less sensitive to process variation. However, [26] and [30] evaluated the impact of process variation assuming a uniform 10% ($3\sigma$) variation in process parameters (10% PP), which may not capture the large process variation in current technologies, as discussed in Section IV. In addition, sleep vector selection in prior work is based only on robustness to variations, so it may not suit different application cases. Moreover, the influences of sizing, standby intervals, HK + MG technology, and technology scaling were not considered.

Building on prior work, this paper considers all of the key influencing factors listed in Table III. Our analysis is based on 65 nm predictive bulk technology ($V_t$ of low

TABLE III

Summary of Literature Survey

| Factors                   | [12] | [27] | [28]         | [29]         | [26]         | [30]         | This<br>work |
|---------------------------|------|------|--------------|--------------|--------------|--------------|--------------|
| Process<br>variation      | ×    | ×    | ×            | ×            | $\checkmark$ | $\checkmark$ | $\checkmark$ |
| V<sub>dd</sub> variation | ×    | ×    | ×            | ×            | ×            | $\checkmark$ | $\checkmark$ |
| Temperature<br>variation  | ×    | ×    | ×            | ×            | ×            | $\checkmark$ | $\checkmark$ |
| I<sub>gate</sub>          | ×    | ×    | $\checkmark$ | $\checkmark$ | $\checkmark$ | $\checkmark$ | $\checkmark$ |
| Sizing                    | ×    | ×    | ×            | ×            | ×            | ×            | $\checkmark$ |
| HK+MG<br>technology       | ×    | ×    | ×            | ×            | ×            | ×            | $\checkmark$ |
| Standby<br>intervals      | ×    | ×    | ×            | ×            | ×            | ×            | $\checkmark$ |
| Applications              | ×    | ×    | ×            | ×            | ×            | ×            | $\checkmark$ |
| Technology<br>scaling     | ×    | ×    | ×            | ×            | ×            | ×            | $\checkmark$ |

$\checkmark$ Considered  $\times$ Not considered

$V_t$ transistors: $V_{tnlow} = |V_{tplow}| = 0.22$ V; $V_t$ of high $V_t$ transistors: $V_{tnhigh} = |V_{tphigh}| = 0.35$ V; $V_{DD} = 1$ V) and 45 nm HK + MG technology ($V_t$ of low $V_t$ transistors: $V_{tnlow} = 0.34$ V and $|V_{tplow}| = 0.23$ V; $V_t$ of high $V_t$ transistors: $V_{tnhigh} = 0.45$ V and $|V_{tphigh}| = 0.35$ V; $V_{DD} = 1$ V) [31]. The parasitic parameters were extracted using the Cadence Virtuoso tool and included in our HSPICE simulations. The leakage analyses were performed with HSPICE Monte Carlo simulations at different PVT conditions, which are detailed in Section VI.A.

## III. IMPACT OF SIZING ON SLEEP VECTOR SELECTION

In this section, we discuss the impact of sizing on sleep vector selection in DV-OR circuits. Due to tight delay constraints, the sizing design window of dynamic circuits is very narrow. Conventionally, all transistors in DV-OR circuits use the minimum gate length to achieve high access speed. In addition, as mentioned in Section II.B, the evaluation path is the critical path that determines the access time of a DV-OR circuit. Accordingly, four components require particular care in sizing because of performance concerns: the PDN transistors, the output inverter, the footer, and the keeper.

To determine the device sizes of the first two components, we used a typical dynamic circuit sizing methodology: the width of the PDN transistors ($W_{LN}^{PDN}$) was determined using the logical effort method (delay-optimized sizing) [33], [34], and the output static inverter is high-skewed to achieve fast evaluation. Sizing the footer and keeper requires a careful balance of access time, noise margin, and power consumption for a specific application. In general, their gate widths can be varied within the following reasonable ranges:

- 1) The width of the footer ($W_{LN}^{\text{footer}}$) affects both evaluation speed and clock load, and it usually ranges from 1 to 4 times $W_{LN}^{\text{PDN}}$.
- 2) In dynamic circuits, the keeper size is usually represented by the keeper ratio (K), given by (1). As K increases, the stronger keeper improves the noise immunity of DV-OR, but its larger contention current increases access time and power consumption. Therefore, to obtain fast evaluation with a reasonable noise



Fig. 3. Sizing dependent leakage current characteristics in DV-OR8 circuits. (a) 65 nm bulk technology; (b) 45 nm HK + MG technology.

margin, the keeper size is restricted to $0.1 < K < 1$.

$$K = \frac{\mu_p(W/L)_{\text{keeper}}}{\mu_n(W/L)_{\text{PDN}}} \tag{1}$$
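Equation (1) can also be inverted to size the keeper for a target K. A minimal sketch; the helper names are hypothetical, and the mobility ratio $\mu_n/\mu_p = 2$ and the W/L values in the example are placeholders, not the paper's device parameters.

```python
def keeper_ratio(mu_p: float, mu_n: float, w_keeper: float, l_keeper: float,
                 w_pdn: float, l_pdn: float) -> float:
    """Keeper ratio K = mu_p (W/L)_keeper / (mu_n (W/L)_PDN), as in (1)."""
    return (mu_p * w_keeper / l_keeper) / (mu_n * w_pdn / l_pdn)


def keeper_width_for(k_target: float, mu_p: float, mu_n: float,
                     w_pdn: float, l_pdn: float, l_keeper: float) -> float:
    """Keeper width that gives k_target; restricted to the 0.1 < K < 1 window."""
    if not 0.1 < k_target < 1.0:
        raise ValueError("K must satisfy 0.1 < K < 1")
    return k_target * mu_n * (w_pdn / l_pdn) * l_keeper / mu_p


if __name__ == "__main__":
    # Hypothetical values: mu_n / mu_p = 2, PDN W/L = 4, minimum length normalized to 1.
    w_k = keeper_width_for(0.5, mu_p=1.0, mu_n=2.0, w_pdn=4.0, l_pdn=1.0, l_keeper=1.0)
    print(w_k, keeper_ratio(1.0, 2.0, w_k, 1.0, 4.0, 1.0))
```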

Fig. 3 shows the leakage current of DV-OR8 circuits with different footer/keeper sizes. As shown, the leakage currents with both vectors increase with a larger keeper/footer, because both $I_{sub}$ and $I_{gate}$ are proportional to device width. Also, CHIH is more sensitive to sizing than CLIL: as the footer/keeper size varies, the leakage currents with CHIH and CLIL vary by up to 32% and 19%, respectively.

In the following analysis, considering the susceptibility of dynamic OR circuits to noise, leakage, and charge sharing, we size the keeper with K = 0.5. In addition, the footer width is set equal to the width of the PDN transistors. With this sizing, all dynamic circuits achieve 8 GHz read operation at 110°C in a 2-read, 1-write ported (2R1W) 64-entry × 32-bit RF.

## IV. IMPACT OF PROCESS VARIATION ON SLEEP VECTOR SELECTION

As CMOS technology continues to scale, process variation heavily influences the leakage characteristics of DV-OR circuits [35], [36]. Generally, process variation is categorized as systematic or random. Systematic variation, caused by optical proximity correction, phase-shift masking, and layout-induced strain, can be addressed effectively through layout optimization and resolution enhancement techniques. Random variation, however, requires process and circuit innovations and has become a major concern in nanometer technologies [35]. In particular, the HK + MG technology used to overcome scaling limitations at the 45 nm node and beyond introduces a number of new, highly random effects in process parameters [36]. Accordingly, this paper focuses on leakage current variation caused by random variation in important parameters, including random discrete doping ($N_{ch}$), gate length ($L_{eff}$), and gate oxide thickness ($t_{ox}$).

As discussed in Section II.C, the analysis of process variation in prior work [26], [30] is based on a 10%PP model. However, as reported by ITRS [32] and listed in Table IV, the $3\sigma$ values of $L_{\rm eff}$, $V_t$, and $t_{\rm ox}$ can be as large as 12%, 40%, and 5%, respectively, in 65 nm and 45 nm technologies. Since $I_{\rm sub}$ and $I_{\rm gate}$ strongly depend on these process parameters, as given in (2) [37], the 10%PP model leads to a significant underestimation of $I_{\rm sub}$ variation and a



Fig. 4. Scatter plots for leakage current of NMOS devices with minimum size using Monte Carlo simulations. (a) 65 nm bulk; (b) 45 nm  $\rm HK + MG$ .

TABLE IV

Process Variation Model

| Process variation model | $L_{eff}$ ($3\sigma$) | $t_{ox}$ ($3\sigma$) | $V_t$ ($3\sigma$) |
|-------------------------|-----------------------|----------------------|-------------------|
| ITRS [32]               | 12%                   | 5%                   | 40%               |

small overestimation of $I_{gate}$ variation, and thus a large underestimation of total leakage current variation. This can be observed in Fig. 4, which compares the leakage current variation of minimum sized NMOS devices at room temperature in the two technologies. Therefore, the influence of process variation on sleep vector selection in wide DV-OR circuits needs to be re-evaluated.

$$I_{\rm sub} \propto \frac{1}{e^{L_{\rm eff}} \cdot N_{\rm ch}} \qquad I_{\rm sub} \propto \frac{1}{e^{V_t}} \qquad I_{\rm gate} \propto \frac{1}{t_{\rm ox}^2} \tag{2}$$
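To see why a 10% ($3\sigma$) model understates subthreshold variation relative to the 40% $V_t$ figure in Table IV, the sketch below samples a Gaussian $V_t$ and evaluates the standard subthreshold dependence $\exp(-V_t/(n v_T))$. Only $V_t$ is varied ($L_{eff}$ and $t_{ox}$ are ignored); the slope factor n = 1.5, the 25°C thermal voltage, and the helper name are assumptions, while the 0.22 V nominal value is the 65 nm low $V_t$ given earlier. The returned $\sigma/\mu$ is the inverse of the $\mu/\sigma$ robustness metric used later in this section.

```python
import math
import random
import statistics

THERMAL_VOLTAGE_V = 0.0257  # kT/q near 25 C
SLOPE_FACTOR = 1.5          # assumed subthreshold slope factor n (hypothetical)


def isub_variability(vt_nominal: float, vt_three_sigma: float,
                     samples: int = 20000, seed: int = 1) -> float:
    """sigma/mu of a normalized I_sub ~ exp(-Vt / (n vT)) with Gaussian Vt.

    vt_three_sigma is the 3-sigma spread as a fraction of vt_nominal (0.10 = 10%).
    """
    rng = random.Random(seed)
    sigma_vt = vt_nominal * vt_three_sigma / 3.0
    currents = [
        math.exp(-rng.gauss(vt_nominal, sigma_vt) / (SLOPE_FACTOR * THERMAL_VOLTAGE_V))
        for _ in range(samples)
    ]
    return statistics.pstdev(currents) / statistics.fmean(currents)


if __name__ == "__main__":
    for label, spread in (("10%PP", 0.10), ("ITRS V_t", 0.40)):
        print(label, round(isub_variability(0.22, spread), 3))
```

Because $I_{sub}$ is exponential in $V_t$, quadrupling the $V_t$ spread more than quadruples the relative $I_{sub}$ spread in this simplified model, which is the underestimation mechanism described above.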

It is important to note that, in the 45 nm technology, HK + MG introduces extra $V_t$ variability because of interface roughness between the silicon and the high-K dielectric, and between the high-K dielectric and the metal gate, which generates larger $I_{sub}$ variation [36]. At the same time, HK + MG effectively decreases $I_{gate}$ variation, since $I_{gate}$ variation is directly proportional to $\exp(-\kappa)$, where $\kappa$ is the dielectric constant of the gate dielectric [38]. Thus, $I_{sub}$ variation is much larger in the 45 nm technology than in the 65 nm technology, as shown in Fig. 4.

As also observed in Fig. 4, in the presence of process variation, $I_{\rm sub}$ variation is much larger than $I_{\rm gate}$ variation for the same device (~7X for 65 nm and ~58X for 45 nm), so $I_{\rm sub}$ variation dominates total leakage variation. Accordingly, in this section, we first discuss $I_{\rm sub}$ variation with the two candidate vectors based on analytical formulas and then evaluate the robustness of the two sleep vectors to process variation.

Fig. 5(a) and (b) show the I<sub>sub</sub> paths and components in *N*-input DV-OR circuits with the two candidate sleep vectors. I<sub>sub</sub> with the two vectors can be expressed as (3) and (4) [30], where $I_{sub}^{CHIH}$ and $I_{sub}^{CLIL}$ are I<sub>sub</sub> with CHIH and CLIL, respectively; $J_{sub}^{H,NMOS}$, $J_{sub}^{H,PMOS}$, $J_{sub}^{L,NMOS}$, and $J_{sub}^{L,PMOS}$ are the I<sub>sub</sub> densities per unit width of high V<sub>t</sub> NMOS, high V<sub>t</sub> PMOS, low V<sub>t</sub> NMOS, and low V<sub>t</sub> PMOS, respectively; $W_{N1}$, $W_{N2}$, $W_{P1}$, $W_{P2}$, $W_{P3}$, $W_{P4}$, $W_{PDN}$, and $W_{footer}$ are the gate widths of devices N1, N2, P1–P4, PDN, and footer in Fig. 5, respectively; N is the fan-in number of the DV-OR circuit; $\lambda_{DIBL}$ is the drain induced barrier lowering (DIBL) factor; and S is the subthreshold swing.

Note that $\alpha_1$ through $\alpha_4$ in (3) and (4) depend only on device gate widths, which in DV-OR circuits are usually much larger than the gate length L<sub>eff</sub>; therefore, gate width variation can be neglected [39], and $\alpha_1$ through $\alpha_4$ remain constant in the presence of process variation. However, both the DIBL factor and S are process dependent [40], so $\eta$ also varies with process. Accordingly, the process-induced I<sub>sub</sub> variation ($\partial I_{sub}/\partial P$) with the two vectors can be expressed as (5). It can be seen that I<sub>sub</sub> variation with CHIH depends on I<sub>sub</sub> variation in high V<sub>t</sub> transistors, while I<sub>sub</sub> variation with CLIL depends on I<sub>sub</sub> variation in low V<sub>t</sub> devices. Since the effect of process variation on I<sub>sub</sub> is substantially larger for low V<sub>t</sub> transistors ($\Delta I_{sub} \propto \exp(-V_t)$ [41]) and I<sub>sub</sub> variation with



Fig. 5. $I_{\rm sub}$ paths in $N$-input DV-OR circuit with (a) CHIH and (b) CLIL. High $V_t$ devices are shaded.

CLIL also increases with the fan-in number N and with $\eta$, CLIL leads to larger I<sub>sub</sub> variation and, in turn, larger total leakage variation. Fig. 6 shows the total leakage current distribution of DV-OR8 circuits obtained from Monte Carlo simulations of

$$\begin{aligned} I_{\text{sub}}^{\text{CHIH}} &= \sum [W_{H,\text{NMOS}}] \cdot J_{\text{sub}}^{H,\text{NMOS}} + \sum [W_{H,\text{PMOS}}] \cdot J_{\text{sub}}^{H,\text{PMOS}} \\
&= \underbrace{(W_{N1} + W_{N2})}_{\alpha_1} \cdot J_{\text{sub}}^{H,\text{NMOS}} + \underbrace{(W_{P1} + W_{P2})}_{\alpha_2} \cdot J_{\text{sub}}^{H,\text{PMOS}} \\
&= \alpha_1 \cdot J_{\text{sub}}^{H,\text{NMOS}} + \alpha_2 \cdot J_{\text{sub}}^{H,\text{PMOS}} \end{aligned} \tag{3}$$

$$\begin{aligned} I_{\text{sub}}^{\text{CLIL}} &= \underbrace{(W_{P3} + W_{P4})}_{\alpha_3} \cdot J_{\text{sub}}^{L,\text{PMOS}} \\
&\quad + \underbrace{\left(\frac{N \cdot W_{\text{PDN}}}{W_{\text{footer}}}\right)^{\frac{\lambda_{\text{DIBL}}}{1 + 2\lambda_{\text{DIBL}}}} \cdot 10^{-\frac{\lambda_{\text{DIBL}} V_{dd}}{S} \left(\frac{1 + \lambda_{\text{DIBL}}}{1 + 2\lambda_{\text{DIBL}}}\right)}}_{\eta} \cdot \underbrace{W_{\text{footer}}}_{\alpha_4} \cdot J_{\text{sub}}^{L,\text{NMOS}} \\
&= \alpha_3 \cdot J_{\text{sub}}^{L,\text{PMOS}} + \eta \cdot \alpha_4 \cdot J_{\text{sub}}^{L,\text{NMOS}} \end{aligned} \tag{4}$$

$$\begin{cases} \frac{\partial I_{\text{sub}}^{\text{CHIH}}}{\partial P} = \alpha_1 \cdot \frac{\partial J_{\text{sub}}^{H,\text{NMOS}}}{\partial P} + \alpha_2 \cdot \frac{\partial J_{\text{sub}}^{H,\text{PMOS}}}{\partial P} \\ \frac{\partial I_{\text{sub}}^{\text{CLIL}}}{\partial P} = \alpha_3 \cdot \frac{\partial J_{\text{sub}}^{L,\text{PMOS}}}{\partial P} + \eta \cdot \alpha_4 \cdot \frac{\partial J_{\text{sub}}^{L,\text{NMOS}}}{\partial P} + \alpha_4 \cdot J_{\text{sub}}^{L,\text{NMOS}} \cdot \frac{\partial \eta}{\partial P} \end{cases} \tag{5}$$



Fig. 6. Distribution of total leakage current in DV-OR8 circuits.

1000 samples. Robustness to variations is usually measured by the ratio of mean leakage ($\mu$) to standard deviation ($\sigma$), which is the inverse of leakage variability [33]. A higher robustness ($\mu/\sigma$) indicates smaller leakage variability relative to the mean leakage. As shown in Fig. 6, the robustness ($\mu/\sigma$) of the CHIH vector is higher than that of the CLIL vector (3.4X for 65 nm and 2.4X for 45 nm technologies).
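Equations (3) and (4) can be evaluated directly once the per-width current densities are known. The sketch below codes the stack factor $\eta$ from (4) and both $I_{sub}$ expressions; the helper names are hypothetical, and the DIBL factor, subthreshold swing, widths, and current densities used in the example are placeholders, not values extracted from the 65 nm or 45 nm models. It illustrates that $\eta$, and hence the low $V_t$ NMOS term of $I_{sub}^{CLIL}$, grows with the fan-in N.

```python
def stack_factor_eta(n_inputs: int, w_pdn: float, w_footer: float,
                     lam_dibl: float, swing_v_per_dec: float, vdd: float) -> float:
    """Stack factor eta from (4): OFF PDN transistors stacked on the OFF footer."""
    denom = 1 + 2 * lam_dibl
    width_term = (n_inputs * w_pdn / w_footer) ** (lam_dibl / denom)
    dibl_term = 10 ** (-(lam_dibl * vdd / swing_v_per_dec) * (1 + lam_dibl) / denom)
    return width_term * dibl_term


def isub_chih(w_n1, w_n2, w_p1, w_p2, j_h_nmos, j_h_pmos):
    """(3): alpha1 * J_sub^{H,NMOS} + alpha2 * J_sub^{H,PMOS}."""
    return (w_n1 + w_n2) * j_h_nmos + (w_p1 + w_p2) * j_h_pmos


def isub_clil(w_p3, w_p4, w_footer, eta, j_l_pmos, j_l_nmos):
    """(4): alpha3 * J_sub^{L,PMOS} + eta * alpha4 * J_sub^{L,NMOS}, with alpha4 = W_footer."""
    return (w_p3 + w_p4) * j_l_pmos + eta * w_footer * j_l_nmos


if __name__ == "__main__":
    # Hypothetical parameters: lambda_DIBL = 0.1, S = 0.1 V/decade, Vdd = 1 V, W_footer = W_PDN.
    for n in (2, 4, 8):
        print(n, round(stack_factor_eta(n, 1.0, 1.0, 0.1, 0.1, 1.0), 4))
```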

## V. IMPACT OF SUPPLY VOLTAGE AND TEMPERATURE VARIATIONS ON SLEEP VECTOR SELECTION

In addition to process variation, the demand for low power drives supply voltage scaling, and circuit activity that differs from clock cycle to clock cycle produces time-varying currents on the power lines; together these make voltage variation a significant design challenge. Furthermore, uneven activity levels in different parts of a chip produce within-die temperature variations [42]. In this section, the impact of temperature and voltage variations on the leakage current of DV-OR circuits with the CHIH and CLIL vectors is investigated in three scenarios: 1) temperature variation only; 2) voltage variation only; and 3) combined voltage and temperature variations.

### A. Impact of Temperature Variation

First, Fig. 7(a) compares the temperature dependence of leakage current in minimum sized NMOS devices in the two technologies. As expected, $I_{sub}$ depends strongly on temperature ($I_{sub} \propto T^2 \exp\left(-\frac{qV_t}{nkT}\right)$ [42]), while $I_{gate}$ is much less sensitive to temperature. Another important observation in Fig. 7(a) is that HK + MG technology also increases the temperature dependence of $I_{sub}$, further enhancing the effectiveness of the CHIH vector in reducing total leakage variation. This is because, in traditional bulk technology, the Fermi level shift and poly depletion effects reduce the temperature-induced $V_t$ shift, whereas HK + MG technology is free of these effects. Also, a higher interface state density increases $V_t$ variation in HK + MG devices [43].

Next, Fig. 8(a) and (b) compare the temperature-induced leakage current variation of DV-OR8 circuits with CHIH and CLIL, respectively, in 65 nm and 45 nm technologies. Since $I_{sub}$ varies significantly with temperature while $I_{gate}$ is almost insensitive to it, $I_{sub}$ 



Fig. 7. Variation of leakage current in minimum sized NMOS devices with (a) temperature and (b) supply voltage.

variation dominates total leakage current variation as temperature fluctuates. As also shown in Fig. 8, the CHIH vector results in minimum  $I_{sub}$ , and hence it is more robust to temperature variation than CLIL.
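A two-point check with the Table II data (65 nm, typical corner) is consistent with this conclusion. The snippet below only re-tabulates those published values at 25°C and 110°C; it is not a reproduction of the Fig. 8 sweep, and the helper name is illustrative.

```python
# Total leakage (nA) of the 65 nm DV-OR8 circuit at (25 C, 110 C), from Table II.
I_TOTAL_NA = {"CHIH": (138.9, 154.3), "CLIL": (88.70, 166.4)}


def relative_increase(vector: str) -> float:
    """Fractional rise in total leakage from 25 C to 110 C."""
    cold, hot = I_TOTAL_NA[vector]
    return (hot - cold) / cold


if __name__ == "__main__":
    for vec in I_TOTAL_NA:
        print(f"{vec}: +{relative_increase(vec):.1%}")
```

Over this range, total leakage rises by roughly 11% with CHIH versus roughly 88% with CLIL.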

### B. Impact of Supply Voltage Variation

Fig. 7(b) shows the dependence of $I_{sub}$ and $I_{gate}$ on $V_{dd}$ in minimum sized NMOS devices. As $V_{dd}$ varies from 0.5 V to 1 V, $I_{gate}$ variation is about five times as large as $I_{sub}$ variation. This is because, although both $I_{gate}$ and $I_{sub}$ depend exponentially on $V_{dd}$ in MOS devices, the exponential dependence of $I_{gate}$ is stronger [44]. Therefore, this subsection first analyzes $I_{gate}$ variation with the two sleep vectors under supply voltage variation and then compares the robustness of the two candidate vectors.

As shown in Fig. 9(a) and (b),  $I_{gate}$  in DV-OR circuits with two sleep vectors can be expressed as [30]

$$\begin{aligned} I_{\text{gate}}^{\text{CHIH}} &= \sum \left[ W_{\text{LN}} \right] \cdot J_{\text{gate,forward}}^{L,\text{NMOS}} + \frac{1}{2} \cdot \sum \left[ W_{\text{HN}} \right] \cdot J_{\text{gate,reverse}}^{H,\text{NMOS}} \\
&= \left( N \cdot W_{\text{PDN}} + W_{\text{footer}} \right) \cdot J_{\text{gate,forward}}^{L,\text{NMOS}} + \frac{1}{2} \cdot \left( W_{N1} + W_{N2} \right) \cdot J_{\text{gate,reverse}}^{H,\text{NMOS}} \\
&\cong \left( N + 1 \right) \cdot W_{\text{PDN}} \cdot J_{\text{gate,forward}}^{L,\text{NMOS}} + W_{N1} \cdot J_{\text{gate,reverse}}^{H,\text{NMOS}} \end{aligned} \tag{6}$$



Fig. 8. Variation of leakage current in DV-OR8 circuits to temperature: (a) CHIH; (b) CLIL.

$$\begin{split} I_{\text{gate}}^{\text{CLIL}} &= \frac{N}{2} \cdot W_{\text{PDN}} \cdot J_{\text{gate,reverse}}^{L,\text{NMOS}} + \sum \left[ W_{\text{HN}} \right] \cdot J_{\text{gate,forward}}^{H,\text{NMOS}} \\ &= \frac{N}{2} \cdot W_{\text{PDN}} \cdot J_{\text{gate,reverse}}^{L,\text{NMOS}} + \left( W_{N1} + W_{N2} \right) \cdot J_{\text{gate,forward}}^{H,\text{NMOS}} \\ &\cong \frac{N}{2} \cdot W_{\text{PDN}} \cdot J_{\text{gate,reverse}}^{L,\text{NMOS}} + 2 \cdot W_{N1} \cdot J_{\text{gate,forward}}^{H,\text{NMOS}} \end{split}$$
(7)

where $I_{gate}^{CHIH}$ and $I_{gate}^{CLIL}$ are $I_{gate}$ with the CHIH and CLIL vectors, respectively. $J_{gate,forward}^{L,NMOS}$ and $J_{gate,forward}^{H,NMOS}$ are the forward $I_{gate}$ densities per unit width ($J_{gate,forward}$) of low $V_t$ and high $V_t$ NMOS devices, respectively. $J_{gate,reverse}^{L,NMOS}$ and $J_{gate,reverse}^{H,NMOS}$ are the corresponding reverse $I_{gate}$ densities per unit width ($J_{gate,reverse}$). $W_{N1}$, $W_{N2}$, $W_{PDN}$, and $W_{footer}$ are the gate widths of devices N1, N2, PDN, and footer in Fig. 9, and they can be assumed constant under PVT variations. The approximations in the last lines of (6) and (7) additionally take $W_{footer} \approx W_{PDN}$ and $W_{N2} \approx W_{N1}$.

Fig. 9.  $I_{\rm gate}$  paths in N-input DV-OR circuit with (a) CHIH and (b) CLIL. High  $V_t$  devices are shaded.

Researchers have shown that $J_{\text{gate,forward}}^{L,\text{NMOS}}$ mainly consists of gate-to-channel tunneling currents, which flow from the gate to the source/drain through the channel. In contrast, $J_{\text{gate,reverse}}^{H,\text{NMOS}}$ is mainly composed of edge-direct-tunneling (EDT) currents, which flow from the source/drain to the gate through the source-drain extension [30]. $J_{\text{gate,forward}}^{L,\text{NMOS}}$ is larger than $J_{\text{gate,reverse}}^{H,\text{NMOS}}$. In our experiments with 65 nm and 45 nm technologies, it is larger by about 56% and 51%, respectively. Because the first term of (6) also scales with $(N+1) \cdot W_{\text{PDN}}$, the second term is comparatively small, and (6) can be approximated as

$$I_{\text{gate}}^{\text{CHIH}} \cong (N+1) \cdot W_{\text{PDN}} \cdot J_{\text{gate,forward}}^{L,\text{NMOS}}$$
(8)

Also, in practical design,  $N \ge 2$  and  $W_{\rm PDN}$  is usually several times larger than to enhance the evaluation speed;  $J_{\rm gate, forward}^{H,\rm NMOS}$  is slightly larger than  $J_{\rm gate, reverse}^{L,\rm NMOS}$  (~ 10% and 3% larger in our experiment with 65 nm and 45 nm technologies, respectively). So (7) can be rewritten as

$$I_{\text{gate}}^{\text{CLIL}} \cong \frac{N+4}{2} \cdot W_{\text{PDN}} \cdot J_{\text{gate,reverse}}^{L,\text{NMOS}}$$
(9)
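To see how approximations (8) and (9) scale with fan-in, the short Python sketch below evaluates both expressions and their ratio. The width $W_{\text{PDN}}$ cancels in the ratio, which leaves $2(N+1)/(N+4)$ multiplied by $J_{\text{gate,forward}}^{L,\text{NMOS}}/J_{\text{gate,reverse}}^{L,\text{NMOS}}$. That density ratio is not reported in this section, so its default of 1 is a placeholder, not a measured value. The function names are hypothetical.

```python
def igate_chih(n: int, w_pdn: float, j_fwd_low: float) -> float:
    """Eq. (8): I_gate(CHIH) ~ (N + 1) * W_PDN * J_gate,forward of low-Vt NMOS."""
    return (n + 1) * w_pdn * j_fwd_low


def igate_clil(n: int, w_pdn: float, j_rev_low: float) -> float:
    """Eq. (9): I_gate(CLIL) ~ (N + 4) / 2 * W_PDN * J_gate,reverse of low-Vt NMOS."""
    return (n + 4) / 2 * w_pdn * j_rev_low


def igate_ratio(n: int, j_fwd_low: float = 1.0, j_rev_low: float = 1.0) -> float:
    """CHIH/CLIL gate-leakage ratio; W_PDN cancels, leaving 2(N+1)/(N+4) * J_fwd/J_rev."""
    w_pdn = 1.0  # arbitrary width; it cancels in the ratio
    return igate_chih(n, w_pdn, j_fwd_low) / igate_clil(n, w_pdn, j_rev_low)


for n in (2, 4, 8, 16, 32):
    # With equal (placeholder) current densities the ratio rises from 1.0 toward 2.
    print(f"N = {n:2d}: I_gate(CHIH)/I_gate(CLIL) = {igate_ratio(n):.3f}")
```

The structural factor grows monotonically with N. Under this first-order view, the gate-leakage penalty of CHIH relative to CLIL therefore increases with fan-in.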

Based on the above analysis, $I_{gate}$ in DV-OR circuits with either candidate sleep vector is mainly generated by the low $V_t$ devices in the PDN and footer. We call this the $I_{gate}$-generating network (GGN), as shown in Fig. 10(a). Supply voltage variation changes the voltage of the dynamic node and thereby induces $I_{gate}$ variation in the GGN. Accordingly, the supply voltage induced $I_{gate}$ variation can be expressed as

$$\begin{cases} \frac{\partial I_{\text{gate}}^{\text{CHIH}}}{\partial V} = (N+1) \cdot W_{\text{PDN}} \cdot \frac{\partial J_{\text{gate,forward}}^{L,\text{NMOS}}}{\partial V^{\text{CHIH}}_{\text{dynamic}}} \\ \frac{\partial I_{\text{gate}}^{\text{CLIL}}}{\partial V} = \frac{N+4}{2} \cdot W_{\text{PDN}} \cdot \frac{\partial J_{\text{gate,reverse}}^{L,\text{NMOS}}}{\partial V^{\text{CLIL}}_{\text{dynamic}}} \end{cases} \tag{10}$$

where $V_{dynamic}^{CHIH}$ and $V_{dynamic}^{CLIL}$ are the dynamic node voltages with the two vectors. In a DV-OR gate with CHIH, the pre-charger and keeper are both *OFF*. The dynamic node is therefore isolated from supply voltage variation, and its voltage stays almost zero, as confirmed in Fig. 10(a). In addition, our simulations verify that, owing to their similar mechanisms, $J_{\text{gate,forward}}$ and $J_{\text{gate,reverse}}$ of the same device have similar voltage dependencies. Therefore, CHIH effectively reduces $I_{\text{gate}}$ variation, as also shown in Fig. 10(b). With the CLIL vector, on the other hand, the pre-charger and keeper are both *ON*. When the supply voltage varies, $V_{\text{dynamic}}^{\text{CLIL}}$ changes by the same amount, which produces a larger $I_{\text{gate}}$ variation in the GGN.

Fig. 10. Variation of leakage current in 65 nm DV-OR8 circuit with supply voltage: (a) voltage variation of dynamic node in 65 nm DV-OR8 circuit; (b) leakage current variation of 65 nm DV-OR8 circuits with two sleep vectors.

Fig. 11. Constant total leakage current contours in DV-OR8 circuits with two sleep vectors in 65 nm bulk and 45 nm HK + MG technologies.

### C. Impact of Combined Temperature and Voltage Variations

The combined effect of temperature and supply voltage variations is discussed in this subsection. Fig. 11 shows the total leakage current contours of DV-OR8 circuits. Comparing the two technologies shows that the leakage problem becomes more serious with technology scaling. As the technology moves from 65 nm to 45 nm, the leakage current of DV-OR circuits with similar performance increases by about 6× (~900 nA/150 nA) with the CHIH sleep vector and by about 9× (~1200 nA/140 nA) with CLIL. As also shown in Fig. 11, for both technologies, the total leakage variation with the CHIH vector (~1.2× for 65 nm and ~1.94× for 45 nm) is much smaller than that with the CLIL vector (~9.74× for 65 nm and ~12.68× for 45 nm). This difference reflects the better robustness of CHIH to variations.

To this point, we have investigated the impact of PVT variations on the leakage current characteristics of DV-OR circuits and concluded that the CHIH vector has superior robustness to PVT variations. However, this does not guarantee that CHIH is the best sleep vector for suppressing leakage current under PVT variations, because sleep vector selection also depends on other important factors, such as the target application. It is therefore highly desirable to perform a comprehensive study of sleep vector selection in DV-OR circuits.

## VI. COMPREHENSIVE STUDY ON SLEEP VECTOR SELECTION IN DV-OR CIRCUITS

### A. Experiment Setup

Our study adopts the process variation model specified by ITRS [32] in Table IV. As also reported by ITRS, the supply voltage is assumed to follow an independent Gaussian distribution with a $3\sigma$ variation of 10%.

We assume a working temperature of 110°C, which is a typical hot-spot temperature in modern microprocessors [28]. The temperature variation of sleeping circuits depends on the length of the standby interval, so our analysis considers two types of sleep circuits in practice:

(1) circuits with short standby intervals (SSI), whose sleep temperature is assumed to decrease gradually from 110°C to room temperature;

(2) circuits with long standby intervals (LSI), whose sleep temperature can be assumed to stay at room temperature with only 1°C variation [50].

We run 1000 Monte Carlo simulations to achieve sufficient statistical accuracy.
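The sampling assumptions above can be sketched in Python as follows. This is a minimal illustration, not the authors' simulation flow. The nominal supply of 1.0 V, the 25°C room temperature, and the uniform shapes chosen for the SSI and LSI temperature draws are all assumptions, because the section does not specify them. The Table IV process parameters are omitted.

```python
import random

VDD_NOM = 1.0               # nominal supply in volts (hypothetical; not given in this section)
VDD_3SIGMA_FRACTION = 0.10  # 3-sigma supply variation of 10%, as stated in the text
T_HOT = 110.0               # working (hot-spot) temperature in degrees C
T_ROOM = 25.0               # room temperature in degrees C (assumption)


def sample_vdd(rng: random.Random) -> float:
    """Draw one supply voltage from an independent Gaussian with 3*sigma = 10% of nominal."""
    sigma = VDD_NOM * VDD_3SIGMA_FRACTION / 3.0
    return rng.gauss(VDD_NOM, sigma)


def sample_sleep_temperature(rng: random.Random, mode: str) -> float:
    """Draw one sleep temperature for a short (SSI) or long (LSI) standby interval."""
    if mode == "SSI":
        # Circuit cools gradually from the hot spot toward room temperature;
        # modeled here as uniform over that range (assumption).
        return rng.uniform(T_ROOM, T_HOT)
    if mode == "LSI":
        # Circuit stays at room temperature with about 1 C of variation
        # (modeled as uniform within +/-1 C, an assumption).
        return rng.uniform(T_ROOM - 1.0, T_ROOM + 1.0)
    raise ValueError(f"unknown standby mode: {mode!r}")


def monte_carlo_conditions(mode: str, runs: int = 1000, seed: int = 0):
    """Return `runs` (Vdd, T) pairs, one per Monte Carlo iteration.

    Process parameters (Table IV) would be sampled alongside these; they are
    omitted because their distributions are not reproduced here.
    """
    rng = random.Random(seed)
    return [(sample_vdd(rng), sample_sleep_temperature(rng, mode)) for _ in range(runs)]


for mode in ("LSI", "SSI"):
    temps = [t for _, t in monte_carlo_conditions(mode)]
    print(mode, round(min(temps), 1), round(max(temps), 1))
```

Fixing the seed makes the LSI and SSI runs reproducible. Each (Vdd, T) pair would then parameterize one circuit simulation.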

Our analysis starts with a study of application aware sleep vector selection for DV-OR circuits. Next, we investigate sleep vector selection based on general cost functions and then examine the impact of technology scaling. Finally, we verify the effectiveness of the selected sleep vector in a 64-entry RF.

Fig. 12. LVC of DV-OR circuits with different sleep vectors in 65 nm bulk and 45 nm HK + MG technologies: (a) DV-OR4; (b) DV-OR8; (c) DV-OR16.

### B. Application Aware Sleep Vector Selection

As discussed before, previous work on sleep vector selection for DV-OR circuits was based on one of two criteria:

1) The work in [12] and [27]–[29] ignored the impact of PVT variations and considered only leakage reduction, so the selected sleep vector is simply the minimum leakage vector.

2) The work in [26] and [30] did not consider leakage reduction, so the selected sleep vector is the one most robust to variations.

However, both leakage reduction and robustness influence RF yield [45]. It is therefore necessary to consider both factors, as well as their relative significance in different applications. For instance, when designing processors for highly reliable servers, such as IBM POWER6<sup>TM</sup>, robustness to variations is the top priority. Conversely, when implementing processors for portable biomedical devices, sensor networks, and other ultra-low power applications, leakage reduction is extremely important.

We therefore define an application aware Leakage-Variation-Cost (LVC) function and select the sleep vector that minimizes it:

$$\begin{aligned} \min_{\vec{V}} \quad & \text{LVC} = \lambda \cdot \mu(\vec{V}) + (1 - \lambda) \cdot \frac{\sigma(\vec{V})}{\mu(\vec{V})} \\ \text{s.t.} \quad & \vec{V} \in \{\text{CHIH}, \text{CLIL}\}, \quad 0 \le \lambda \le 1 \end{aligned} \tag{11}$$

where $\mu$ is the mean leakage current with sleep vector $\vec{V}$ and evaluates leakage reduction. $\sigma/\mu$ is the uncertainty of the leakage current, i.e., the reciprocal of the robustness against variations. $\lambda$ is a weighting factor that sets the relative significance of leakage reduction and robustness in a given application. The priority of leakage reduction increases with $\lambda$. In the extreme case $\lambda = 1$, leakage reduction is the only design concern; at $\lambda = 0$, only robustness matters.

Fig. 12 compares the LVC of CHIH and CLIL for different DV-OR circuits. As shown, the comparison depends on the fan-in number (N) of the DV-OR circuit. Take 65 nm circuits as an example. For DV-OR4, the LVC savings achieved by CHIH relative to CLIL range from 28.1% to 87.3% for LSI and from 39.3% to 41.7% for SSI, depending on $\lambda$ in different applications. As $\lambda$ decreases, the relative significance of robustness increases, so CHIH achieves larger LVC savings. For DV-OR8 circuits, CHIH still yields the minimum LVC in all cases, but its LVC is very close to that of CLIL. As N increases further, however, CLIL becomes more likely to be the best sleep vector. For DV-OR16 circuits, the LVC of CHIH is smaller than that of CLIL only in the extreme case $\lambda < 0.03$, where robustness is the top design priority. For the majority of applications, CLIL minimizes LVC.

The main reason for these opposite results is that, as N increases, the number of parallel paths in the GGN grows and $I_{gate}$ increases accordingly, as expressed in (6) and (7). The CLIL vector, which minimizes $I_{\rm gate}$, then achieves the minimum average leakage current and minimizes LVC in most cases. Importantly, for DV-OR circuits with $N \le 8$, the CHIH vector achieves the minimum LVC across all applications. Furthermore, as evident from Fig. 12, for a given application the LVC with SSI is always larger than that with LSI, owing to the smaller effect of temperature variation with LSI.
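The following Python sketch evaluates (11) for two candidate vectors. It uses the $\mu$ and $\sigma$ values that Table V lists for the 65 nm DV-OR4 circuit in the LSI case. Equation (11) adds $\mu$ (in nA) to the dimensionless $\sigma/\mu$, so the sketch normalizes $\mu$ by the larger of the two candidates' means. This normalization is our assumption, since the paper does not state how $\mu$ is scaled. At the endpoints $\lambda = 1$ and $\lambda = 0$, the normalization cancels, and the savings match the 28.1% and 87.3% quoted above. Intermediate values depend on the assumed scaling.

```python
# (mean, standard deviation) of leakage current in nA for the 65 nm DV-OR4 circuit
# in the LSI case, read from Table V.
STATS_OR4_65NM_LSI = {"CHIH": (89.96, 21.8), "CLIL": (125.2, 238.0)}


def lvc(mu: float, sigma: float, lam: float, mu_ref: float) -> float:
    """Eq. (11) with mu normalized by mu_ref so both terms are dimensionless (assumption)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lambda must lie in [0, 1]")
    return lam * (mu / mu_ref) + (1.0 - lam) * (sigma / mu)


def select_vector(stats: dict, lam: float):
    """Return the sleep vector with minimum LVC and the LVC of every candidate."""
    mu_ref = max(mu for mu, _ in stats.values())
    costs = {vec: lvc(mu, sd, lam, mu_ref) for vec, (mu, sd) in stats.items()}
    return min(costs, key=costs.get), costs


def lvc_saving(stats: dict, lam: float) -> float:
    """Relative LVC saving of CHIH over CLIL for one weighting factor."""
    _, costs = select_vector(stats, lam)
    return (costs["CLIL"] - costs["CHIH"]) / costs["CLIL"]


for lam in (0.0, 0.5, 1.0):
    best, _ = select_vector(STATS_OR4_65NM_LSI, lam)
    print(f"lambda = {lam}: best = {best}, saving = {lvc_saving(STATS_OR4_65NM_LSI, lam):.1%}")
```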

### C. General Cost Functions Based Sleep Vector Selection

For a more comprehensive analysis, we further discuss sleep vector selection based on three general cost functions under variations:

- $C_1 = 0.5\mu + 0.5\sigma$ represents the variation cost in the typical case [37].
- $C_2 = \mu + 6\sigma$ represents the variation cost in the worst case [46].
- $C_3 = \mu \times \sigma$ evaluates the overall cost under variations [47].

We also quantify the improvement in $C_i$ achieved by the CHIH vector relative to the CLIL vector as OptCHIH<sub>i</sub>:

$$OptCHIH_i = \frac{C_i@CLIL - C_i@CHIH}{C_i@CLIL} \quad i = 1, 2, 3 \quad (12)$$

where $C_i$@CHIH and $C_i$@CLIL denote the $i$th cost criterion with CHIH and CLIL, respectively. If OptCHIH<sub>i</sub> $\geq$ 0, CHIH is the best sleep vector and achieves the lowest cost under the $i$th criterion. Otherwise, CLIL is the best sleep vector.

TABLE V STATISTICAL RESULT OF LEAKAGE CURRENT IN DV-OR CIRCUITS AND GENERAL COST FUNCTIONS BASED SLEEP VECTOR SELECTION

| Metric | Vector | Standby | 65 nm OR2 | 65 nm OR4 | 65 nm OR8 | 65 nm OR16 | 65 nm OR32 | 45 nm OR2 | 45 nm OR4 | 45 nm OR8 | 45 nm OR16 | 45 nm OR32 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $\mu$ (nA) | CHIH | LSI | 60.14 | 89.96 | 148.2 | 264.6 | 504.5 | 603.7 | 715.1 | 933 | 1351 | 2210 |
| | CHIH | SSI | 69.24 | 100.2 | 163.4 | 282.2 | 528.1 | 726.6 | 833.8 | 1070 | 1529 | 2427 |
| | CLIL | LSI | 103.2 | 125.2 | 147.2 | 171.7 | 257.9 | 871.3 | 901.3 | 978 | 1165 | 1436 |
| | CLIL | SSI | 166.7 | 166.7 | 188.3 | 247.3 | 315 | 1417 | 1369 | 1437 | 1526 | 1850 |
| $\sigma$ (nA) | CHIH | LSI | 15.9 | 21.8 | 36.70 | 64.2 | 128 | 409 | 317 | 520 | 287 | 597 |
| | CHIH | SSI | 25.8 | 47.3 | 60.20 | 108 | 139 | 358 | 363 | 469 | 765 | 757 |
| | CLIL | LSI | 108 | 238 | 203 | 114 | 124 | 674 | 692 | 754 | 975 | 1060 |
| | CLIL | SSI | 225 | 135 | 171 | 235 | 194 | 1566 | 1293 | 1477 | 1248 | 1436 |
| $C_1$ (nA) | CHIH | LSI | 38 | 56 | 92 | 164 | 316 | 506 | 516 | 727 | 819 | 1404 |
| | CHIH | SSI | 48 | 74 | 112 | 195 | 334 | 542 | 598 | 770 | 1147 | 1592 |
| | CLIL | LSI | 106 | 182 | 175 | 143 | 191 | 773 | 797 | 866 | 1070 | 1248 |
| | CLIL | SSI | 196 | 151 | 180 | 241 | 255 | 1492 | 1331 | 1457 | 1387 | 1643 |
| | OptCHIH<sub>1</sub> | LSI | 0.64 | 0.69 | 0.47 | -0.15 | -0.66 | 0.34 | 0.35 | 0.16 | 0.23 | -0.12 |
| | OptCHIH<sub>1</sub> | SSI | 0.76 | 0.51 | 0.38 | 0.19 | -0.31 | 0.64 | 0.55 | 0.47 | 0.17 | 0.03 |
| $C_2$ <sup>1</sup> | CHIH | LSI | 0.16 | 0.22 | 0.37 | 0.65 | 1.27 | 0.30 | 0.26 | 0.41 | 0.31 | 0.58 |
| | CHIH | SSI | 0.22 | 0.38 | 0.53 | 0.93 | 1.36 | 0.29 | 0.30 | 0.39 | 0.61 | 0.70 |
| | CLIL | LSI | 0.75 | 1.55 | 1.37 | 0.86 | 1.00 | 0.49 | 0.50 | 0.55 | 0.70 | 0.78 |
| | CLIL | SSI | 1.52 | 0.98 | 1.21 | 1.66 | 1.48 | 1.08 | 0.91 | 1.03 | 0.90 | 1.05 |
| | OptCHIH<sub>2</sub> | LSI | 0.79 | 0.86 | 0.73 | 0.24 | -0.27 | 0.38 | 0.48 | 0.26 | 0.56 | 0.26 |
| | OptCHIH<sub>2</sub> | SSI | 0.85 | 0.61 | 0.57 | 0.44 | 0.08 | 0.73 | 0.67 | 0.62 | 0.32 | 0.33 |
| $C_3$ <sup>2</sup> | CHIH | LSI | 0.09 | 0.20 | 0.54 | 1.70 | 6.46 | 0.25 | 0.23 | 0.49 | 0.38 | 1.32 |
| | CHIH | SSI | 0.18 | 0.47 | 0.98 | 3.05 | 7.34 | 0.26 | 0.30 | 0.50 | 1.17 | 1.84 |
| | CLIL | LSI | 1.11 | 2.98 | 2.99 | 1.96 | 3.20 | 0.59 | 0.62 | 0.74 | 1.14 | 1.52 |
| | CLIL | SSI | 3.75 | 2.25 | 3.22 | 5.81 | 6.11 | 2.22 | 1.77 | 2.12 | 1.90 | 2.66 |
| | OptCHIH<sub>3</sub> | LSI | 0.91 | 0.93 | 0.82 | 0.13 | -1.02 | 0.58 | 0.64 | 0.34 | 0.66 | 0.13 |
| | OptCHIH<sub>3</sub> | SSI | 0.95 | 0.79 | 0.69 | 0.48 | -0.20 | 0.88 | 0.83 | 0.76 | 0.39 | 0.31 |

The 65 nm columns refer to bulk technology and the 45 nm columns to HK + MG technology. <sup>1</sup> The unit is $10^3$ nA for 65 nm and $10^4$ nA for 45 nm technology. <sup>2</sup> The unit is $10^4$ nA<sup>2</sup> for 65 nm and $10^6$ nA<sup>2</sup> for 45 nm technology.

Table V compares the three general cost functions for different circuits. For each $C_i$, the rows labeled OptCHIH<sub>i</sub> report the parameter defined in (12). Once again, the CHIH vector is the best choice for DV-OR circuits with $N \le 8$ in all cases. Compared with CLIL, it yields 38%–93% and 16%–88% cost reductions in 65 nm and 45 nm technologies, respectively.
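The cost functions and (12) are simple enough to check directly against Table V. The Python sketch below computes $C_1$, $C_2$, $C_3$, and OptCHIH<sub>i</sub> from a ($\mu$, $\sigma$) pair per vector. Applied to the 65 nm OR2 LSI entries, it gives 0.64, 0.79, and 0.91, matching the table to two decimals. The function names are illustrative only.

```python
def cost_functions(mu: float, sigma: float) -> dict:
    """General cost functions under variations: C1 (typical), C2 (worst case), C3 (overall)."""
    return {"C1": 0.5 * mu + 0.5 * sigma, "C2": mu + 6.0 * sigma, "C3": mu * sigma}


def opt_chih(chih: tuple, clil: tuple) -> dict:
    """Eq. (12): OptCHIH_i = (C_i@CLIL - C_i@CHIH) / C_i@CLIL for i = 1, 2, 3."""
    c_h, c_l = cost_functions(*chih), cost_functions(*clil)
    return {k: (c_l[k] - c_h[k]) / c_l[k] for k in c_h}


def best_vector(chih: tuple, clil: tuple) -> dict:
    """CHIH is the better sleep vector for criterion i whenever OptCHIH_i >= 0."""
    return {k: "CHIH" if v >= 0 else "CLIL" for k, v in opt_chih(chih, clil).items()}


# 65 nm OR2, LSI entries of Table V: (mu, sigma) in nA for CHIH and CLIL.
print({k: round(v, 2) for k, v in opt_chih((60.14, 15.9), (103.2, 108.0)).items()})
# 65 nm OR32, LSI: every criterion favors CLIL, as the negative OptCHIH entries indicate.
print(best_vector((504.5, 128.0), (257.9, 124.0)))
```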

As also observed in Table V, as N increases, CHIH is more likely to remain the best sleep vector in 45 nm HK + MG technology. For example, based on $C_2$ and $C_3$, CHIH still achieves the lowest cost for the 45 nm OR32 circuit. For the 65 nm OR32 circuit in the LSI case, by contrast, CLIL has become the minimum-cost vector. This is due to the larger contribution of I<sub>sub</sub> to the total leakage current in HK + MG technology, as discussed in Section IV.

The above simulation results and discussions suggest that the CHIH vector provides dual benefits of leakage reduction and robustness for sub-eight DV-OR circuits, which is the typical application in practical RF design.

### D. Impact of Technology Scaling on Sleep Vector Selection

To investigate the effectiveness of CHIH in advanced technologies beyond the 45 nm node, we also evaluate $OptCHIH_i$ (i = 1, 2, 3) of DV-OR gates based on 32 nm and 16 nm predictive HK + MG technologies [31]. The results are shown in Fig. 13. CHIH consistently achieves a cost reduction across all technologies. In particular, $OptCHIH_1$, $OptCHIH_2$, and $OptCHIH_3$ for the 16 nm DV-OR8 circuit are 0.958, 0.951, and 0.999, respectively. By (12), the three general cost functions with CHIH are therefore only about 4.2%, 4.9%, and 0.1% of those with CLIL. This is because CMOS technologies at the 22 nm node and beyond are extremely sensitive to variations [32]. As a consequence, the robustness benefit of CHIH becomes more pronounced with technology scaling.

Fig. 13.  $OptCHIH_1$ ,  $OptCHIH_2$ ,  $OptCHIH_3$  in 32 nm and 16 nm predictive HK + MG technologies.

### E. Leakage Reduction Under Different RF Configurations

To further demonstrate the effectiveness of CHIH, we implement 2R1W 64-entry RFs with 32-bit and 64-bit words and carry out simulations using HSPICE and CACTI5 [48]. From these simulations, we obtain the improvement in the three general cost functions with CHIH compared with random vectors and CLIL.

We first design 65 nm and 45 nm DV-OR based bit lines with p = 8 and q = 4 for the 2R1W 64-entry RF (see Fig. 1), so that $m = p \times q \times 2 = 64$.



Fig. 14. Layout of 45 nm 2R1W 64-entry  $\times$  32 bits RF (87.5  $\times$  33.9  $\mu$m<sup>2</sup>).



Fig. 15. Bit line leakage power ratio in RF with different configurations.

Fig. 14 shows the layout of the 45 nm RF with 64 entries and 32 bits, based on conservative MOSIS deep submicron design rules [20]. To achieve a fast read path, the LBL is placed close to the bit-cells to reduce the bit line capacitance ($C_{BL}$). For the 65 nm and 45 nm 2R1W 64-entry × 32 bits RFs, $C_{BL}$ is estimated to be 0.79 *f*F and 0.52 *f*F, respectively.

Then, we run HSPICE Monte Carlo simulations to obtain statistical results under PVT variations. Because Monte Carlo simulation has a long run time, we use 50 random vectors, and each primary input toggles between successive vectors with a probability of 50%. The average statistical result is obtained from Monte Carlo simulation with 1000 iterations.
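The random-vector setup can be sketched in Python as below. This is an illustrative sketch, not the authors' testbench. The width of nine bits (eight data inputs plus the clock of a DV-OR8 bit line) is a hypothetical choice. Each bit of the next vector is the previous bit XORed with an independent Bernoulli draw, so every primary input toggles with probability `p_toggle`.

```python
import random
from typing import List, Tuple


def random_vectors(width: int, count: int = 50, p_toggle: float = 0.5,
                   seed: int = 0) -> List[Tuple[int, ...]]:
    """Generate `count` input vectors in which each primary input toggles between
    successive vectors with probability `p_toggle`."""
    if not 0.0 <= p_toggle <= 1.0:
        raise ValueError("p_toggle must lie in [0, 1]")
    rng = random.Random(seed)
    current = [rng.randint(0, 1) for _ in range(width)]
    vectors = [tuple(current)]
    while len(vectors) < count:
        # XOR with a Bernoulli(p_toggle) draw flips each bit independently.
        current = [bit ^ int(rng.random() < p_toggle) for bit in current]
        vectors.append(tuple(current))
    return vectors


# Hypothetical width: 8 data inputs plus the clock of a DV-OR8 bit line.
vectors = random_vectors(width=9)
print(len(vectors), vectors[:3])
```

With `p_toggle = 0.5`, successive vectors are statistically independent, which keeps the 50-vector sample small but unbiased.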

Next, using a modified version of CACTI5, we model four different 2R1W 64-entry RF configurations and estimate the leakage power ratio of the bit lines, as shown in Fig. 15. The bit lines consume more than 60% of the total leakage power.

Finally, by combining the statistical results with the obtained leakage power ratios, we estimate the improvement in the three cost functions for the RF with CHIH over random vectors, as shown in Fig. 16(a). The CHIH vector improves the cost functions considered by 12.7%–48.9% in 65 nm RFs and by 13.4%–41.4% in 45 nm RFs. In addition, Fig. 16(b) compares the cost improvement of the CHIH vector over CLIL. CHIH outperforms CLIL by 11.4%–55.5% in the cost functions considered, depending on the manufacturing technology and the RF configuration.

## VII. DISCUSSIONS

### A. Design Space Exploration of Sleep Vector Selection in DV-OR Circuits

The above analysis primarily targeted sleep vector selection guidelines for DV-OR circuits in practical RF applications $(N \le 8)$. In this subsection, we discuss sleep vector selection in DV-OR circuits from a general perspective.

Fig. 16. Cost improvement of RF with the CHIH vector as compared to (a) random vectors; (b) the CLIL vector.

For an N-input DV-OR circuit, there are $2^{N+1}$ candidate vectors, one per combination of the clock and the N inputs. The parallel PDN structure reduces this complex design space to four candidates: CHIH, CHIL, CLIH, and CLIL. Since CHIL always results in a large leakage current and CLIH cannot be applied in practice, the design space further shrinks to two candidates, CHIH and CLIL. Within this space, multiple key factors play important roles in the decision, as shown in Fig. 17. These factors include design parameters, environmental parameters, working characteristics of circuits, application cases, and manufacturing technologies. Based on their impact on sleep vector selection, we categorize these factors into the following three types:

- Type-I factors that favor the use of CHIH;
- Type-II factors that move our decision to CLIL;
- Type-III factors that have different impacts in various conditions.

1) Type-I factors include PVT variations, the robustness requirement of applications, HK + MG technology, and technology scaling. There are two reasons. First, the CHIH vector offers superior robustness to PVT variations, and technology scaling results in larger variations. Second, HK + MG technology makes $I_{sub}$ account for a larger proportion of the total leakage current, so the CHIH vector, which results in the minimum $I_{sub}$, becomes more attractive.

2) The CLIL vector results in the minimum $I_{gate}$, which grows very fast with fan-in number N. The CLIL vector is also less sensitive to sizing, as discussed in Section III. Accordingly, device sizing and N are Type-II factors.

3) The leakage reduction requirement of applications is a Type-III factor because its influence depends on the fan-in number of a DV-OR circuit. When N is small ($N \le 8$), I<sub>sub</sub> dominates the total leakage current, and the CHIH vector achieves the best leakage reduction. This advantage, however, diminishes as N grows, so an increasing requirement for leakage reduction moves the decision toward the CLIL vector.

Fig. 17. Design space of sleep vector selection in DV-OR circuit.

### B. Extension to Multiple V<sub>t</sub> Dynamic OR Circuits (MV-OR)

Finally, let us consider the general case of dynamic OR circuits with multiple $V_t$ devices (MV-OR), such as ultra-low $V_t$, low $V_t$, standard $V_t$, high $V_t$, and ultra-high $V_t$ devices. Different $V_t$ settings give the designer more opportunities to achieve low power, at a higher manufacturing cost. As in DV-OR circuits, the N inputs should be assigned the same state during sleep vector selection because of the parallel PDN structure. This also ensures uniform access time to different rows of RF bit-cells. Accordingly, the candidate sleep vectors are

$$\vec{V} \in \{\text{CHIH}, \text{CHIL}, \text{CLIH}, \text{CLIL}\}$$
 (13)
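The reduction from $2^{N+1}$ candidate vectors to the four in (13) can be illustrated with a short Python enumeration. The sketch reads the acronyms as clock (C) and inputs (I), each high (H) or low (L). It keeps only the vectors whose N inputs share one value. The helper names are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

# Acronym reading (assumption): C = clock, I = inputs, H/L = high/low.
NAMES = {(1, 1): "CHIH", (1, 0): "CHIL", (0, 1): "CLIH", (0, 0): "CLIL"}


def all_sleep_vectors(n: int) -> List[Tuple[int, ...]]:
    """Every (clock, in_1, ..., in_N) assignment: 2**(N + 1) candidates."""
    return list(product((0, 1), repeat=n + 1))


def uniform_input_vectors(n: int) -> Dict[str, Tuple[int, ...]]:
    """Keep only vectors whose N inputs share one value, as required by the parallel PDN."""
    kept = {}
    for vec in all_sleep_vectors(n):
        clk, inputs = vec[0], vec[1:]
        if len(set(inputs)) == 1:
            kept[NAMES[(clk, inputs[0])]] = vec
    return kept


n = 8
print(len(all_sleep_vectors(n)), "->", sorted(uniform_input_vectors(n)))
```

For N = 8, 512 raw candidates collapse to the four named vectors.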

The sleep vector selection process is as follows. First, as discussed before, for a specific manufacturing technology, the leakage current of an N-input MV-OR circuit with sleep vector $\vec{V}$ can be expressed as

$$I_{\text{leak}}^{\vec{V}} = I_{\text{sub}}^{\vec{V}} + I_{\text{gate}}^{\vec{V}}$$
(14)

We expand (14) as a Taylor series and retain only the first-order terms. We consider the impact of the fan-in number (N), sizing (Z), and PVT variations (P, V, T):

$$\begin{aligned} I_{\text{leak}}^{\vec{V}}(\Delta N, \Delta Z, \Delta \text{PVT}) &= I_{\text{leak}0}^{\vec{V}}(N_0, Z_0, P_0, V_0, T_0) \\ &+ (N - N_0) \left. \frac{\partial I_{\text{leak}}^{\vec{V}}}{\partial N} \right|_{N_0} + (Z - Z_0) \left. \frac{\partial I_{\text{leak}}^{\vec{V}}}{\partial Z} \right|_{Z_0} \\ &+ (P - P_0) \left. \frac{\partial I_{\text{leak}}^{\vec{V}}}{\partial P} \right|_{P_0} + (V - V_0) \left. \frac{\partial I_{\text{leak}}^{\vec{V}}}{\partial V} \right|_{V_0} \\ &+ (T - T_0) \left. \frac{\partial I_{\text{leak}}^{\vec{V}}}{\partial T} \right|_{T_0} \end{aligned}$$
(15)

Rather than deriving closed-form expressions, we focus on the relationship between each factor and the leakage current, and we therefore apply sensitivity analysis (SA) [49]. For a small variation of a factor $x$, $I_{\text{leak}}^{\vec{V}}$ deviates from its nominal value $I_{\text{leak}0}^{\vec{V}}$. The sensitivity factor $S_{I,x,\vec{V}}$ of $I_{\text{leak}}^{\vec{V}}$ to $x$ can then be expressed as follows:

$$\frac{\partial I_{\text{leak}}^{\vec{V}}}{\partial x}|_{x_0} = S_{I,x,\vec{V}} \cong \frac{\Delta I_{\text{leak}}^{\vec{V}}}{\Delta x}$$
$$\Rightarrow I_{leak}^{\vec{V}} = I_{\text{leak}0}^{\vec{V}} + S_{I,x,\vec{V}} \cdot \Delta x \tag{16}$$

Accordingly, an approximate linear relation between the leakage current and the different factors is obtained:

$$\begin{split} I_{\text{leak}}^{\vec{V}} &= I_{\text{leak}0}^{\vec{V}} + S_{I,N,\vec{V}} \cdot \Delta N + S_{I,Z,\vec{V}} \cdot \Delta Z \\ &\quad + S_{I,P,\vec{V}} \cdot \Delta P + S_{I,V,\vec{V}} \cdot \Delta V + S_{I,T,\vec{V}} \cdot \Delta T \end{split} \tag{17}$$
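As a minimal sketch of the sensitivity analysis in (16) and (17), the Python code below estimates a sensitivity factor by finite differences and plugs it into the first-order model. Equation (16) only requires $\Delta I/\Delta x$. The central difference used here is our choice for better accuracy. The exponential leakage-versus-temperature model is a hypothetical stand-in for a simulator, not data from this paper.

```python
import math
from typing import Callable, Dict


def sensitivity(model: Callable[[float], float], x0: float, dx: float) -> float:
    """Estimate S = dI/dx at x0 with a central finite difference, as in (16)."""
    return (model(x0 + dx) - model(x0 - dx)) / (2.0 * dx)


def linear_leakage(i0: float, sens: Dict[str, float], deltas: Dict[str, float]) -> float:
    """First-order model of (17): I = I0 + sum over factors of S_x * delta_x."""
    return i0 + sum(s * deltas.get(name, 0.0) for name, s in sens.items())


def toy_leakage_vs_temp(temp_c: float) -> float:
    """Hypothetical leakage model (nA) growing exponentially with temperature."""
    return 100.0 * math.exp(0.02 * (temp_c - 25.0))


s_t = sensitivity(toy_leakage_vs_temp, x0=25.0, dx=0.5)
approx = linear_leakage(toy_leakage_vs_temp(25.0), {"T": s_t}, {"T": 5.0})
exact = toy_leakage_vs_temp(30.0)
print(f"S_T = {s_t:.3f} nA/C, linear = {approx:.2f} nA, exact = {exact:.2f} nA")
```

The gap between the linear and exact values shows why (17) is only valid for small variations around the nominal point.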

Finally, the requirements of the application are captured by the selected cost function $f$, and the sleep vector is determined by (18):

$$\begin{aligned} \min_{\vec{V}} \quad & f(\vec{V}) \\ \text{s.t.} \quad & \vec{V} \in \{\text{CHIH}, \text{CHIL}, \text{CLIH}, \text{CLIL}\} \end{aligned} \tag{18}$$
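The Python sketch below ties (17) and (18) together. It draws Gaussian factor deltas, propagates them through each candidate's first-order model, computes $\mu$ and $\sigma$, and selects the vector with minimum cost. The model coefficients and delta spreads are placeholders chosen for illustration, not results from this paper. Only two candidates are listed, but CHIL and CLIH entries could be added in the same form. The same delta samples are reused for every vector so that sampling noise does not bias the comparison.

```python
import random
import statistics
from typing import Callable, Dict, Tuple

# Hypothetical first-order models, one per candidate vector: (I_leak0 in nA, sensitivities).
# The numbers are placeholders for illustration, not results from this paper.
MODELS: Dict[str, Tuple[float, Dict[str, float]]] = {
    "CHIH": (90.0, {"V": 20.0, "T": 0.5}),
    "CLIL": (125.0, {"V": 300.0, "T": 3.0}),
}
SIGMAS = {"V": 0.033, "T": 1.0}  # standard deviations of the factor deltas (assumption)


def select_sleep_vector(models, cost: Callable[[float, float], float],
                        sigmas: Dict[str, float], runs: int = 1000, seed: int = 0):
    """Propagate Gaussian deltas through each first-order model (17) and return the
    vector that minimizes cost(mu, sigma), as in (18), plus every candidate's cost."""
    rng = random.Random(seed)
    # Common random numbers: the same deltas are reused for every candidate vector.
    deltas = [{k: rng.gauss(0.0, s) for k, s in sigmas.items()} for _ in range(runs)]
    scores = {}
    for vec, (i0, sens) in models.items():
        leak = [i0 + sum(sens[k] * d[k] for k in sens) for d in deltas]
        scores[vec] = cost(statistics.mean(leak), statistics.pstdev(leak))
    return min(scores, key=scores.get), scores


def worst_case(mu: float, sigma: float) -> float:
    """C2 = mu + 6 sigma."""
    return mu + 6.0 * sigma


best, scores = select_sleep_vector(MODELS, worst_case, SIGMAS)
print(best, {k: round(v, 1) for k, v in scores.items()})
```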

In our previous analysis of DV-OR circuits, we used $f \in \{LVC, C_1, C_2, C_3\}$ to cover a wide range of applications. This analysis showed that the CHIH vector is highly beneficial in terms of both leakage reduction and robustness for typical DV-OR applications in practical RFs.

Although only RFs are considered in this work, the analysis is general enough to apply to sleep vector selection for other on-chip memories with similar wide-OR bit line structures, such as the L1 SRAM in the AMD "Bulldozer" [10].

## VIII. CONCLUSION

This paper explores sleep vector selection in DV-OR circuits while considering design parameters, environmental parameters, working characteristics of circuits, application cases, and manufacturing technologies. It shows that the CHIH vector offers significant advantages over other vectors for practical RF applications. For 2R1W 64-entry RFs, CHIH yields up to 48.9% and 55.5% reductions in the cost functions considered, as compared to random vectors and CLIL, respectively. Our detailed analysis of the dependence of leakage characteristics on key parameters can help circuit designers develop novel low leakage RF designs. More importantly, the analysis in this paper may be extended to the bit line design of other on-chip memories in modern processors. Building on these design guidelines, circuit-architecture co-design for low leakage RFs is the subject of our ongoing research.

## REFERENCES

- [1] G. Hinton *et al.*, "The microarchitecture of the pentium 4 processor," *Intel Technol. J.*, vol. 5, no. 1, pp. 1–12, May 2001.
- [2] AMD Inc., AMD Athlon Processor Technical Brief 1999 [Online]. Available: http://support.amd.com/us/Processor\_Tech-Docs/22054.pdf
- [3] E. S. Fetzer, L. Wang, and J. Jones, "The multi-threaded, parity-protected 128-word register files on a dual-core itanium®-Family processor," in *Proc. ISSCC*, Feb. 2005, pp. 382–384.

- [4] S. Wijeratne et al., "A 9 GHz 65 nm intel pentium® 4 processor integer execution core," in Proc. ISSCC, Feb. 2006, pp. 110–112.
- [5] S. Hsu *et al.*, "An 8.8 GHz 198 mW 16 × 64b 1R/1W variation-tolerant register file in 65 nm CMOS," in *Proc. ISSCC*, Feb. 2006, pp. 1785–1797.
- [6] G. S. Ditlow *et al.*, "A 4R2W register file for a 2.3 GHz wire-speed power<sup>™</sup> processor with double-pumped write operation," in *Proc. ISSCC*, Feb. 2011, pp. 256–258.
- [7] G. Burda *et al.*, "A 45 nm CMOS 13-port 64-Word 41b fully associative content-addressable register file," in *Proc. ISSCC*, Feb. 2010, pp. 286–287.
- [8] A. Agarwal *et al.*, "A 320 mV-to-1.2 V on-die fine-grained reconfigurable fabric for DSP/media accelerators in 32 nm CMOS," in *Proc. ISSCC*, Feb. 2010, pp. 328–330.
- [9] A. Agarwal et al., "A 32 nm 8.3 GHz 64-entry × 32b variation tolerant near-threshold voltage register file," in Proc. 2010 Symposium on VLSI Circuits, June 2010, pp. 105–106.
- [10] H. McIntyre *et al.*, "Design of the two-core x86-64 AMD "Bulldozer" module in 32 nm SOI CMOS," *IEEE J. Solid-State Circuits*, vol. 47, no. 1, pp. 1–13, Jan. 2012.
- [11] Y. Pu, J. Pineda de Gyvez, H. Corporaal, and Y. Ha, "An ultra low-energy/frame multi-standard JPEG co-processor in 65 nm CMOS with sub/near threshold power supply," *IEEE J. Solid-State Circuits*, vol. 45, no. 3, pp. 668–680, Mar. 2010.
- [12] J. T. Kao and A. P. Chandrakasan, "Dual-threshold voltage techniques for low power digital circuits," *IEEE J. Solid-State Circuits*, vol. 35, no. 7, pp. 1009–1018, Jul. 2000.
- [13] N. Gong, J. Wang, and R. Sridhar, "PVT variations aware optimal sleep vector determination of dual V<sub>t</sub> domino OR circuits," in *Proc. SOCC*, Sep. 2011, pp. 359–364.
- [14] A. R. Patwary *et al.*, "Bit-line organization in register files for low-power and high-performance applications," in *Proc. ICECE*, Sep. 2006, pp. 505–508.
- [15] T. M. Jones *et al.*, "Exploring the limits of early register release: Exploiting compiler analysis," *ACM Trans. Architecture Code Optimization*, vol. 6, no. 3, pp. 12–29, Sep. 2009.
- [16] D. Kanter, *Intel's Sandy Bridge Microarchitecture*, 2010. [Online]. Available: http://www.realworldtech.com/
- [17] K. Flautner *et al.*, "Automatic performance setting for dynamic voltage scaling," *J. Wireless Netw.*, vol. 8, no. 5, pp. 260–271, 2001.
- [18] N. Gong et al., "Clock-biased local bit line for high performance register files," *Electron. Lett.*, vol. 48, no. 18, pp. 1104–1105, Aug. 2012.
- [19] V. Kursun and E. G. Friedman, "Domino logic with variable threshold voltage keeper," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 11, no. 6, pp. 1080–1093, Dec. 2003.
- [20] MOSIS Deep Design Rules. [Online]. Available: http://www.mosis.com/
- [21] Z. Liu and V. Kursun, "PMOS-only sleep switch dual-threshold voltage domino logic in sub-65-nm CMOS technologies," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 15, no. 12, pp. 1311–1319, Dec. 2007.
- [22] V. Kursun and E. G. Friedman, "Sleep switch dual threshold voltage domino logic with reduced standby leakage current," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 12, no. 5, pp. 485–496, May 2004.
- [23] L. Jin *et al.*, "Reduce register files leakage through discharging cells," in *Proc. IEEE ICCD*, Oct. 2006, pp. 114–119.
- [24] Intel Inc., *Intel® E7500 Chipset Datasheet*, 2002. [Online]. Available: http://www.intel.com/
- [25] Y. Wang *et al.*, "Temperature-aware NBTI modeling and the impact of input vector control on performance degradation," in *Proc. DATE*, Apr. 2007, pp. 546–551.
- [26] N. Gong, B. Guo, J. Lou, and J. Wang, "Analysis and optimization of leakage current characteristics in sub-65 nm dual Vt footed domino circuits," *Microelectron. J.*, vol. 39, no. 9, pp. 1149–1155, Sep. 2008.
- [27] G. Yang, Z. Wang, and S. Kang, "Leakage-proof domino circuit design for deep sub-100 nm technologies," in *Proc. IEEE Int. Conf. on VLSI Design*, Jan. 2004, pp. 222–227.
- [28] Z. Liu and V. Kursun, "Leakage power characteristics of dynamic circuits in nanometer CMOS technologies," *IEEE Trans. Circuits Syst. II*, vol. 53, no. 8, pp. 692–696, Aug. 2006.
- [29] J. Wang and W. Wu, "Using charge self-compensation domino full adder with multiple supply and dual threshold voltage in 45 nm," in *Proc. ULIS*, Mar. 2009, pp. 225–228.
- [30] N. Gong and R. Sridhar, "Optimization and predication of leakage current characteristics in wide domino OR gates under PVT variation," in *Proc. SOCC*, Sep. 2010, pp. 19–24.
- [31] Predictive Technology Model (PTM). [Online]. Available: http://www.eas.asu.edu/~ptm
- [32] ITRS, 2009/2010. [Online]. Available: http://www.itrs.net
- [33] M. Alioto *et al.*, "Understanding the effect of process variations on the delay of static and domino logic," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 18, no. 5, pp. 697–710, May 2010.
- [34] M. H. Kaffashian *et al.*, "An optimization method for NBTI-aware design of domino logic circuits in nanoscale CMOS," *IEICE Electron. Express*, vol. 8, no. 17, pp. 1406–1411, Jul. 2011.
- [35] K. Kuhn *et al.*, "Managing process variation in Intel's 45 nm CMOS technology," *Intel Technol. J.*, vol. 12, no. 2, pp. 93–109, Jun. 2008.
- [36] S. K. Saha, "Modeling process variability in scaled CMOS technology," *IEEE Design Test Comput.*, vol. 99, no. 1, pp. 8–16, Mar. 2010.
- [37] R. Fernandes and R. Vemuri, "Accurate estimation of vector dependent leakage power in the presence of process variations," in *Proc. ICCD*, Oct. 2009, pp. 451–458.
- [38] S. P. Mohanty *et al.*, "A comparative analysis of gate leakage and performance of high-K nanoscale CMOS logic gates," in *Proc. 16th ACM/IEEE Int. Workshop Logic Synthesis*, May 2007, pp. 31–38.
- [39] H. F. Dadgour and K. Banerjee, "A novel variation-tolerant keeper architecture for high-performance low-power wide fan-in dynamic OR gates," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 18, no. 11, pp. 1567–1577, Nov. 2010.
- [40] M. Meterelliyoz and P. Song, "Characterization of random process variations using ultralow-power, high-sensitivity, bias-free sub-threshold process sensor," *IEEE Trans. Circuits Syst. I*, vol. 57, no. 8, pp. 1838–1847, Aug. 2010.
- [41] K. Chakraborty and S. Roy, "Rethinking threshold voltage assignment in 3D multicore designs," in *Proc. 23rd Int. Conf. VLSI Design*, Jan. 2010, pp. 375–380.
- [42] W. Liao et al., "Temperature and supply voltage aware performance and power modeling at microarchitecture level," *IEEE Trans. Computer-Aided Design Integrated Circuits Syst.*, vol. 24, no. 7, pp. 1042–1053, Jul. 2005.
- [43] S. Han et al., "On the difference of temperature dependence of metal gate and poly gate SOI MOSFET threshold voltages," in *IEDM Tech. Dig.*, Dec. 2008, pp. 1–4.
- [44] M. Q. Do *et al.*, "Capturing process-voltage-temperature (PVT) variations in architectural static power modeling for SRAM arrays," Chalmers Univ. Technol., Sweden, Tech. Rep. 2007-06, 2007.
- [45] S. Ghosh and K. Roy, "Parameter variation tolerance and error resiliency: New design paradigm for the nanoscale era," *Proc. IEEE*, vol. 98, no. 10, pp. 1719–1751, Oct. 2010.
- [46] K. Gulati and N. Jayakumar, "A probabilistic method to determine the minimum sleep vector for combinational designs in the presence of random PVT variations," *Integration, VLSI J.*, vol. 41, no. 3, pp. 399–412, May 2008.
- [47] B. Li *et al.*, "Impact of process and temperature variations on network-on-chip design exploration," in *Proc. 2nd ACM/IEEE Int. Symp. Networks-on-Chip*, Sep. 2008, pp. 117–126.
- [48] Hewlett-Packard Company, *CACTI 5*. Palo Alto, CA. [Online]. Available: http://quid.hpl.hp.com:9081/cacti
- [49] C. Kuo *et al.*, "Fast statistical analysis of process variation effects using accurate PLL behavioral models," *IEEE Trans. Circuits Syst. I*, vol. 56, no. 6, pp. 1160–1172, Jun. 2009.



**Na Gong** (M'13) received the B.E. degree in electrical engineering, the M.E. degree in microelectronics from Hebei University, Hebei, China, and the Ph.D. degree in computer science and engineering from the State University of New York, Buffalo, in 2004, 2007, and 2013, respectively.

Currently, she is an Assistant Professor of Electrical and Computer Engineering at North Dakota State University, Fargo, ND, USA. Her research interests include device-circuit-architecture co-design for nanoscale VLSI circuits and systems, power-efficient and reliable electronics for mobile and high-performance computing, and emerging memory technologies in computer systems.



**Jinhui Wang** (M'13) received the B.E. degree in electrical engineering from Hebei University, Hebei, China, in 2004, and the Ph.D. degree in microelectronics and solid-state electronics from Beijing University of Technology, Beijing, China, in 2010. From July 2009 to June 2010, he was a jointly educated Ph.D. student with the High-Performance VLSI/IC Design and Analysis Laboratory, University of Rochester, Rochester, NY, USA, where he worked on 3-D VLSI design and 3-D chip thermal issues.

He has been an Assistant Professor with the College of Electronic Information and Control Engineering, Beijing University of Technology, Beijing, China, since July 2010. His current research interests include low-power, high-performance, and variation-tolerant integrated circuit design, 3-D IC and EDA methodologies, and thermal management in VLSI. He has more than 70 publications and 6 patents in emerging semiconductor technologies.

Dr. Wang serves or has served as a technical program committee member of a number of VLSI conferences, including SoCC and ICSICT.



**Ramalingam Sridhar** (M'82–SM'99) received the B.E. (Honors) degree in electrical and electronics engineering from Guindy Engineering College, University of Madras, India, and the M.S. and Ph.D. degrees in electrical and computer engineering from Washington State University, Pullman.

Since 1987, he has been with the State University of New York, Buffalo, where he is an Associate Professor in the Department of Computer Science and Engineering. His research interests are in VLSI circuits, systems and architecture, variability, power-aware and robust design, very deep submicrometer VLSI systems, clocking and synchronization, memory circuits and architecture, wireless and sensor network security, secure architectures, and power-aware security solutions for embedded systems.

Prof. Sridhar was an IEEE CAS Distinguished Lecturer. He has served as Program Chair and General Chair of ASIC and SoC Conferences and has served on the editorial board of many journals including IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—PART I: REGULAR PAPERS and IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, and technical committees of numerous conferences in wireless systems and VLSI.