# Low Energy Digital Circuits in Advanced Nanometer Technologies 

Thesis presented to the Facultad de Ingeniería of the Universidad de la República by<br>Francisco Veirano<br>in partial fulfillment of the requirements<br>for the degree of<br>Doctor in Electrical Engineering.

## Thesis Advisors

Fernando Silveira, Universidad de la República, Uruguay
Lirida Naviner, Telecom ParisTech, France

## Thesis Committee

Denis Flandre (External Reviewer), Université catholique de Louvain, Belgium
Ricardo Reis (External Reviewer), Universidade Federal do Rio Grande do Sul, Brazil
Juan Pablo Oliver, Universidad de la República, Uruguay
Fernando Silveira, Universidad de la República, Uruguay
Lirida Naviner, Telecom ParisTech, France

Academic Director
Fernando Silveira, Universidad de la República, Uruguay

Low Energy Digital Circuits in Advanced Nanometer Technologies, Francisco Veirano.
ISSN 1688-2784

This thesis was prepared in LaTeX using the iietesis class (v1.1).
It contains a total of 128 pages.
Compiled on 20-08-2019.
http://iie.fing.edu.uy/

The world must not only be interpreted, it must be transformed. Man ceases to be the slave and tool of his environment and converts himself into the architect of his own destiny.

Ernesto "Che" Guevara

## Resumen

The constant demand for portable devices and the advances towards the Internet of Things have made energy consumption one of the major challenges and concerns in industry and academia. The most efficient way to reduce the energy consumption of digital circuits is to reduce their supply voltage, since the dynamic energy depends quadratically on this voltage. Several works have shown that there is an optimum supply voltage, called the minimum energy point, that minimizes the energy consumed by a digital circuit to perform a given operation. This optimum voltage usually lies between 200 mV and 400 mV, depending on the circuit and the technology used. To obtain these supply voltages from the energy source, integrated dc-dc converters with high efficiency are needed.

This thesis focuses on the study of digital systems working in the subthreshold region and designed in advanced nanometer technologies (28 nm). These systems can usually be divided into two blocks: the power management block and the digital circuit operating in the subthreshold region.

In particular, regarding the power management block, the most critical circuit is generally the dc-dc converter. This circuit converts the voltage of a battery (or supercapacitor, wireless power transfer link or energy harvesting unit) into a voltage between 200 mV and 400 mV to power the digital circuit at its optimum voltage. In this thesis we developed two techniques that improve the efficiency of switched capacitor dc-dc converters by means of charge recycling. The first one is based on a technique used in adiabatic circuits called stepwise charging. This technique has been used in circuits and applications where the consumption due to charging and discharging a large capacitance is dominant. We analyzed the possibility of using this technique in switched capacitor dc-dc converters with integrated capacitors. We showed through measurements that the consumption due to turning on and off the switches that implement the dc-dc converter can be reduced by $29 \%$. The second technique is a simplification of the first one that can be applied in certain architectures of switched capacitor dc-dc converters. A converter using this technique was also fabricated and measured, and a $25 \%$ reduction in the energy consumed in driving the switches of the converter was obtained.

On the other hand, we studied digital circuits operating in the subthreshold region, in particular near the minimum energy point. We studied different models for circuits operating under these conditions and improved them by considering the differences between the NMOS and PMOS transistors. Using this model we showed that there is an optimum in the ratio between the leakage currents of both transistors that minimizes the leakage energy consumed per operation. This optimum depends on the architecture of the digital circuit and also on its input data. However, we showed that the consumption can be reduced considerably by operating at an average optimum.

We proposed two techniques to reach the optimum ratio. We used a 28 nm FD-SOI technology for most of the simulations, but we also showed that these techniques can be used in conventional bulk technologies. The first technique consists of using the back plane voltage (or substrate voltage in conventional CMOS) to adjust the NMOS and PMOS currents independently so that the circuit works at the optimum current ratio. We call this technique optimum back plane biasing. The second technique consists of using the lengths of the transistors to adjust the leakage current of each transistor and obtain the optimum ratio. In the subthreshold region and in advanced technologies, moderately increasing the transistor length has little impact on the dynamic energy, which is why this approach can be used.

Finally, we used these techniques in basic circuits such as adders and showed that an energy reduction of approximately $50 \%$ can be obtained over a wide range of frequencies while these circuits work near the minimum energy point.

The main contributions of the thesis are:

- Analysis of the stepwise charging technique in small capacitances.
- Implementation of the stepwise charging technique to improve the efficiency of switched capacitor dc-dc converters.
- Simplification of the stepwise charging technique to improve the efficiency of some architectures of switched capacitor dc-dc converters.
- Analysis of the minimum operating voltage of digital circuits due to the intrinsic noise of the device, and of the impact of technology scaling on it.
- Improvements in the modeling of the minimum energy point of a digital circuit, considering the differences between the PMOS and NMOS transistors.
- Demonstration of the existence of an optimum in the ratio between the NMOS and PMOS leakage currents that minimizes the leakage energy consumed in the subthreshold region.
- Development of a back plane voltage biasing strategy that makes the digital circuit work at the aforementioned optimum.
- Development of a sizing strategy for the transistors of the digital gates that allows the digital circuit to operate at the aforementioned optimum.
- Analysis of the impact of the circuit architecture and of its input data on the aforementioned optimum.

This thesis is based on the publications [1-8]. Additionally, other publications [9-16] were produced during the Ph.D.; they are related to the thesis topic but were not included in it.

## Abstract

The demand for portable devices and the continuing trend towards the Internet of Things (IoT) have made energy consumption one of the main concerns in industry and among researchers. The most efficient way of reducing the energy consumption of digital circuits is decreasing the supply voltage ( $V_{d d}$ ), since the dynamic energy depends quadratically on $V_{d d}$. Several works have shown that there is an optimum supply voltage that minimizes the energy consumption of digital circuits. This optimum supply voltage is usually between 200 mV and 400 mV, depending on the circuit and the technology used. To obtain these low supply voltages, on-chip dc-dc converters with high efficiency are needed.

This thesis focuses on the study of subthreshold digital systems in advanced nanometer technologies. These systems can usually be divided into a Power Management Unit (PMU) and a digital circuit operating in the subthreshold regime.

In particular, within the PMU, one of the key circuits is the dc-dc converter. This block converts the voltage from the power source (battery, supercapacitor or wireless power transfer link) to a voltage between 200 mV and 400 mV in order to power the digital circuit. In this thesis, we developed two charge recycling techniques to improve the efficiency of switched capacitor dc-dc converters. The first one is based on a technique used in adiabatic circuits called stepwise charging. This technique has been used in circuits and applications where the energy spent in switching a large capacitance is dominant. We analyzed the possibility of using this technique in switched capacitor dc-dc converters with integrated capacitors. We showed through measurements that a $29 \%$ reduction in the gate drive losses can be obtained with this technique. The second one is a simplification of stepwise charging that can be applied in some architectures of switched capacitor dc-dc converters. We also fabricated and tested a dc-dc converter with this technique and obtained a $25 \%$ reduction in the energy spent in driving the switches that implement the converter.

Furthermore, we studied the digital circuit working in the subthreshold regime, in particular operating at the minimum energy point. We studied different models for circuits working in these conditions and improved them by considering the differences between the NMOS and PMOS transistors. We obtained an optimum NMOS/PMOS leakage current imbalance that minimizes the total leakage energy per operation. This optimum depends on the architecture of the digital circuit and on the input data. However, we also showed that significant energy reductions can be obtained by operating at a mean optimum imbalance.

We proposed two techniques to achieve the optimum imbalance. We used a 28 nm Fully Depleted Silicon on Insulator (FD-SOI) technology for most of the simulations, but we also show that these techniques can be applied in traditional bulk CMOS technologies. The first one consists of using the back plane voltage of the transistors (or bulk voltage in traditional CMOS) to independently adjust the leakage currents of the NMOS and PMOS transistors so that the circuit works at the optimum NMOS/PMOS leakage current imbalance. We called this approach Optimum Back Plane Biasing (OBB). The second technique consists of using the length of the transistors to adjust this leakage current imbalance. In the subthreshold regime and in advanced nanometer technologies, a moderate increase in the length has little impact on the output capacitance of the gates and thus on the dynamic energy. We called this approach Asymmetric Length Biasing (ALB). Finally, we used these techniques in some basic circuits such as adders. We show that an energy reduction of around $50 \%$ can be obtained over a wide range of frequencies while working near the minimum energy point.

The main contributions of this thesis are:

- Analysis of the stepwise charging technique in small capacitances.
- Implementation of stepwise charging technique as a charge recycling technique for efficiency improvement in switched capacitor dc-dc converters.
- Development of a charge sharing technique for efficiency improvement in switched capacitor dc-dc converters.
- Analysis of minimum operating voltage of digital circuits due to intrinsic noise and the impact of technology scaling in this minimum.
- Improvement in the modeling of the minimum energy point by considering the differences between the NMOS and PMOS transistors.
- Demonstration of the existence of an optimum leakage current imbalance between the NMOS and PMOS transistors that minimizes energy consumption in the subthreshold region.
- Development of a back plane (bulk) voltage strategy for working in this optimum.
- Development of a sizing strategy for working in the aforementioned optimum.
- Analysis of the impact of architecture and input data on the optimum imbalance.

The thesis is based on the publications [1-8]. During the Ph.D. program, other publications were produced [9-16] that are partially related to the thesis but were not included in it.

## Contents

Resumen
Abstract
1 Introduction
1.1 Ultra Low Power (ULP) digital systems
1.1.1 Power Management Unit (PMU)
1.1.2 Sub/near threshold digital circuit
1.2 Thesis outline
2 DC-DC Converters
2.1 Introduction
2.1.1 Charge recycling
2.2 Analysis of stepwise charging
2.2.1 Energy consumption
2.2.2 Logic
2.2.3 Limits of the stepwise charging technique
2.3 Stepwise charging in SC dc-dc converters
2.4 Analysis of charge sharing
2.5 Charge sharing in SC dc-dc converters
2.6 Conclusions
3 Subthreshold digital circuits modelling
3.1 Minimum operating voltage in subthreshold digital logic
3.1.1 Model and method
3.1.2 Simulation results
3.2 Minimum Energy Point (MEP) model
3.2.1 Classic MEP model
3.2.2 Improved MEP model
3.3 Conclusions
4 Subthreshold energy reduction techniques
4.1 28 nm Ultra-Thin Body and Box (UTBB) Fully Depleted Silicon on Insulator (FD-SOI) technology
4.2 Optimum Back Plane Biasing (OBB)
4.2.1 Back Plane Biasing Schemes
4.2.2 Simulation results
4.3 Asymmetric Length Biasing (ALB)
4.3.1 Simulation results
4.4 Comparison OBB and ALB
4.5 Ripple Carry Adder (RCA) simulations
4.6 Conclusions
5 Conclusions
5.1 Thesis contributions
5.2 Future work
5.3 Publications associated with the thesis
5.3.1 Journals
5.3.2 Conferences
5.4 Publications non associated with the thesis
5.4.1 Journals
5.4.2 Conferences
A Convergences of auxiliary capacitors voltage in stepwise charging
A.1 Charge transfer between capacitors
A.2 Convergence in two auxiliary capacitors
Bibliography
Glossary
Acronyms
List of Tables
List of Figures

## Chapter 1

## Introduction

### 1.1 Ultra Low Power (ULP) digital systems

Electronic devices are a natural part of our everyday life, thanks to two important developments of the mid 20th century. The first was the transistor, which is today the most important device in electronic circuits. The second was the possibility of integrating several transistors on a slice of semiconductor material, in so-called Integrated Circuits (ICs). These two inventions have revolutionized the electronics industry and our daily life. Since the '60s the IC industry has followed the well-known Moore's Law, which states that the number of transistors that can be integrated per unit area doubles every two years [17].

This revolution has enabled incredible applications such as miniaturized wireless communication, signal processing, personal computers, active implantable medical devices, Radio Frequency Identification (RFID) and smartphones, among many others. These applications have very different constraints, for example in terms of performance, power and energy consumption, and robustness, which is why there are several ways of categorizing them. An interesting categorization is based on the power source they use. Some devices have a power source with a limited amount of energy (batteries or supercapacitors) or power (harvesters), while others need to be connected to the electrical grid. Since the end of the '90s, the demand for portable, battery-powered devices has grown dramatically. This is one of the main reasons why power and energy consumption have gained special attention in industry and among researchers.

Electronic devices can be further categorized by the level of power or energy they consume. For example, circuits that consume between nW and tens of mW are usually referred to as Ultra Low Power (ULP) applications. There are several examples of this kind of application. One that has become a hot topic in recent years is the Internet of Things (IoT).

The concept of IoT is basically an extension of the well-known Wireless Sensor Networks (WSNs), where several small, usually autonomous devices with sensing capabilities are connected to a network and in some cases to the Internet. These technologies can be used in agriculture, smart homes, health monitoring and other applications that are continuously being developed. ULP applications also include most Active Implantable Medical Devices (AIMDs) and RFID tags.

In ULP systems, energy and power consumption are among the main concerns. Other desired characteristics are small size and robustness. On the other hand, the throughput constraint is usually below $10 \mathrm{Mops} / \mathrm{s}$, which makes these systems suitable for many low-power techniques. As a result, ULP systems usually work in a duty cycled manner, see Fig. 1.1. This means that the system has to perform a task (active mode) periodically (with period $T_{\text {Task }}$ ) and, if it finishes before $T_{\text {Task }}$, it enters a sleep mode where leakage power is minimized. Usually, techniques such as clock gating [18], voltage scaling or power gating with multi-threshold CMOS [19] are used to reduce power consumption in this sleep mode.


Figure 1.1: Duty Cycled ULP Systems

Under these conditions, the average power consumed by the system can be expressed as Eq. (1.1), whose parameters are shown in Fig. 1.1: $P_{A v}$ is the total average power consumed by the duty cycled ULP system, $T_{\text {Act }}$ is the time in active mode, $T_{\text {Task }}$ is the time available to do the specific task, $P_{\text {Act }}$ is the average power in active mode, $P_{\text {Sleep }}$ is the average power in sleep mode, and $E_{\text {Act }}=P_{\text {Act }} T_{\text {Act }}$ is the energy consumed to do the task in active mode. The approximation in Eq. (1.1) is valid when $P_{\text {Sleep }} \ll P_{\text {Act }}$.

$$
\begin{equation*}
P_{A v}=\frac{T_{\text {Act }}}{T_{\text {Task }}}\left(P_{\text {Act }}-P_{\text {Sleep }}\right)+P_{\text {Sleep }} \approx \frac{E_{\text {Act }}}{T_{\text {Task }}}+P_{\text {Sleep }} \tag{1.1}
\end{equation*}
$$
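As a quick numerical illustration of Eq. (1.1), the sketch below evaluates both the exact expression and the approximation. The operating point (10 µW active power, 10 nW sleep power, 1 ms active time, 1 s task period) is assumed for illustration only and is not taken from the thesis; the function names are likewise hypothetical.

```python
def average_power(p_act, p_sleep, t_act, t_task):
    """Exact average power of a duty-cycled system (left-hand form of Eq. (1.1))."""
    if not 0 <= t_act <= t_task:
        raise ValueError("t_act must lie between 0 and t_task")
    return (t_act / t_task) * (p_act - p_sleep) + p_sleep


def average_power_approx(p_act, p_sleep, t_act, t_task):
    """Approximate form of Eq. (1.1), valid when p_sleep << p_act."""
    e_act = p_act * t_act  # energy spent on the task in active mode
    return e_act / t_task + p_sleep


# Hypothetical operating point (illustrative values only).
P_ACT, P_SLEEP = 10e-6, 10e-9  # watts
T_ACT, T_TASK = 1e-3, 1.0      # seconds

exact = average_power(P_ACT, P_SLEEP, T_ACT, T_TASK)
approx = average_power_approx(P_ACT, P_SLEEP, T_ACT, T_TASK)
relative_error = abs(approx - exact) / exact
```

With these assumed values the two forms differ by well under 0.1 %, and the active and sleep contributions to the average power are of similar size, which is the situation where both $E_{\text {Act }}$ and $P_{\text {Sleep }}$ matter.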

In order to minimize the average power or the total energy per task, depending on the $T_{\text {Task }}$ imposed by the application, we need to focus on minimizing $E_{\text {Act }}$, $P_{\text {Sleep }}$ or both. Additionally, the interaction between the techniques used to reduce $P_{\text {Sleep }}$ and $E_{\text {Act }}$ needs to be considered in the overall energy minimization. For example, in [20] the energy consumed due to the transition between active mode and sleep mode is studied and minimized. In [21] the authors analyzed the design of the cut-off structures for power gating considering an ultra-low-voltage CMOS circuit in the active mode.

The most efficient way of reducing $E_{\text {Act }}$ in digital circuits is decreasing the supply voltage $\left(V_{d d}\right)$, since the dynamic energy depends quadratically on $V_{d d}$. However, lowering $V_{d d}$ also makes the circuit slower. This increase in delay increases the leakage energy consumed during a specific operation: although the leakage power decreases with $V_{d d}$, it is consumed over a much longer period of time, so the leakage energy increases. These opposite trends give rise to an optimal $V_{d d}$, usually in the subthreshold region $\left(V_{d d}<V_{T}\right)$ or near-threshold region $\left(V_{d d} \approx V_{T}\right)$, typically 200 mV to 400 mV, where energy per operation is minimized [22]. This point is usually known as the Minimum Energy Point (MEP).
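The trade-off described above can be illustrated with a toy subthreshold model. This is only a sketch under stated assumptions: the on-current is taken to grow exponentially with $V_{d d}$, so that the leakage energy per operation scales as $V_{d d}^{2} e^{-V_{d d} /\left(n U_{T}\right)}$, and the slope factor, activity factor and logic depth used below are hypothetical values chosen for illustration, not parameters from the thesis.

```python
import math

# Assumed slope factor n = 1.5 and thermal voltage 26 mV (illustrative values).
N_UT = 1.5 * 0.026  # volts


def energy_per_op(vdd, alpha=0.1, depth=25.0, c_eff=1.0):
    """Normalized energy per operation of a toy subthreshold circuit.

    alpha: assumed activity factor, depth: assumed logic depth,
    c_eff: normalized capacitance. All three are hypothetical.
    """
    # Dynamic energy: quadratic in the supply voltage.
    e_dyn = alpha * c_eff * vdd ** 2
    # Leakage energy: leakage power integrated over a delay that grows
    # exponentially as vdd is lowered.
    e_leak = depth * c_eff * vdd ** 2 * math.exp(-vdd / N_UT)
    return e_dyn + e_leak


def find_mep(v_min=0.10, v_max=0.50, steps=400):
    """Return the supply voltage that minimizes energy_per_op on a uniform grid."""
    candidates = [v_min + i * (v_max - v_min) / steps for i in range(steps + 1)]
    return min(candidates, key=energy_per_op)


v_mep = find_mep()
```

With these assumed parameters the minimum falls inside the sweep rather than at either end, in the few-hundred-millivolt range. A different activity factor or logic depth shifts it, and a low enough ratio of leakage to dynamic energy removes the interior minimum altogether in this simple model.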

In conclusion, depending on the performance constraints, the digital circuit in ULP applications should work near the MEP. If the performance requirements exceed the performance at the MEP, $V_{d d}$ should be increased to speed up the digital circuit and complete the operation in the required time. On the other hand, if the energy due to the transition to, or implementation of, the sleep mode jeopardizes the benefits obtained by using it, a $V_{d d}$ lower than the one at the MEP might be the optimum that minimizes Eq. (1.1). This is why in this Ph.D. thesis we focus on digital circuits working near the MEP.

Considering the discussion presented, Fig. 1.2 shows a very simplified block diagram of ULP systems.


Figure 1.2: Simplified block diagram of ULP systems

### 1.1.1 Power Management Unit (PMU)

The Power Management (PM) block shown in Fig. 1.2 is in charge of managing the power supply of the digital circuit. For example, if the system has a power gating option, this block controls the switches that turn off the digital circuit. It might also have an energy meter [16] to correctly estimate the charge left in the battery and adjust circuit performance or functionality accordingly. If the power is received through a Wireless Power Transfer Link (WPTL), the ac voltage needs to be rectified and adapted to power the digital circuit.

Regardless of the power source used, a key circuit inside the Power Management Unit (PMU) is the dc-dc converter, which provides the optimum $V_{d d}$ for the ULP digital circuit. If we consider that the power source, for example a battery, has a voltage of around 1 V or more and that the optimum $V_{d d}$ is between 0.2 V and 0.4 V, a step-down dc-dc converter with a voltage conversion ratio of approximately $1 / 3$ is needed. The most important characteristic of dc-dc converters is efficiency, defined in Eq. (1.2), where $P_{L}$ is the power delivered to the load and $P_{\text {Losses }}$ is the total power lost in the converter. To take advantage of the energy benefits of working near the MEP, high-efficiency dc-dc converters are required.

$$
\begin{equation*}
\eta_{d c-d c}=\frac{P_{L}}{P_{L}+P_{\text {Losses }}} \tag{1.2}
\end{equation*}
$$
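To make the role of Eq. (1.2) concrete, the following sketch computes the efficiency and the energy that the source must supply for each operation of the load. The function names and the numbers (10 µW load, 2.5 µW losses, 1 pJ per operation) are hypothetical and used only for illustration.

```python
def dcdc_efficiency(p_load, p_losses):
    """Converter efficiency as defined in Eq. (1.2)."""
    if p_load < 0 or p_losses < 0 or p_load + p_losses == 0:
        raise ValueError("powers must be non-negative and not both zero")
    return p_load / (p_load + p_losses)


def source_energy_per_op(e_op, efficiency):
    """Energy drawn from the source for one operation of the load."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return e_op / efficiency


# Hypothetical example: 10 uW delivered to the load, 2.5 uW lost in the converter.
eta = dcdc_efficiency(10e-6, 2.5e-6)
# A load that spends 1 pJ per operation then costs the source more than 1 pJ.
e_source = source_energy_per_op(1e-12, eta)
```

In this assumed case the converter is 80 % efficient, so a quarter more energy is taken from the source than the digital circuit actually uses. Any energy saved by operating near the MEP is scaled by the same factor.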

Additionally, as discussed above, ULP systems usually have size constraints, which means that fully integrated dc-dc converters are preferred. In this context, Switched Capacitor (SC) converters have been shown to be the most efficient [23]. This is mainly because low-drop-out (LDO) regulators present low efficiencies at low output voltages, and buck converters require inductors that cannot be integrated with high quality factors. Thus, in this thesis, we focus on fully integrated SC step-down dc-dc converters.

The most important losses in these converters are the conduction, gate drive, parasitic capacitance, and additional logic losses. Several works try to reduce some of these losses to improve the converter efficiency. For example, some try to reduce conduction losses by reducing output ripple and implementing interleaved converters [24]. In [9] we proposed a charge recycling technique that reduces the losses due to top/bottom-plate parasitic capacitances. When a feedback loop is implemented to keep the output voltage constant despite changes in the power consumed by the load, additional blocks are needed. To maintain high efficiency at low output power, the blocks that implement this feedback loop need to consume a very small quiescent current. In [10] we worked on an ultra-low power Current Controlled Oscillator (CCO) and in [13] on a high slew-rate ultra-low power Operational Transconductance Amplifier (OTA), both of which are suitable for the aforementioned feedback loop. In this thesis, we focus on the reduction of the gate-drive losses in the switches that implement the SC dc-dc converter. We propose two novel techniques to reduce these losses and increase the efficiency of the converter.

### 1.1.2 Sub/near threshold digital circuit

At the beginning of this century, the existence of the MEP was demonstrated [22]. Since then, operation at this point has been proposed for very low-speed ultra-low power applications, because the performance achieved at the MEP was very low, in the order of kHz [25-29]. However, advanced nanometer technologies bring new opportunities, since the performance obtained at the MEP can cover a much wider range of applications, from kHz to some tens of MHz.

Figure 1.3 shows the simulated energy per operation and maximum frequency of operation of a chain of 25 inverters with an activity factor of 1, implemented in a 28 nm Fully Depleted Silicon on Insulator (FD-SOI) technology. The activity factor is the ratio between the effective capacitance that switches during an operation and the total effective capacitance of the circuit. It can be seen that in this case the MEP is achieved with an optimal $V_{d d}$ of 200 mV, at a maximum frequency of around 3 MHz. The energy consumption at this point is more than one order of magnitude lower than at the traditional above-threshold operating point (1 V).


Figure 1.3: Chain of 25 inverters. Activity factor 1. 28 nm FD-SOI technology.

The main drawback of advanced technologies is that the variability is worsened, and this has a great impact on subthreshold circuits because the drain current depends exponentially on the threshold voltage, supply voltage, and temperature [30-32]. In [33,34] the authors present the main challenges of using advanced technologies for subthreshold digital circuits and try to answer the question of whether they are beneficial or not.

In particular, FD-SOI technology has emerged as a solution to attain digital circuits with high energy efficiency, mainly due to its steep subthreshold slope and reduced parasitic capacitance [35-38]. The variability is greatly decreased in comparison with bulk technology, since a lightly doped body is used, which makes it suitable for ultra-low voltage circuits. One of the main reasons for this is that with lightly doped bodies the threshold voltage no longer depends on the SOI film thickness. Moreover, when undoped channels are used, $V_{T}$ variations such as random dopant fluctuations disappear [39-41]. Finally, the solution proposed by [42] for multi-threshold voltage $\left(V_{T}\right)$ transistors in the so-called Ultra-Thin Body and Box (UTBB) FD-SOI opens a new degree of freedom by allowing an ultra wide range of Back Plane Biasing (BPB) voltages, which can be used for fine tuning the $V_{T}$ of the transistors.

In this thesis, we focus on the modeling and design of digital circuits working near the MEP, mainly on the design of digital gates at the device level. We developed two techniques that can reduce the leakage energy near the MEP. To do so, we improved the classic model used at this point of operation. We used a 28 nm FD-SOI technology as the main technology, but the main results can be applied in traditional bulk technologies.

In some cases, either because the performance obtained at the MEP is lower than required or due to variability issues, $V_{d d}$ is increased to overcome these limitations. This operation regime is usually called near threshold, since $V_{d d}$ is a little higher than $V_{T}$. However, since the circuits are still working near the MEP, the techniques are still beneficial.

### 1.2 Thesis outline

In this section, the thesis outline is presented. Chapter 2 presents the results obtained regarding the techniques for efficiency improvement in SC dc-dc converters. First, Section 2.1 reviews the state of the art in charge recycling techniques. Then, in Section 2.2 a charge recycling technique called stepwise charging is analyzed and the possibility of applying it in SC dc-dc converters is studied. In Section 2.3 the technique is applied in a SC dc-dc converter and measurement results are shown [8]. Afterward, in Section 2.4 a simplification of stepwise charging is proposed that can be applied in some architectures of SC dc-dc converters. In Section 2.5 this technique is applied in a SC dc-dc converter and measurement results are shown [7].

In Chapter 3 the main contributions to the modeling of subthreshold digital circuits are presented. First, in Section 3.1 the minimum operating voltage of digital circuits is studied. In particular, the impact of intrinsic noise is analyzed, and a minimum $V_{d d}$ due to this noise is proposed and compared with other definitions and with the MEP voltage [5,6]. Then, in Section 3.2 the classic model of the MEP is presented. Further on, an improvement to this model is proposed that takes into account the differences between the NMOS and PMOS devices, and it is shown that an optimum ratio between the NMOS and PMOS leakage currents exists, which we call the optimum imbalance [1-4].

In Chapter 4 the techniques developed during the thesis in order to operate close to the aforementioned optimum imbalance are presented. First, in Section 4.1, we present a brief introduction to the 28 nm FD-SOI technology. Then, in Section 4.2 a back plane voltage strategy to work at the optimum imbalance is shown. We refer to this technique as Optimum Back Plane Biasing (OBB) [2,3]. Then, in Section 4.3 a novel sizing strategy to work at the optimum imbalance is presented. It consists of sizing the PMOS and NMOS devices with different lengths, and we refer to it as Asymmetric Length Biasing (ALB) [1,4]. Afterward, the application of these techniques in some basic circuits such as a Ripple Carry Adder (RCA) is discussed.

Finally, the main conclusions of this thesis are drawn in Chapter 5 and the list of publications produced during this thesis is shown.

The acronyms and notation used throughout the thesis (e.g. MEP or $V_{d d}$ ) are defined in the glossary.

## Chapter 2

## DC-DC Converters

In this chapter, we present two techniques developed to improve the efficiency of Switched Capacitor (SC) dc-dc converters. Both techniques reduce the energy needed to drive the switches that implement the converter. In Section 2.1 we present a brief introduction to SC dc-dc converters and a general analysis of the charge recycling techniques present in the literature. In Section 2.2 we study a particular charge recycling technique called stepwise charging. We analyze the benefits and limitations of the technique while considering a fully integrated approach. Further on, in Section 2.3, we demonstrate through measurements how the efficiency of SC dc-dc converters can be improved with this technique. Afterward, in Section 2.4, we analyze the second charge recycling technique, which can be seen as a simplification of stepwise charging. In Section 2.5, the implementation of this technique in SC dc-dc converters is shown. Finally, the chapter conclusions are drawn.

### 2.1 Introduction

One of the main functions of the Power Management (PM) block in Ultra Low Power (ULP) systems is adjusting the voltage so that power is delivered to the load at the optimum supply voltage. To do this, a high-efficiency dc-dc converter is needed. Due to the size constraints of ULP applications, fully integrated dc-dc converters are preferred. Additionally, as analyzed in the first chapter, ULP systems usually work in a duty cycled manner. This is why the converter needs to be able to deliver very low output power while maintaining high efficiency, and to correctly power the digital circuit while it is in sleep mode. Finally, in a complex SoC it is important to have several power domains for blocks with different constraints, which might require several dc-dc converters on the same chip. This is another reason to prefer fully integrated versions.

While considering these constraints for the dc-dc converter, fully integrated SC dc-dc converters have been shown to be the most efficient for the range of power needed in ULP systems [23]. In particular, low-drop-out (LDO) regulators present low efficiencies when the difference between the input and output voltages is large, because the same current that is taken from the input is delivered to the output. On the other hand, inductive dc-dc converters can achieve very high efficiency for high output power (tens of mW) but cannot maintain this efficiency for low output power. Additionally, this type of converter requires inductors, which cannot be integrated with high quality factors in a conventional CMOS process.
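The LDO argument above can be made concrete with a minimal sketch. If the input and output currents are equal and the quiescent current is neglected (an idealization), the efficiency cannot exceed the ratio of output to input voltage. The function name and the example voltages are illustrative assumptions.

```python
def ldo_ideal_efficiency(v_in: float, v_out: float) -> float:
    """Upper bound on LDO efficiency when input and output currents are equal.

    Quiescent current is neglected, so a real regulator would do slightly worse.
    """
    if not 0 < v_out <= v_in:
        raise ValueError("expected 0 < v_out <= v_in")
    return v_out / v_in


# Illustrative case (assumed values): 1.2 V input regulated down to 0.4 V
print(f"ideal LDO efficiency: {ldo_ideal_efficiency(1.2, 0.4):.1%}")
```

Under these assumptions, two thirds of the input power is lost in the regulator, which illustrates why a switched converter is preferred for large conversion ratios.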

In the past decade, special attention has been paid to SC dc-dc converters since they can achieve very high efficiency while delivering a wide range of output power. Several works study how to model and design these circuits to obtain optimized dc-dc converters with better power efficiencies [43,44]. There are also works that implement SoCs integrating an SC dc-dc converter and a microcontroller in a single chip $[35,45]$.

These converters use capacitors in different configurations to take power from the power source and then deliver it to the load. To shift between the different configurations, MOS switches are used. Each of these configurations is called a phase, and the phases are periodically activated in turn to implement the desired conversion. These converters operate at a high frequency (typically around 10 MHz) so that the voltage in the capacitors that implement the converter is kept almost constant between the different phases. For example, Fig. 2.1 shows a series-parallel SC dc-dc converter with a conversion ratio of $1 / 3$. This means that the input voltage is divided by three at the output. In this case, the converter is implemented with three capacitors and two phases. In the first phase, all the capacitors are connected in series with the power source. Each capacitor is charged to $1 / 3$ of the voltage of the power source, and energy is taken from the source. In the second phase, all the capacitors are connected in parallel and energy is delivered to the load. The converter alternates between these two phases by turning the different MOS switches on and off. The efficiency of these converters is defined by Eq. (2.1), where $P_{L}$ is the power delivered to the load and $P_{\text {Losses }}$ are the total power losses.

$$
\begin{equation*}
\eta_{d c-d c}=\frac{P_{L}}{P_{L}+P_{\text {Losses }}} \tag{2.1}
\end{equation*}
$$

The losses can be divided into conduction, gate drive, parasitic capacitance, and additional logic losses. The conduction losses are due to the energy dissipated while transferring charge from or to a capacitor. One way of minimizing these losses is reducing output ripple by implementing interleaved converters [24].

The implementation of integrated capacitors has the disadvantage of including top/bottom-plate parasitic capacitances which are usually charged and discharged dissipating extra energy. In [9] we proposed a charge recycling technique to reuse this charge to charge another parasitic capacitance instead of dissipating it.

Figure 2.1: Series-Parallel step down converter.

Another source of energy loss is the additional circuits needed to implement the converter. Usually, when the output voltage must be maintained regardless of the power demanded by the load, a feedback loop is needed to adjust the frequency of operation (and thus the output resistance [43]). This feedback loop can be implemented with an Operational Transconductance Amplifier (OTA), which converts the difference between the output voltage and the target voltage into a current, and a Current Controlled Oscillator (CCO), which converts this current into an oscillation with a frequency proportional to the input current. To maintain high efficiencies under low output power, the quiescent power consumption of these blocks needs to be very low $[10,11,13]$.

Finally, the gate-drive losses are the energy necessary to turn on and off the MOS switches that implement the phases of the converter. In this thesis, we developed two charge recycling techniques to reduce these losses. In this chapter, the two techniques are analyzed and their implementations in two SC dc-dc converters are shown $[7,8]$.

### 2.1.1 Charge recycling

From a general point of view, a significant part of the energy dissipation in Integrated Circuits (ICs) is due to capacitances $(C)$ that are periodically charged to $V_{d d}$ and discharged to $g n d$. This is the case of the aforementioned gate drive losses and parasitic capacitance losses. When they are charged to $V_{d d}$, a charge $C V_{d d}$ is taken from the $V_{d d}$ power supply, thus consuming $C V_{d d}^{2}$ energy and storing $\frac{1}{2} C V_{d d}^{2}$ in $C$. When $C$ is discharged to $g n d$, the stored energy is lost (dissipated in the discharging circuit). Several techniques have been proposed to reduce these energy losses.

A general and well-known technique to reduce energy consumption during the charge of a capacitance is adiabatic switching where energy consumption is decreased while increasing switching time [46-50]. The ideal case of this technique charges the capacitance with a constant current source and the energy dissipation during the charging process can be calculated with Eq. (2.2).

$$
\begin{equation*}
E_{\text {diss }}=\frac{R C}{T} C V^{2} \tag{2.2}
\end{equation*}
$$

In this case, $R$ is the resistance through which the capacitance $C$ is charged, $T$ is the duration of the charging process and $V$ is the final voltage in the capacitance. This constant current is associated with a linear voltage ramp power supply. Theoretically, through adiabatic switching, energy dissipation can be arbitrarily reduced by extending charging time.
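As a rough numerical illustration of Eq. (2.2), the sketch below compares the adiabatic dissipation with the $\frac{1}{2} C V^{2}$ dissipated when the capacitance is charged directly from a constant supply. The component values are assumptions chosen only for illustration, and Eq. (2.2) is applied here only for charging times well above $R C$, where the constant-current approximation is normally used.

```python
def conventional_dissipation(c: float, v: float) -> float:
    """Energy dissipated when C is charged to v directly from a constant supply."""
    return 0.5 * c * v**2


def adiabatic_dissipation(r: float, c: float, v: float, t: float) -> float:
    """Eq. (2.2): dissipation when C is charged with a constant current over time t."""
    return (r * c / t) * c * v**2


# Illustrative values (assumptions, not taken from the thesis)
R, C, V = 1e3, 1e-12, 1.2  # 1 kOhm, 1 pF, 1.2 V
for k in (10, 100, 1000):
    T = k * R * C  # charging time expressed as a multiple of RC
    ratio = adiabatic_dissipation(R, C, V, T) / conventional_dissipation(C, V)
    print(f"T = {k:4d} RC -> adiabatic / conventional dissipation = {ratio:.3f}")
```

The ratio reduces to $2 R C / T$, so each tenfold increase in charging time cuts the dissipation by a factor of ten, which is the trade-off between energy and switching time described above.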

Most implementations of adiabatic switching use a power supply with resonant inductor circuits to approximate the constant current and linear voltage ramp with sinusoidal signals. The main disadvantage of such solutions is the need for integrated inductors, which occupy a large area and have low Q. Furthermore, accurately timed control pulses, with a duration that depends on the value of the capacitance $C$, are required.

An alternative solution called stepwise charging approximates the voltage ramp with a discrete number of intermediate voltage supplies [51,52]. References [53,54] proposed using a DC-DC converter to implement these intermediate voltages. In both solutions, off-chip components are needed and the complexity of the technique is increased. On the other hand, in [51] it is observed that if a complete charge-discharge process is considered, the intermediate voltage sources do not contribute any net charge and can be substituted by capacitors. The architecture is presented in Fig. 2.2, where the connection between these auxiliary capacitors $\left(C_{a u x_{i}}\right)$ and the main capacitance $(C)$ is shown.

Stepwise charging (either with multiple auxiliary capacitors or multiple supply voltages) has been proposed for applications such as driving pad capacitances [55], internal memory capacitances [56], LCD displays [57,58], clock line charging [59], inkjet printer heads [60] and dynamic voltage scaling [61,62]. There is one patent that proposes using it in a buck dc-dc converter [63], but there is no analysis of when it is possible to use it.

Figure 2.2: Architecture of stepwise charging.

Stepwise charging can be seen as storing on auxiliary capacitors part of the charge, and hence of the energy, that would otherwise be dumped each time the capacitance $C$ is discharged. During the charging process, most of the charge is taken from these auxiliary storage capacitors instead of the $V_{d d}$ power supply. Thus, the amount of charge taken from the power supply is reduced and energy is saved due to the capacitor charge recycling. From the point of view of how energy is spent in the circuit, the energy loss in each step is decreased due to the reduction in the voltage drop between the source and the destination of the charge. In [51] this technique is applied for cases where the capacitance $C$ is large. Accordingly, the impact of the power consumption of the auxiliary logic is not considered, nor is the restriction imposed by the minimum allowed size of the switches needed to implement the recycling technique.
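The reduction of the loss per step can be made explicit with a short worked calculation. It assumes that the auxiliary capacitors are much larger than $C$, so that each behaves as a fixed voltage level spaced $V_{d d} /(N+1)$ from its neighbours, and that every transfer settles completely. The symbols $E_{\text {step }}$ and $E_{\text {cycle }}$ are introduced here only for this illustration. Moving $C$ across one voltage step of size $\Delta V$ through a resistive switch dissipates $\frac{1}{2} C \Delta V^{2}$, and a complete cycle has $N+1$ steps up and $N+1$ steps down:

$$
\begin{equation*}
E_{\text {step }}=\frac{1}{2} C\left(\frac{V_{d d}}{N+1}\right)^{2}, \qquad E_{\text {cycle }}=2(N+1) E_{\text {step }}=\frac{C V_{d d}^{2}}{N+1}
\end{equation*}
$$

For $N=0$ this reduces to the conventional $C V_{d d}^{2}$ per cycle, and each additional auxiliary capacitor lowers the ideal dissipation further, before the cost of the switches and logic is taken into account.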

There are also works that use a single tank capacitor, which in the end is a particular case of stepwise charging. In [64] a single capacitor is used as a tank capacitor between charges and discharges of word lines in ROM memories. In [65] this idea is applied to the routing consumption of FPGAs, where the unused lines of the FPGA are used as the tank capacitor for the charge recycling technique. In [66] the same idea is used to reduce energy consumption in stacked CPUs.

Finally, the simplest approach to reduce the power consumption of a capacitance that needs to be charged and discharged periodically consists of sharing the charge between two capacitors. Instead of discharging a capacitor to $g n d$, a connection is made to another one that simultaneously requires charge; hence the charge is reused and the energy consumption is reduced. For example, this approach is used in [64] for the predecoder lines of ROM memories and in [67] for power gated digital circuits. We will call this the Charge Sharing (CS) technique.

In the particular case of SC dc-dc converters, there are some works that try to reduce the losses due to the charge/discharge of parasitic capacitances. In [68] the use of LC tanks to temporarily store the energy consumed by the gate drive circuit of the switches that implement the converter is proposed. The main disadvantage is that integrated inductors are needed. Other works, for example, [69] and [70] proposed to reuse the charge spent during the driving of the switches by delivering it to the load.


In this thesis, we present a theoretical analysis of the stepwise charging technique and the charge sharing technique. In particular, we explore the limits of these techniques in terms of the savings that can be obtained as a function of the value of the capacitance and of its charge/discharge frequency. We consider the case of a fully integrated implementation of the techniques and take into account practical limitations such as the consumption of the additional logic needed to implement them. These issues cannot be neglected when the capacitance to be charged and discharged is small. Furthermore, an approach for the design of the auxiliary logic is proposed that allows these techniques to be embedded in the driver without requiring changes in the signals that command it (thus, it is "transparent" for the designer).

Finally, we demonstrate through measurements that these techniques can be used to reduce the gate-drive losses in SC dc-dc converters and thus, improve the efficiency of these converters.

### 2.2 Analysis of stepwise charging

To determine the benefits of implementing the stepwise charging technique, where a capacitance $C$ needs to be periodically charged and discharged at a frequency $f$, two things need to be known. First, the energy saving obtained by using the technique and second the number of auxiliary capacitors that maximize this saving.

The auxiliary switches shown in Fig. 2.2 must be sequentially turned on one at a time, in order to charge and discharge the main capacitance. Each of these processes (charge or discharge) will have $N+1$ phases, where $N$ is the number of auxiliary capacitors. In steady state, during the charge of the capacitance, first $S W_{1}$ is turned on and the charge is transferred from the first auxiliary capacitor to the main capacitance. The same procedure is done for each of the $N$ auxiliary capacitors and finally, the main capacitance is fully charged by turning on $S W_{N+1}$, which connects the main capacitance to the voltage source. An analogous process is done to discharge the capacitance, starting with $S W_{N}$ and finally completing the discharge through $S W_{0}$. In Fig. 2.3, the main capacitance voltage is illustrated as a function of time when using stepwise charging with $N=3$. For each phase, the figure shows which auxiliary switch is turned on.

Figure 2.3: Charge and discharge of the main capacitance using stepwise charging. $N=3$

The total energy consumption in the charge-discharge of a capacitance using stepwise charging can be divided into three sources. First, the energy that is taken from the power supply to finish the charge of the capacitance through $S W_{N+1}$ $\left(E_{D}\right)$. Second, the energy necessary to turn on the auxiliary switches $\left(E_{S W}\right)$. Last, the energy used to generate the signals that turn on, sequentially, the auxiliary switches in order to charge-discharge the capacitance $\left(E_{L}\right)$. The latter two increase with the number of auxiliary capacitors while $E_{D}$ decreases. Consequently, there will be an optimum number of auxiliary capacitors that minimizes the total energy consumption for given values of the main capacitance and switching frequency.

### 2.2.1 Energy consumption

Consider the aim is to charge-discharge a capacitance $C$ periodically at a frequency $f$. From the period $T=1 / f$ a minimum time available for the charge-discharge process must be defined ( $T_{C D}$ ), which corresponds to the sum of the minimum time available for the charge ( $T_{C}$ ) and discharge $\left(T_{D}\right)$, see Fig. 2.3. The relationship between $T_{C D}$ and $T$ depends on the application, but once this is defined both times are determined.

The procedure to obtain the energies involved is described next. A first approach to the optimum number of auxiliary capacitors will be made considering only $E_{D}$ and $E_{S W}$ in a complete charge-discharge cycle. $E_{L}$ will be discussed further on.

In steady state, the voltage in each auxiliary capacitor $C_{a u x_{i}}$ from Fig. 2.2 is [51, 71, 72] (see Appendix A)

$$
\begin{equation*}
V_{i}=i \frac{V_{d d}}{N+1} . \tag{2.3}
\end{equation*}
$$

Therefore, when $C$ is connected to $V_{d d}$, its initial voltage ($V_{0}$ in Fig. 2.2) is equal to $V_{N}=N V_{d d} /(N+1)$, since the $C_{a u x_{i}}$ are designed to be much bigger than $C$. Consequently, the energy spent to complete the charge of $C$ through $S W_{N+1}$ is

$$
\begin{equation*}
E_{D}=\frac{C V_{d d}^{2}}{N+1} . \tag{2.4}
\end{equation*}
$$

With the value of the main capacitance $C$ and the number of auxiliary tank capacitors, we can obtain $E_{D}$.
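Eqs. (2.3) and (2.4) can be checked numerically with a small sketch that iterates ideal charge sharing until the auxiliary capacitors settle. It assumes complete settling in every phase, auxiliary capacitors 100 times larger than $C$, and all capacitors initially discharged. The function name and these values are illustrative assumptions, not the procedure of Appendix A.

```python
def simulate_stepwise(n_aux, c=1.0, c_aux=100.0, vdd=1.2, cycles=5000):
    """Iterate ideal charge sharing over many charge-discharge cycles.

    Returns (auxiliary voltages after the last cycle,
             energy drawn from vdd during the last cycle).
    """
    v_aux = [0.0] * n_aux  # auxiliary capacitors start discharged (assumption)
    v_c = 0.0              # main capacitance starts at gnd
    energy = 0.0
    for _ in range(cycles):
        # Charge: SW_1 ... SW_N share charge, then SW_{N+1} connects C to vdd
        for i in range(n_aux):
            v = (c * v_c + c_aux * v_aux[i]) / (c + c_aux)
            v_c = v_aux[i] = v
        energy = vdd * c * (vdd - v_c)  # charge taken from the supply times vdd
        v_c = vdd
        # Discharge: SW_N ... SW_1 share charge, then SW_0 connects C to gnd
        for i in reversed(range(n_aux)):
            v = (c * v_c + c_aux * v_aux[i]) / (c + c_aux)
            v_c = v_aux[i] = v
        v_c = 0.0
    return v_aux, energy


N, VDD = 3, 1.2
v_aux, e_d = simulate_stepwise(N, vdd=VDD)
for i, v in enumerate(v_aux, start=1):
    print(f"V_{i}: simulated {v:.4f} V, Eq. (2.3) gives {i * VDD / (N + 1):.4f} V")
print(f"E_D: simulated {e_d:.4f}, Eq. (2.4) gives {VDD**2 / (N + 1):.4f} (C normalized to 1)")
```

With a finite ratio between the auxiliary and main capacitances the simulated values differ slightly from the closed forms, and the difference shrinks as the auxiliary capacitors are made larger.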

The energy spent to turn on the auxiliary switches is

$$
\begin{equation*}
E_{S W}=V_{d d}{ }^{2} C_{S W}, \tag{2.5}
\end{equation*}
$$

where $C_{S W}$ is the effective gate capacitance of the auxiliary switches driven in a complete charge-discharge cycle. To calculate $C_{S W}$, the width of the switches must be determined to fulfill the timing restrictions imposed by the target $T_{C D}$.

Every time a switch is turned on, it forms a $C_{a u x}$-R-C circuit. Considering $C_{a u x_{i}} \gg C$, this circuit can be modeled as an R-C circuit with its respective time constant $\tau$. We consider the charge and discharge times to be both equal to $T_{C D} / 2$. In this case, each RC circuit must transfer the charge between the main capacitance and the auxiliary capacitor in a time $T_{A u x}$ given by

$$
\begin{equation*}
T_{A u x}=\frac{T_{C D}}{2(N+1)} . \tag{2.6}
\end{equation*}
$$

The switches must be designed so that $T_{A u x}>m \tau$, where $m$ is the number of time constants that assures that the charge has been sufficiently transferred, usually between 2 and 4 [51]. To fulfill this restriction, the resistance of the switch must be

$$
\begin{equation*}
R<\frac{T_{C D}}{2 m C(N+1)} . \tag{2.7}
\end{equation*}
$$
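As a quick numerical illustration of Eqs. (2.6) and (2.7), the sketch below evaluates the per-phase time and the resistance bound for $C=178 \mathrm{fF}$, $T_{C D}=10 \mathrm{~ns}$ and $N=4$, the values of the simulated example reported later in this section. The choice $m=3$ is an assumption within the range of 2 to 4 mentioned above, and the function names are illustrative.

```python
def aux_phase_time(t_cd: float, n_aux: int) -> float:
    """Eq. (2.6): time available for each of the N+1 transfers of a charge or discharge."""
    return t_cd / (2 * (n_aux + 1))


def max_switch_resistance(t_cd: float, c: float, n_aux: int, m: float = 3) -> float:
    """Eq. (2.7): upper bound on the switch resistance for m time constants per transfer."""
    return t_cd / (2 * m * c * (n_aux + 1))


C, T_CD, N = 178e-15, 10e-9, 4
print(f"T_Aux = {aux_phase_time(T_CD, N) * 1e9:.2f} ns")
print(f"R must be below {max_switch_resistance(T_CD, C, N) / 1e3:.2f} kOhm (m = 3 assumed)")
```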

To design the switches, the relationship between the width of the transistors and the resistance of the switches must be known. This can be modeled as

$$
\begin{equation*}
W=\frac{K(V)}{R}, \tag{2.8}
\end{equation*}
$$

where $K(V)$ depends on the technology used and the voltage applied to the switch. This parameter can be obtained from electrical simulations. Using Eq. (2.7) and Eq. (2.8) the width of the switches can be calculated as

$$
\begin{equation*}
W_{S W_{i}}=\frac{2 m(N+1) C K_{i}}{T_{C D}} . \tag{2.9}
\end{equation*}
$$

For each number of auxiliary capacitors, as $C$ and $T_{C D}$ are specifications determined by the application, the width of the switches can be obtained. Knowing the width of the switch, its gate capacitance can be determined. The effective capacitance driven in a complete charge-discharge cycle is

$$
\begin{equation*}
C_{S W}=C_{S W_{0}}+C_{S W_{N+1}}+2 \sum_{i=1}^{N} C_{S W_{i}} \tag{2.10}
\end{equation*}
$$

where it is considered that the switches that connect to $V_{d d}$ and $g n d$ are operated only once per charge-discharge cycle and the remainder of the auxiliary switches are operated twice.

Finally, the total energy consumption can be calculated as

$$
\begin{equation*}
E_{T o t}\left(C, T_{C D}, N\right)=E_{D}(C, N)+E_{S W}\left(C, T_{C D}, N\right) \tag{2.11}
\end{equation*}
$$

To summarize, knowing the value of $C$ and $T_{C D}$, the total consumption can be calculated for each number of storage capacitors and the optimum number required to minimize the total energy consumption can be obtained.
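The search for the optimum number of auxiliary capacitors can be sketched as follows. This is a simplified model, not the procedure used for Fig. 2.4: it assumes a single value of $K$ for all switches, a gate capacitance proportional to the switch width, and a minimum width imposed by the technology. The values of `k`, `c_gate_per_m` and `w_min` are placeholders chosen for illustration, so the printed optimum and saving are not expected to reproduce the results of the thesis.

```python
def total_energy(n_aux, c, t_cd, vdd=1.2, m=3, k=1e-3,
                 c_gate_per_m=1e-9, w_min=160e-9):
    """E_D + E_SW of Eq. (2.11) under simplifying assumptions.

    k            : K of Eq. (2.8) in ohm*m, same for every switch (assumption)
    c_gate_per_m : gate capacitance per unit width in F/m (assumption)
    w_min        : minimum switch width in m (assumption)
    """
    e_d = c * vdd**2 / (n_aux + 1)                           # Eq. (2.4)
    width = max(w_min, 2 * m * (n_aux + 1) * c * k / t_cd)   # Eq. (2.9), clamped to w_min
    c_sw = 2 * (n_aux + 1) * c_gate_per_m * width            # Eq. (2.10) with equal switches
    e_sw = vdd**2 * c_sw                                     # Eq. (2.5)
    return e_d + e_sw


def optimum_n(c, t_cd, n_max=10, **params):
    """Number of auxiliary capacitors in 0..n_max that minimizes the total energy."""
    return min(range(n_max + 1), key=lambda n: total_energy(n, c, t_cd, **params))


C, T_CD = 178e-15, 10e-9
n_opt = optimum_n(C, T_CD)
saving = 1 - total_energy(n_opt, C, T_CD) / total_energy(0, C, T_CD)
print(f"N_opt = {n_opt}, saving relative to N = 0: {saving:.1%}")
```

The model reproduces the qualitative trade-off described above: $E_{D}$ falls as $1 /(N+1)$ while $E_{S W}$ grows with both the number and the width of the switches, so the minimum lies at an intermediate $N$. The consumption of the logic, $E_{L}$, is left out, as in Eq. (2.11).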

To support the analysis presented, simulations were performed using a 130 nm technology, $C=178 \mathrm{fF}$ and $T_{C D}=10 \mathrm{~ns}$. In Fig. 2.4, the energy consumption $E_{S W}$ and $E_{D}$ estimated is shown as a function of the number of auxiliary capacitors as well as the simulation results obtained for this example. As expected $E_{D}$ decreases with the number of auxiliary capacitors, while $E_{S W}$ increases showing the existence of the optimum number of auxiliary capacitors. Finally, the total consumption can be calculated and it can be seen that the optimum number of auxiliary capacitors is 4 while obtaining an energy saving of $61 \%$.


Figure 2.4: Energy consumption vs number of auxiliary capacitors $(N)$. Comparison estimated (lines) vs simulated (dots). $C=178 \mathrm{fF}, T_{C D}=10 \mathrm{~ns}, C_{a u x_{i}}=10 C$ and, for example, in the case of $\mathrm{N}=4$, the size of the auxiliary switches (in Fig. 2.2) are $W_{s w 0}=400 \mathrm{~nm}, W_{s w 1}=$ $500 \mathrm{~nm}, W_{s w 2}=1.6 \mu \mathrm{~m}, W_{s w 3}=2.9 \mu \mathrm{~m}, W_{s w 4}=1.5 \mu \mathrm{~m}, W_{s w 5}=1.2 \mu \mathrm{~m}, L=120 \mathrm{~nm}$


### 2.2.2 Logic

The third source of energy consumption involved in the stepwise charging technique is the logic generating the pulses that sequentially drive the auxiliary switches $\left(E_{L}\right)$. In this case, an example of how to design this logic is presented. When a capacitance is charged and discharged in a conventional implementation, an enable signal EN is used that charges the capacitance when it is in a high logic value and discharges it when it is in a low logic value usually through an inverter or chain of inverters.

While implementing the stepwise charging technique a logic block must be able to generate, from signal EN, the auxiliary signals (EN0, EN1, etc.) that drive the auxiliary switches. With a rising edge of signal EN, the block must start the charge of the capacitance. During this time, $S W_{1}$ to $S W_{N}$ must be turned on and off and finally maintain $S W_{N+1}$ turned on. On the other hand, with a falling edge of this signal, the discharge process will take place. An example of these signals is shown in Fig. 2.5 for $\mathrm{N}=2$.

To generate the auxiliary signals, a non-overlapping pulse generator is needed [10, 73]. Fig. 2.6 shows the architecture used to generate a pulse with a specific width. An edge in signal $I N$ will produce a pulse in signal $P U L S E$. The width of this pulse will be determined by the delay of the inverters between the two inputs of the XOR gate. Non-overlapping consecutive pulses can be obtained by connecting a number of these blocks in cascade. With some simple additional logic the auxiliary signals EN0, EN1, etc. can be obtained from the output of these blocks.

Next, an example of the logic block using two auxiliary capacitors is presented. Fig. 2.5 shows the auxiliary signals EN0, EN1, EN2 and EN3 that this block must generate. The width of pulses EN1 and EN2 defines the time devoted to the transfer between the auxiliary capacitors and the main capacitance. This width must be equal to $T_{A u x}$ (Eq. (2.6)), which depends on $T_{C D}$ and the number of auxiliary capacitors, in this case $N=2$. EN0 and EN3 must be maintained until the next edge of signal EN.

To obtain these pulses, an architecture using the block presented in Fig. 2.6 was implemented; it is shown in Fig. 2.7. From the auxiliary signals p1, p2 and p3, the signals in Fig. 2.5 can be obtained with the operations shown below.

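$$
\begin{gathered}
\mathrm{EN0}=!(\mathrm{p3}+\mathrm{EN}) \\
\mathrm{EN1}=\mathrm{EN} \cdot \mathrm{p1}+!\mathrm{EN} \cdot \mathrm{p2} \\
\mathrm{EN2}=\mathrm{EN} \cdot \mathrm{p2}+!\mathrm{EN} \cdot \mathrm{p1} \\
\mathrm{EN3}=\mathrm{p3} \cdot \mathrm{EN}
\end{gathered}
$$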
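The four operations above can be exercised with a minimal sketch, reading `!` as NOT, `+` as OR and `·` as AND. The input sequence is an assumption about how p1, p2 and p3 behave (p1 and then p2 pulse after each edge of EN, and p3 follows EN once both pulses have finished), since the waveforms of Fig. 2.7 are not reproduced here. The function name is hypothetical.

```python
def stepwise_enables(en: bool, p1: bool, p2: bool, p3: bool):
    """Return (EN0, EN1, EN2, EN3) from EN and the pulse-generator outputs."""
    en0 = not (p3 or en)
    en1 = (en and p1) or ((not en) and p2)
    en2 = (en and p2) or ((not en) and p1)
    en3 = p3 and en
    return en0, en1, en2, en3


# Assumed input sequence (EN, p1, p2, p3) for one charge-discharge cycle
sequence = [
    (1, 1, 0, 0), (1, 0, 1, 0), (1, 0, 0, 1),  # charge: SW1, SW2, then SW3
    (0, 1, 0, 1), (0, 0, 1, 1), (0, 0, 0, 0),  # discharge: SW2, SW1, then SW0
]
for en, p1, p2, p3 in sequence:
    outs = stepwise_enables(bool(en), bool(p1), bool(p2), bool(p3))
    print((en, p1, p2, p3), "->", tuple(int(o) for o in outs))
```

Under the assumed sequence exactly one enable signal is active at a time, and the order of EN1 and EN2 is reversed between charge and discharge, as required for $N=2$.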

Figure 2.5: Control pulses needed to implement the technique. Example $N=2$.

Although this architecture might be affected by mismatch and process variations, the exact width of the pulses is not critical since they just need to be long enough to allow the charge transfer between the auxiliary capacitors and the main capacitance.

Finally, the total energy consumption can be calculated as

$$
\begin{equation*}
E_{T o t}\left(C, T_{C D}, N\right)=E_{D}(C, N)+E_{S W}\left(C, T_{C D}, N\right)+E_{L}\left(T_{C D}, N\right) \tag{2.12}
\end{equation*}
$$

To summarize, knowing the value of $C$ and the target $T_{C D}$, the total consumption can be calculated for each number of storage capacitors and the optimum number required to minimize the total energy consumption can be obtained. In order to compare stepwise charging with the classic implementation $(N=0)$, the energy saving obtained can be defined as

$$
\begin{equation*}
\text { Saving }=1-\frac{E_{T o t}\left(C, T_{C D}, N=N_{O p t}\right)}{E_{T o t}\left(C, T_{C D}, N=0\right)} . \tag{2.13}
\end{equation*}
$$

In the next section, the saving obtained by using the technique will be analyzed and the theoretical limits due to the consumption of this block will be discussed.


Figure 2.6: Architecture selected to generate a pulse.


Figure 2.7: Logic block using the pulse generator of Fig. 2.6. Example $N=2$.

### 2.2.3 Limits of the stepwise charging technique

To obtain some insight into the limits of using stepwise charging, the savings for different main capacitances $C$ and charge/discharge times $T_{C D}$ were calculated based on the considerations mentioned above. This means that $C_{a u x_{i}}=10 C$ and that the widths of the auxiliary switches were designed with Eq. (2.9). In the particular case of $E_{L}$, post-layout simulations were carried out for one case of $T_{C D}$ and $N$ and extrapolated to obtain an estimation of this consumption for different $T_{C D}$ and $N$. This was necessary since the consumption of the logic is strongly dependent on the layout and the parasitic capacitances introduced. A 130 nm process was used with $V_{d d}=1.2 \mathrm{~V}$.

## Capacitance dependence

First, the dependence on the main capacitance is analyzed. The energy saving as a function of the capacitance for two cases, with and without considering $E_{L}$, is shown in Fig. 2.8a with a constant $T_{C D}=10 \mathrm{~ns}$. For each capacitance, the optimum number of auxiliary capacitors was used; it is shown in Fig. 2.8b and goes from 0 (when the savings are 0) to 4 depending on the capacitance selected.


Figure 2.8: Capacitance dependence. $T_{C D}=10 \mathrm{~ns}$

Eq. (2.9) shows that as the value of the capacitance decreases, the width of the switches and $E_{S W}$ also decrease. This holds true until the minimum width allowed by the technology is reached. When the auxiliary switches reach the minimum width, $E_{S W}$ becomes constant. However, Eq. (2.4) shows that $E_{D}$ is reduced when a smaller $C$ is selected. When $E_{S W}$ becomes constant and $E_{D}$ continues decreasing, $E_{S W}$ becomes dominant and the saving obtained starts decreasing, as can be seen in Fig. 2.8a.

If the capacitance value is high enough so that all the auxiliary switches are larger than the minimum allowed by the technology, both $E_{S W}$ and $E_{D}$ are, to a first approximation, proportional to $C$. As a result, the optimum number of auxiliary capacitors and the savings obtained become independent of the selected $C$. This can be observed in Fig. 2.8a.

In previous works, the energy consumption of the logic block was not considered, since this consumption is negligible when the capacitance is big enough. This can be seen in Fig. 2.8a: in this technology and with the designed logic, the dotted and solid lines achieve the same savings for capacitances larger than 100 pF. However, there are several applications with smaller capacitances that can still benefit from stepwise charging ($500 \mathrm{fF}<C<100 \mathrm{pF}$). In these cases, $E_{L}$ is what determines the total savings. In this technology and with the designed logic, the minimum main capacitance that can benefit from this charge recycling technique is around 500 fF, as shown in Fig. 2.8a.

## $T_{C D}$ dependence

Now, the dependence on $T_{C D}$ is analyzed. The energy saving as a function of $T_{C D}$ for the two cases, with and without considering $E_{L}$, is shown in Fig. 2.9a with a constant $C=1.4 \mathrm{pF}$. For each $T_{C D}$, the optimum number of auxiliary capacitors was used and is shown in Fig. 2.9b.

Eq. (2.9) shows that the width of the switches depends on the required $T_{C D}$. For short $T_{C D}$, the auxiliary switches have a larger width (in order to reduce their resistance) and a bigger gate capacitance, which increases $E_{S W}$. Since $E_{D}$ does not depend on $T_{C D}$, there is a minimum $T_{C D}$ below which no energy savings are achieved. In this technology and with the selected capacitance, this minimum $T_{C D}$ is 200 ps.

As it was mentioned in Section 2.2.2, to change the $T_{C D}$ in the simulations, we needed to control the delay of the inverters in Fig. 2.6. In this case, we changed the delay by changing the length of the transistors in these inverters. This was used only to be able to simulate the stepwise charging technique with different $T_{C D}$, but it should be noticed that depending on the selected $T_{C D}$ this might not be the best option. In the end, the logic block should be optimized for the specific $T_{C D}$ determined by the application.

Figure 2.9: $T_{C D}$ dependence. $C=1.4 \mathrm{pF}$

When the $T_{C D}$ requirement is less demanding (higher $T_{C D}$), the auxiliary switches can be made smaller, which reduces $E_{S W}$. Since $E_{D}$ does not depend on $T_{C D}$, both the savings and the optimum number of auxiliary capacitors increase as $T_{C D}$ increases. This holds until the minimum switch width allowed by the technology is reached. After this, the savings obtained by stepwise charging stop depending on $T_{C D}$.
The main concern is that more auxiliary capacitors imply more auxiliary switches, and thus more pulses need to be generated by the logic block, which increases its power consumption. If the logic block is considered, the optimum number of auxiliary capacitors and the saving are limited to lower values than in the case where the logic is not considered. Furthermore, as the width of the pulses is related to the length of the transistors that implement the inverters, the energy needed to obtain a pulse increases with its width. Consequently, the saving decreases when targeting lower frequencies. Improvements via other techniques to control the width of the pulse have been previously shown in $[10,73]$.

## $T_{C D}$ and $C$ dependence

To summarize, the results of the savings as a function of $C$ and $T_{C D}$ are shown in Fig. 2.10 for a 130 nm CMOS process. For the values of $C$ and $T_{C D}$ in Fig. 2.10, the optimum number of auxiliary capacitors goes from 1 to 4 depending on the case. As mentioned in the previous analysis, the savings increase for higher $C$ and $T_{C D}$. However, as will be seen in the next section, SC dc-dc converters can benefit from the application of this technique, since their gate capacitances and the time available for the charge/discharge are within the ranges presented in Fig. 2.10.


Figure 2.10: Saving vs capacitance that is charged and discharged $(C)$ vs charge and discharge time $T_{C D}$.

In conclusion, by using the stepwise charging technique the energy consumption due to a periodic charge-discharge of a capacitance $C$ with a frequency $f$ can be reduced. The energy saving strongly depends on the value of the capacitance and the $T_{C D}$. The consumption due to the generation of the auxiliary pulses required to implement the technique ( $E_{L}$ ) is vital. Depending on the selected $C$ and $T_{C D}$, $E_{L}$ can be dominant and determine the saving obtained.

### 2.3 Stepwise charging in SC dc-dc converters

A SC dc-dc step-down converter was manufactured in a 130 nm process and measurements were done to validate using stepwise charging in this kind of converters. A photo of the manufactured IC is shown in Fig. 2.11a.

The selected architecture for the converter is shown in Fig. 2.11b (the same converter presented in Fig. 2.1). It implements a step-down series-parallel dc-dc converter that divides the input voltage by three. This converter has two phases. In the first phase ( $\phi_{1}$ ), switches $\operatorname{Sw} \phi_{1}$ are closed while the remaining ones are open, and charge is taken from the voltage source in order to charge capacitors C1, C2 and C3. In this phase, CL delivers the charge to the output load. In the second phase $\left(\phi_{2}\right)$, only switches $\mathrm{Sw} \phi_{2}$ are closed. During this phase, C1, C2 and C3 provide the charge to the load and CL.


Figure 2.11: Prototype SC dc-dc converter

Table 2.1 shows the sizing of the components of the dc-dc converter. The transfer capacitors are $C_{1}, C_{2}$ and $C_{3}$, while the output capacitance is $C_{L}$ in Fig. 2.11b. All the power switches of the converter were designed with the same size, which is shown in the table.

Table 2.1: dc-dc Converter Sizing

| Component | Size |
| :--- | :--- |
| Transfer Capacitors | 288 pF |
| Output Capacitor | 744 pF |
| Power switches | $\mathrm{W} / \mathrm{L}=250 \mu \mathrm{~m} / 120 \mathrm{~nm}$ |

By using stepwise charging, the energy spent to turn on and off the switches of the converter can be reduced and thus the efficiency of the converter increased (Eq. (2.1)). Each power switch of the designed converter was implemented with a stepwise driver like the one shown in Fig. 2.2 and analysed in Section 2.2. The capacitance that is charged and discharged, $C$ (Fig. 2.2), corresponds to the gate capacitance of the power switches of the converter. In this case, the optimum number of auxiliary capacitors for the stepwise driver was 2, so each driver consisted of 4 auxiliary switches. Table 2.2 shows the width of these switches. All auxiliary switches have a length of 120 nm.

Table 2.2: Stepwise Driver Sizing

| Element | Width / value |
| :--- | :--- |
| Sw3 | 900 nm |
| Sw2 | $1.4 \mu \mathrm{~m}$ |
| Sw1 | 600 nm |
| Sw0 | 350 nm |
| $C_{\text {aux }_{i}}$ | 7 pF |

In the designed converter, the total gate parasitic capacitance that needs to be charged and discharged depends on the phase of the converter. This is because each phase has a different number of switches that are turned on and off. To a first approximation, we can add up all the gate capacitances of the power switches in one phase, which gives 0.9 pF for $\phi_{1}$ and 1.5 pF for $\phi_{2}$. This allowed us to share the auxiliary capacitors and logic blocks of each stepwise driver among the power switches of the same phase. Finally, we end up with two auxiliary capacitors ( $C_{a u x_{i}}$ in Fig. 2.2) for each phase, each with a capacitance of 7 pF. Additionally, the stepwise driver $T_{C D}$ was designed to be 10 times shorter than the minimum period of operation of the converter (the period of $\phi_{1}$ and $\phi_{2}$) to guarantee that the switches would turn on and off on time. This is why the driver was designed with $T_{C D}=10 \mathrm{~ns}$, since the maximum operating frequency of the converter was around 10 MHz.

To determine the benefits of using stepwise charging for gate-drive energy reduction, additional logic was added to enable and disable the stepwise charging technique. Fig. 2.12 shows the measured efficiency of the dc-dc converter, with and without the charge recycling technique, as a function of the output voltage of the converter. The figure shows the results for the 8 ICs that were measured. Enabling the charge recycling technique increased the overall efficiency by up to $4 \%$, from $69.5 \%$ to a maximum of $73.5 \%$. Additionally, Fig. 2.13 shows the measured efficiency as a function of the output current of the converter.


Figure 2.12: Efficiency measurements of eight prototypes of the dc-dc converter with the recycle technique and without it for different output voltages. Load current $I_{L}=60 \mu \mathrm{~A}$

The converter was tested using two independent voltage sources. The first one was used as the input voltage of the converter; its power consumption is $P_{i}$. The second one supplies the power to the switches of the converter, including the logic that implements the recycle technique when it is enabled; its power consumption is $P_{s w}$. The sum of these two power consumptions is the $P_{L}+P_{\text {LOSSES }}$ of the converter used in Eq. (2.1), while $P_{L}$, the power delivered to the load, is calculated as $I_{L} \times V_{o}$, where $V_{o}$ is the output voltage of the converter and $I_{L}$ the current delivered to the load.
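The bookkeeping described above can be sketched in a few lines. This is only an illustration: it assumes that Eq. (2.1) has the usual form $P_{L} /\left(P_{L}+P_{\text {LOSSES }}\right)$, and the function name and the numerical readings are hypothetical, not measured values from this work.

```python
def converter_efficiency(v_out, i_load, p_in, p_sw):
    """Efficiency as P_L / (P_i + P_sw), assuming P_i + P_sw = P_L + P_LOSSES."""
    p_load = v_out * i_load      # P_L = I_L * V_o, power delivered to the load
    p_total = p_in + p_sw        # power drawn from the two bench supplies
    if p_total <= 0:
        raise ValueError("total supplied power must be positive")
    return p_load / p_total


# Hypothetical readings, chosen only to illustrate the arithmetic.
eta = converter_efficiency(v_out=0.40, i_load=60e-6, p_in=30e-6, p_sw=3e-6)
print(f"efficiency = {eta:.1%}")
```

Keeping $P_{s w}$ as a separate argument mirrors the measurement setup, where the gate-drive supply is read independently and can be compared with and without the recycle technique.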

For each measurement in Fig. 2.12, $P_{s w}$ can be measured with and without stepwise charging to obtain the energy reduction achieved by the technique. The measured results are shown in Fig. 2.14 for different output voltages. Stepwise charging reduces the gate-drive energy consumption of the switches of the fabricated dc-dc converter by $29 \%$. These results include the consumption of the non-overlapping clock generator needed to obtain the two signals that enable and disable each phase of the converter.

The measurements of this study confirm that significant energy savings ( $29 \%$ of the energy for driving the switches) are feasible in a fully integrated implementation of stepwise charging. Furthermore, the results validate the use of this recycle technique in SC dc-dc converters, improving the overall efficiency by $4 \%$ without significant impact on area or other performance metrics.

Figure 2.13: Efficiency measurements of the dc-dc converter with the recycle technique and without it for different load currents.

Figure 2.14: Saving obtained using the recycle technique in the power consumption of the converter switches as a function of the output voltage.

### 2.4 Analysis of charge sharing

In this section, we present a general analysis of the Charge Sharing (CS) technique. We address the question of when the use of this technique is suitable and how much energy can be saved. The problem is specified by three parameters: the capacitance $C 1$ that needs to be charged (discharged), the capacitance $C 2$ that needs to be discharged (charged) at the same time, and the time available for the charge/discharge process ( $T_{C D}$ ).

To compare the classic driving circuit with a driving circuit that implements the CS technique, we calculate, for both circuits, the energy spent in a complete cycle in which both capacitors are charged and discharged.

In a classic implementation, the first source of energy consumption is the charging of the capacitors themselves, which can be calculated with Eq. 2.14.

$$
\begin{equation*}
E_{C 1}=C 1 \times V_{d d}^{2} \text { and } E_{C 2}=C 2 \times V_{d d}^{2} \tag{2.14}
\end{equation*}
$$

The second source of energy consumption is the driving of the switches that charge (discharge) the capacitors $\left(E_{S W}\right)$. First, the resistance of each switch is calculated to fulfill the timing restrictions imposed by the target $T_{C D}$. Then, the switch width $\left(W_{S W}\right)$ that provides this resistance and the corresponding gate capacitance $\left(C_{S W}\right)$ are obtained through electrical simulations, which in turn gives $E_{S W}$. This process is similar to the one presented in Section 2.2.1.


Figure 2.15: Charge sharing technique.

Figure 2.15 shows the basic idea of the CS technique. When this technique is applied, the time $T_{C D}$ is divided into two phases: a first one where the two capacitors are connected together through the exchange switch ( $S W_{E x}$ ) and a second one where each capacitor is charged/discharged through a classic driver $\left(S W_{C C 1}, S W_{D C 1}, S W_{C C 2}, S W_{D C 2}\right)$. Figure 2.16 shows the signals involved in both cases, with and without the CS technique.

The energy spent to charge the capacitor is reduced since after the exchange phase each capacitor already has energy stored in it. The energy spent to finish charging each capacitor can be calculated with Eq. 2.15.

$$
\begin{equation*}
E_{C 1_{C S}}=\frac{C 1^{2}}{C 1+C 2} \times V_{d d}^{2} \quad E_{C 2_{C S}}=\frac{C 2^{2}}{C 1+C 2} \times V_{d d}^{2} \tag{2.15}
\end{equation*}
$$
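A short sketch of where Eq. 2.15 comes from, under the assumption (not stated explicitly in the text) that the exchange phase lasts long enough for both capacitors to settle to a common voltage, called $V_{x}$ here for illustration only, with $C 1$ starting at 0 and $C 2$ at $V_{d d}$:

$$
\begin{aligned}
V_{x} &= \frac{C 2}{C 1+C 2} V_{d d} \\
E_{C 1_{C S}} &= C 1 \, V_{d d}\left(V_{d d}-V_{x}\right)=\frac{C 1^{2}}{C 1+C 2} V_{d d}^{2}
\end{aligned}
$$

The first line is charge conservation through $S W_{E x}$; the second is the energy drawn from the supply to take $C 1$ from $V_{x}$ to $V_{d d}$. Swapping the roles of the two capacitors gives the expression for $E_{C 2_{C S}}$.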

Additionally, while using CS the $E_{S W}$ increases since an additional switch $\left(S W_{E x}\right)$ is needed. The procedure to obtain this energy is the same as in the classic driver.

Figure 2.16: Charge sharing technique example signals. An edge in VCD indicates that the capacitors have to start charging/discharging. VC1 CS and VC2 CS are the voltages in the two capacitors when using the CS technique, whereas VC1 and VC2 are the voltages without the technique. The signals shown in the second graph correspond to the nodes with the same name in Fig. 2.15.

Finally, the technique needs a logic block to generate the signals shown in the second graph of Fig. 2.16 (VE, VF1, VF2) from the charge/discharge signal (VCD). To generate these signals, a non-overlapping pulse generator is needed. The idea is the same as the one presented in Section 2.2.2. In this case, current-starved inverters are used to control the delay. The energy spent by this block will be referred to as $E_{\text {Log }}$.

To obtain a first theoretical limit for the savings achievable with the CS technique, we can assume that $E_{S W}$ and $E_{\text {Log }}$ are negligible. In that case, the savings are given by Eq. 2.16.

$$
\begin{equation*}
1-\frac{E_{C 1_{C S}}+E_{C 2_{C S}}}{E_{C 1}+E_{C 2}}=\frac{2 \gamma}{(1+\gamma)^{2}} \quad \text { where } \gamma=\frac{C 2}{C 1} \tag{2.16}
\end{equation*}
$$

From Eq. 2.16 it can be seen that the maximum savings are limited to $50 \%$ and the two capacitances need to have the same value. As the value of the capacitances differs, the savings start to drop. For example, when $\gamma=2$ the maximum savings drop to $44 \%$.
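The ideal limit of Eq. 2.16 is easy to evaluate numerically. The snippet below is a minimal sketch (the function name is hypothetical) that ignores $E_{S W}$ and $E_{\text {Log }}$, exactly as the equation does, so it gives an upper bound rather than the savings of a real driver.

```python
def cs_ideal_savings(c1, c2):
    """Ideal charge-sharing savings from Eq. 2.16, ignoring E_SW and E_Log."""
    if c1 <= 0 or c2 <= 0:
        raise ValueError("capacitances must be positive")
    gamma = c2 / c1
    return 2 * gamma / (1 + gamma) ** 2


# Savings for equal capacitances and for increasingly mismatched ones.
for gamma in (1.0, 2.0, 5.0):
    print(f"gamma = {gamma:g}: savings = {cs_ideal_savings(1.0, gamma):.1%}")
```

The expression is symmetric in $C 1$ and $C 2$ (replacing $\gamma$ by $1 / \gamma$ gives the same value), so only the mismatch between the two capacitances matters, not which one is larger.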

However, $E_{S W}$ and $E_{L o g}$ are not always negligible. $E_{S W}$ was obtained in the same way as in the classic implementation, where the resistance and capacitance as a function of the switch width were extracted from simulations for a 130 nm technology. The energy spent by the logic block $\left(E_{\text {Log }}\right)$ was extracted from post-layout electrical simulations for different pulse widths. The total energy spent during a complete charge-discharge cycle will be referred to as $E_{T o t}$ for the classic driving circuit and $E_{T o t_{C S}}$ for the circuit using charge sharing. Finally, the energy savings ( $1-E_{T o t_{C S}} / E_{T o t}$ ) were calculated for different capacitance values $(C 1=C 2=C)$ and $T_{C D}$.

Figure 2.17 shows the savings obtained through this analysis. For small capacitances the savings start to drop, for two reasons. First, $E_{\text {Log }}$ starts to have a higher impact on the overall energy consumption of the driver: while $E_{S W}$ and $E_{C_{i}}$ ( $E_{C 1}$, $E_{C 2}$, $E_{C 1_{C S}}$ or $E_{C 2_{C S}}$ ) scale down with $C$, $E_{\text {Log }}$ remains the same. Second, once the minimum switch width is reached, $E_{S W}$ also stops scaling with $C$, which increases its impact on the total energy consumption. These two reasons are the same as in stepwise charging, and they are why the energy consumption of the logic block is crucial in this range of capacitances (between 200 fF and 1000 fF). The figure shows the results for one logic block, but it is important to take special care in the design of this block to maximize the savings.

On the other hand, when the capacitance values increase, $E_{C_{i}}$ dominates and the savings tend to the theoretical limit. As can be seen, this is true when $T_{C D}$ is not very demanding and $E_{S W}$ is negligible (e.g. $T_{C D}=30 \mathrm{~ns}$ ).

When $T_{C D}$ is very demanding (e.g. $T_{C D}=0.5 \mathrm{~ns}$ ), $E_{S W}$ becomes comparable with $E_{C_{i}}$, which affects the total energy, and that is why the energy savings are lower than for higher $T_{C D}$. This would be true even if $E_{S W}$ were the same in both cases, with and without CS. However, $E_{S W}$ is higher when the CS technique is implemented, mainly because it uses more switches.

In conclusion, in this 130 nm process, significant savings can be obtained with capacitances higher than 300 fF . In the next section, we propose to take advantage of this technique to reduce the gate-drive losses in SC dc-dc converters.




Figure 2.17: Savings $\%$ vs $\mathrm{C} 1=\mathrm{C} 2=\mathrm{C}$ for different $T_{C D}$.

### 2.5 Charge sharing in SC dc-dc converters

As it was stated previously, gate-drive losses can be very significant in SC dc-dc converters and reducing them has a direct impact on the efficiency of the converter. These converters generally operate in two phases (Phi1 and Phi2). Usually, one in which charge is taken from the power source and stored in the capacitors and a second phase where this charge is delivered to the load. Each phase is implemented using MOS switches which depending on the voltage applied to them, are either an nMOS or a pMOS switch.

When both nMOS and pMOS switches are used in the same phase, we have two capacitances where one needs to be charged at the same time that the other needs to be discharged. As it was shown in the previous section this is the necessary condition to use the CS technique.

In this section we present measurement results for an SC dc-dc converter designed in a 130 nm technology. The architecture selected is a divide-by-three doubler topology, as shown in Fig. 2.18 [44]. It can be seen that, in this topology, when phase Phi1 finishes the gate capacitance of switch SW1 needs to be charged while the gate capacitances of SW2, SW5 and SW7 need to be discharged, and the opposite happens when the phase starts. Similarly, when phase Phi2 finishes the gate capacitance of switch SW4 needs to be charged while the gate capacitances of SW3 and SW6 need to be discharged, and the opposite happens when the phase starts.

The converter was designed to deliver an output power of $400 \mu \mathrm{~W}$ while dividing the input voltage $(1.2 \mathrm{~V})$ by three and using a total capacitance of 0.5 nF.


Figure 2.18: SC dc-dc converter architecture.

The design was obtained following the methodology proposed in [44], selecting a target output voltage of 380 mV and a ratio $\left(C_{\text {top }}+C_{\text {bottom }}\right) / C=0.02$, where $C$ is the total capacitance and $C_{\text {top }}$, $C_{\text {bottom }}$ are the parasitic top and bottom plate capacitances, respectively. The switch widths and their equivalent gate capacitances are shown in Table 2.3. All switches use minimum-length transistors.

Table 2.3: SC dc-dc converter design

| Switch | SW1 | SW2 | SW3 | SW4 | SW5 | SW6 | SW7 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| $W_{S W}$ $(\mu \mathrm{m})$ | 227 | 179 | 120 | 334 | 179 | 179 | 120 |
| $C_{S W}$ $(\mathrm{fF})$ | 243 | 190 | 128 | 357 | 190 | 190 | 128 |

In phase Phi1, the two capacitances that need to be charged and discharged are 243 fF (SW1, Vphi1p) and 508 fF (SW2, SW5 and SW7, Vphi1n), while in phase Phi2 they are 357 fF (SW4, Vphi2p) and 510 fF (SW3 and SW6, Vphi2n). Additionally, the maximum frequency of operation is 29 MHz. As a consequence, the switches must be turned on at least 10 times faster than the corresponding period, which gives a $T_{C D}$ of 3.4 ns.
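The timing budget follows directly from the "10 times faster" rule used for both converters in this chapter. The helper below is a hypothetical sketch of that arithmetic; the factor of 10 is the design margin quoted in the text, not a general requirement.

```python
def charge_discharge_time(f_max_hz, speedup=10):
    """Time budget T_CD for a driver `speedup` times faster than one period at f_max."""
    if f_max_hz <= 0 or speedup <= 0:
        raise ValueError("frequency and speedup must be positive")
    return 1.0 / (speedup * f_max_hz)


# Maximum operating frequencies quoted for the two converters of this chapter.
for f_max in (10e6, 29e6):
    t_cd = charge_discharge_time(f_max)
    print(f"{f_max / 1e6:g} MHz -> T_CD = {t_cd * 1e9:.1f} ns")
```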

If the CS technique is used in the driving of the switches in each phase, the analysis made in the previous section predicts a saving equal to $25 \%$ and $29 \%$ for phase Phi1 and Phi2 respectively in comparison with a classic driving circuit.

The designed converter was fabricated and some preliminary measurements are presented in this section. The converter was measured using a classic driver circuit and a driver using the CS technique. By using the proposed driver, a $25 \%$ energy saving was obtained in the gate-drive losses of the converter. Fig. 2.19 shows this power consumption as a function of the output voltage in both cases, with CS (wCS) and without (woCS). Depending on the desired output voltage, the ratio between the gate-drive losses and the total losses changes, and thus so does the efficiency improvement. The efficiency measured as a function of the output voltage is presented in Fig. 2.20 for a load current of $300 \mu \mathrm{~A}$. For example, when the output voltage is 370 mV the efficiency improves from $79.7 \%$ to $82.6 \%$ when using the CS technique.


Figure 2.19: Gate drive power consumption vs output voltage. Without CS (dashed line, woCS) and with CS (solid line, wCS).

### 2.6 Conclusions

In this chapter, we analyzed two charge recycling techniques that reduce the energy consumption of capacitances that need to be charged and discharged periodically.

The first one, stepwise charging, is a well-known technique based on adiabatic charging principles. As far as we know, this is the first time that stepwise charging in a fully integrated implementation is analyzed taking into account the energy due to auxiliary circuits. The limits and practical considerations that need to be taken into account to implement stepwise charging with small capacitances were discussed. In particular, it was shown that the consumption of the auxiliary logic is critical in these cases and that special care needs to be taken while designing this block.

Figure 2.20: Efficiency vs output voltage. Without CS (dashed line, woCS) and with CS (solid line, wCS). $I=300 \mu A$.

The advantage of the logic implemented is that there is no need for an auxiliary fast clock signal, hence consumption is reduced and the implementation of the technique is virtually invisible to the user. The same signal used in the classical charge-discharge process of a capacitance generates all the necessary signals to implement the stepwise charging technique.

The second technique, Charge Sharing (CS), applies when there are two capacitors, one that needs to be charged and another one that needs to be discharged at the same time. We presented a theoretical analysis to determine when it is suitable to use this technique. In this case, it was shown that the consumption of the auxiliary logic is also crucial when considering small capacitors.

Based on these techniques we proposed two driving circuits for the switches of SC dc-dc converters that reduce the gate-drive losses. We designed and measured two converters, one that implements a stepwise driver and a second one that implements a charge sharing driver. It was demonstrated that energy reductions of $25 \%$ or more can be obtained with these techniques in the driving of the switches that implement the converter. This improves the efficiency of the converter by up to $4 \%$ without any performance penalty or significant area penalty.


## Chapter 3

## Subthreshold digital circuits modelling

In this chapter, we analyze two issues regarding subthreshold digital circuits. First, we develop a methodology to assess the minimum operating voltage of digital circuits. Here, we take into account the intrinsic noise introduced by the devices and define the minimum operating voltage as the one that achieves a certain failure rate due to bit flips caused by intrinsic noise. In Section 3.1.1, the model and method for noise characterization are presented. In Section 3.1.2, we use this methodology to obtain the minimum operating voltage for different technology nodes and compare it with previously proposed definitions.

Afterward, in Section 3.2.1 we study the classic modeling of subthreshold digital circuits and the characterization of the Minimum Energy Point (MEP). Then, in Section 3.2.2, we improve this model by taking into account the difference between NMOS and PMOS devices and show that there is an optimum imbalance between the leakage currents of the devices that minimizes the energy consumption per operation of subthreshold digital circuits operating near the MEP.

### 3.1 Minimum operating voltage in subthreshold digital logic

In the past two decades, significant efforts have been made to establish a lower bound for digital CMOS technology scaling. Several works have pointed out that a physical limit would arise due to false bit flips generated by intrinsic noise sources such as thermal, shot and flicker noise [74-77].

In [74] Stein derived a relationship between the minimum energy per logical operation consumed by an ideal CMOS inverter and the bit error rate due to random fluctuations introduced by thermal noise, which obeys a Gaussian distribution. Moreover, he compared these minimum energies with the fundamental limits of CMOS presented in [78] by Swanson et al. and stated that the minimum energy per logical operation is limited by intrinsic noise when considering acceptable bit error rates. Natori et al. in [75] extended this analysis by considering the effect of a subsequent gate, using a chain of idealized inverters as a case study. This approach limited the noise frequency range that could propagate through the next logic gate. Under these conditions, the authors suggested that the scaling limit lay around $10-20 \mathrm{~nm}$. Kish achieved a similar result in [76], where he pointed out that serious problems would emerge around 25 nm due to thermal noise.

The key problem with these approaches is that they all use an idealized model for the CMOS inverter consisting of a constant drain-to-source resistance and a constant output capacitance, therefore considering the noise Power Spectral Density (PSD) of a resistor. In contrast, in [77] Kleeberger et al. used the BSIM4 transistor model [79] and Predictive Technology Model (PTM) files [80] to account for the noise PSD at the output of a CMOS inverter down to the 16 nm node. The authors conclude that previous works overestimate the noise Root-Mean-Square (RMS) voltage by up to a factor of 4, and that intrinsic noise will not be a problem until at least 8 nm.

These prior works considered operation at the nominal, above-threshold supply voltage $\left(V_{d d}\right)$. In this thesis, we address the question of how intrinsic noise affects sub/near-threshold digital circuits. When $V_{d d}$ is reduced, the noise margins decrease and variability worsens, which makes the combined effect of variability and intrinsic noise a problem. To formally account for the intrinsic noise in sub/near-threshold digital circuits, we used the BSIM4 transistor model and PTM parameters to determine the noise PSD at the output of an inverter. Additionally, we considered the bandwidth of the subsequent gate to obtain the noise RMS voltage at the output node of an inverter. Then, we used the same approach as Kish to obtain the failure rate of the circuit as the mean frequency of crossing a given threshold voltage.

### 3.1.1 Model and method

## Characterizing noise RMS voltage

In this section, we present the model and methodology used to determine the noise RMS voltage at the output of an inverter. Previous works [74, 76] used a simplified model for the MOS transistor consisting of a drain to source resistance and an output capacitance. Using this model and considering the noise PSD of a resistance, the output noise RMS voltage can be determined by

$$
\begin{equation*}
V_{N}=\sqrt{\frac{k T}{C}} \tag{3.1}
\end{equation*}
$$

where $k$ is Boltzmann's constant, $T$ the absolute temperature and $C$ the output capacitance of the aforementioned model. This approach does not consider how the noise would propagate through a subsequent gate and thus does not take into account the bandwidth of the next gate. In [75] this effect is included, considering the same simplified model for the following gate (an inverter in this case), thus limiting the noise PSD to $f=1 /\left(2 \pi R_{o} C_{o}\right)$, where $R_{o}$ is the drain-to-source resistance and $C_{o}$ the output capacitance of the next gate. This bandwidth also overestimates the noise RMS voltage. Kleeberger et al. used a different methodology in [77]: they extracted the noise PSD at the output of an inverter from electrical simulations and integrated it up to 6 THz to obtain the noise RMS voltage. In this case they did not take into account the bandwidth of the subsequent gate; instead, they considered an arbitrarily large bandwidth.
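To give a feeling for the magnitudes involved in Eq. (3.1), the following minimal sketch evaluates the $\sqrt{k T / C}$ expression. The capacitance values are hypothetical and only chosen to span a plausible range of small node capacitances; they are not extracted from any of the technologies studied here.

```python
import math

K_BOLTZMANN = 1.380649e-23  # Boltzmann's constant in J/K (exact SI value)


def ktc_noise_rms(c_farads, temp_kelvin):
    """Noise RMS voltage of the simplified R-C inverter model, Eq. (3.1)."""
    if c_farads <= 0 or temp_kelvin <= 0:
        raise ValueError("capacitance and temperature must be positive")
    return math.sqrt(K_BOLTZMANN * temp_kelvin / c_farads)


# Hypothetical output capacitances, evaluated at 300 K.
for c in (0.1e-15, 1e-15, 10e-15):
    v_n = ktc_noise_rms(c, 300.0)
    print(f"C = {c * 1e15:g} fF -> V_N = {v_n * 1e3:.2f} mV")
```

Because the noise voltage scales as $1 / \sqrt{C}$, a tenfold reduction in capacitance raises $V_{N}$ by only about a factor of 3.2.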

We used an inverter as the characterization circuit and an inverter of the same size as the subsequent gate. The inverter under test was preceded by a chain of inverters to guarantee realistic input logic levels. We determined the noise RMS voltage by integrating the noise PSD at the output of the inverter, considering a first-order filtering characteristic for the subsequent gate. To do this, we first extracted from electrical simulations the noise PSD at the output of the inverter $(N(f))$, considering an inverter of the same size as the loading circuit. Then we simulated the bandwidth $\left(f_{3 d b}\right)$ of the subsequent inverter. Finally, as the noise PSD is dominated by white noise, we calculated the noise RMS voltage as

$$
\begin{equation*}
V_{N}=\sqrt{\int_{0}^{\infty} N(f)\left|\frac{1}{\left(f / f_{3 d b}\right) j+1}\right|^{2} d f} \approx \sqrt{N_{0} f_{3 d b} \frac{\pi}{2}} \tag{3.2}
\end{equation*}
$$

where $\mathrm{N}(\mathrm{f})$ is approximated by a constant $N_{0}$ noise PSD.
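The closed form on the right-hand side of Eq. (3.2) can be checked against a direct numerical integration of a white PSD through a first-order low-pass response. The sketch below does this with a plain midpoint rule; the values of `n0` and `f3db` are hypothetical, and the integration is truncated at a finite upper limit, so a small residual difference is expected.

```python
import math


def noise_rms_closed_form(n0, f3db):
    """Closed form of Eq. (3.2) for a white PSD n0 (V^2/Hz) and bandwidth f3db (Hz)."""
    return math.sqrt(n0 * f3db * math.pi / 2)


def noise_rms_numeric(n0, f3db, u_max=1000.0, steps=200_000):
    """Midpoint-rule integral of n0 / (1 + (f/f3db)^2) from 0 to u_max * f3db."""
    du = u_max / steps  # step in the normalized frequency u = f / f3db
    area = sum(1.0 / (1.0 + ((i + 0.5) * du) ** 2) for i in range(steps)) * du
    return math.sqrt(n0 * f3db * area)


n0, f3db = 1e-16, 1e9  # hypothetical PSD level and bandwidth
closed = noise_rms_closed_form(n0, f3db)
numeric = noise_rms_numeric(n0, f3db)
print(f"closed form: {closed * 1e3:.4f} mV, numeric: {numeric * 1e3:.4f} mV")
```

The factor $\pi / 2$ is the equivalent noise bandwidth of a first-order filter relative to its $-3$ dB frequency, which is why the result is slightly larger than integrating a flat PSD up to $f_{3 d b}$ alone.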

## Characterizing circuit failure rate

After we characterized the noise RMS voltage, we evaluated the circuit failure rate. In $[74,75]$ the circuit failure rate was determined as the probability that the noise exceeds a given threshold voltage ( $V_{T H}{ }^{1}$ ) which can be calculated using

$$
\begin{equation*}
P_{e r r}=\frac{1}{2} \operatorname{erfc}\left(\frac{V_{T H}}{\sqrt{2} V_{N}}\right) \tag{3.3}
\end{equation*}
$$

where $V_{N}$ is the noise RMS voltage. On the other hand, [76] characterizes the failure rate of the circuit by the mean frequency of crossing a given threshold voltage $\left(V_{T H}\right)$. If the noise process corresponds with a band-limited white noise, this mean frequency can be calculated as

$$
\begin{equation*}
f_{\text {mean }}=\frac{2}{\sqrt{3}} \exp \left(\frac{-V_{T H}^{2}}{2 V_{N}{ }^{2}}\right) f_{c} \tag{3.4}
\end{equation*}
$$

where $f_{c}$ is the cut-off frequency used during the integration of the noise PSD.


In both approaches, the failure rate of the circuit is mainly determined by the ratio between the given threshold voltage ( $V_{T H}$ ) and the noise RMS voltage $\left(V_{N}\right)$. This can be seen in Fig. 3.1, which depicts the failure rate as a function of the ratio $S=\frac{V_{T H}}{V_{N}}$, using Eq. (3.4), for three different $f_{c}$. The failure rate is expressed in FIT, where one FIT is equal to one failure in $10^{9}$ operating hours.
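Both failure metrics can be evaluated directly from the ratio $S$. The following sketch implements Eq. (3.3) and Eq. (3.4) and converts the mean crossing frequency to FIT, assuming, as in the approach of [76], that every crossing of $V_{T H}$ counts as one failure. The function names and the example values of $f_{c}$ are hypothetical.

```python
import math


def bit_error_probability(s):
    """Eq. (3.3): probability that Gaussian noise exceeds V_TH, with S = V_TH / V_N."""
    return 0.5 * math.erfc(s / math.sqrt(2))


def mean_crossing_rate(s, f_c):
    """Eq. (3.4): mean frequency (Hz) of crossing V_TH for band-limited white noise."""
    return 2 / math.sqrt(3) * math.exp(-s ** 2 / 2) * f_c


def failures_in_time(s, f_c):
    """Crossing rate expressed in FIT (failures per 1e9 operating hours)."""
    return mean_crossing_rate(s, f_c) * 3600 * 1e9


for f_c in (1e6, 1e9):  # hypothetical cut-off frequencies
    print(f"f_c = {f_c:.0e} Hz, S = 9 -> {failures_in_time(9.0, f_c):.3g} FIT")
```

Because the rate depends on $S$ through $\exp \left(-S^{2} / 2\right)$ but only linearly on $f_{c}$, a change of several orders of magnitude in $f_{c}$ is compensated by a small change in $S$.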


Figure 3.1: Failure rate vs relationship $S=V_{T H} / V_{N}$ using Eq. (3.4).

An acceptable failure rate depends on the application for which the circuit is intended and on the size of the circuit, but from Fig. 3.1 it can be seen that large changes in $f_{c}$ or in the acceptable failure rate (as the circuit size changes) change the required $S$ only slightly. Even when operating in deep subthreshold, where $f_{c}$ is on the order of MHz, the required $S$ ratio does not change drastically. We will see later that small changes in the ratio $S$ do not affect the final conclusions. Consequently, we will use the criterion that an acceptable $S$ is

$$
\begin{equation*}
S=\frac{V_{T H}}{V_{N}}>9 . \tag{3.5}
\end{equation*}
$$

The threshold voltage that can cause a bit flip is the static noise margin of the gate under analysis. Lohstroh et al. in [81] consider two cases for the static noise margin of the CMOS inverter. In the best-case scenario, the noise is considered to be concentrated in one point of a long chain of inverters. Under this assumption the static noise margins and thus the threshold voltages that can cause a bit flip are

$$
\begin{equation*}
V_{T H}=V_{O H}-V_{m} \tag{3.6}
\end{equation*}
$$


when the output logic level is high and

$$
\begin{equation*}
V_{T H}=V_{m}-V_{O L} \tag{3.7}
\end{equation*}
$$

when the output logic level is low, where $V_{O H}$ and $V_{O L}$ are the output logic levels of the inverter and $V_{m}$ is the voltage where $V_{i n}=V_{\text {out }}$ in the static characteristic of an inverter. The threshold voltage is calculated in this way since a lower noise voltage amplitude would be filtered by the regenerative property of the CMOS inverter [81]. This is the approach considered by [75-77] and the one we will consider as the best-case scenario. Additionally, the output logic levels must be calculated taking into consideration the effect of the subthreshold static currents of the MOS transistors, which make the output logic levels move away from their ideal values $V_{d d}$ and gnd.

On the other hand, the worst-case scenario arises when the noise is present in all inverters of the chain. As demonstrated in [81], when the chain is infinitely long, the static noise margins are equivalent to those of a latch based on back-to-back inverters. Under this assumption, the worst-case static noise margin of the inverter can be calculated as the side of the largest square that can be inscribed between its normal and mirrored voltage transfer characteristics (known as the "butterfly plot"). Seevinck et al. in [82] presented a methodology to obtain the static noise margin of a gate from this definition, which is the one used by $[31,33,34,83]$. We will use this definition for our worst-case analysis.

In summary, to determine whether the intrinsic noise is a problem or not, first, we should determine from electrical simulations the noise PSD, the noise margins and the bandwidth of the subsequent gate. Then, calculate the noise RMS voltage considering the bandwidth of the subsequent gate and finally compare it with the threshold voltage $V_{T H}$ to evaluate the failure rate of the circuit. Additionally, variability effects should be considered in the calculations.

### 3.1.2 Simulation results

To model the CMOS inverter, we used the BSIM4 transistor model. We took its input parameters from the PTM model files for the $90 \mathrm{~nm}, 45 \mathrm{~nm}, 32 \mathrm{~nm}$, 22 nm and 16 nm nodes. Additionally, we used industrial process design kits for the $28 \mathrm{~nm}, 65 \mathrm{~nm}$ and 130 nm technology nodes. The 28 nm node is a Fully Depleted Silicon on Insulator (FD-SOI) technology.

We sized the inverters so that $V_{m}$ was close to $V_{d d} / 2$ under nominal $V_{d d}$ and thus increasing the static noise margins. In the cases where an industrial process design kit was used, we selected the minimum allowed width for the NMOS transistor. Table 3.1 shows the selected sizes for the inverters of each technology node.

Table 3.1: Transistor sizes

| Node | $L_{n}=L_{p}$ | $W_{n}$ | $W_{p} / W_{n}$ |
| :---: | :---: | :---: | :---: |
| 16 nm | 16 nm | 16 nm | 2 |
| 22 nm | 22 nm | 22 nm | 2.5 |
| 28 nm | 30 nm | 80 nm | 2 |
| 32 nm | 32 nm | 32 nm | 3 |
| 45 nm | 45 nm | 45 nm | 3 |
| 65 nm | 60 nm | 135 nm | 3.5 |
| 90 nm | 90 nm | 90 nm | 3.5 |
| 130 nm | 120 nm | 160 nm | 4 |

## Best-case noise margins scenario

No process variation. In the best-case noise margin scenario, $V_{T H}$ is given by Eq. (3.6) and Eq. (3.7). We extracted the DC parameters of the inverter $V_{m}, V_{O H}$ and $V_{O L}$ from simulations. In this simulation, we took into account the effect of the subthreshold static currents of the MOS transistor which make the output logic levels move from their ideal values $V_{d d}$ and $g n d$.

Once we obtained the $V_{T H}$, the noise RMS voltage must be calculated. In the best-case noise margin scenario we considered the bandwidth $\left(f_{3 d b}\right)$ of the subsequent inverter as the -3 dB frequency of the AC characteristic of the inverter when biased with $V_{i n}=V_{m}$. This was also extracted from electrical simulations. Fig. 3.2 shows the setup used for the bandwidth characterization. The center inverter is the one under test.


Figure 3.2: Inverters bandwidth characterization setup.

The first, short-circuited, inverter allows us to bias the test inverter at $V_{m}$, while the last one sets the load. Therefore, the bandwidth was extracted as the -3 dB frequency of $H(f)=V_{\text {out }} / V_{\text {in }}$ (Fig. 3.2). To verify the assumption that the bandwidth at this bias point is an acceptable estimate of the filtering characteristic of the subsequent gate, we performed simulations with a square input signal that just crosses $V_{m}$. By varying the frequency of this signal we saw that, for frequencies higher than this bandwidth, the subsequent gates filter the input, while for lower frequencies a bit flip occurs in a following gate of the chain.

Fig. 3.3 shows the simulation results of the noise RMS voltage as a function of $V_{d d}$ for all the considered technology nodes. In this case, simulations were performed with a temperature of $120^{\circ} \mathrm{C}$ and with an input low logic level. The figure shows the same trends pointed out in previous works, that at nominal $V_{d d}$, the noise RMS voltage increases with technology scaling. However, since we considered the filtering capability of the subsequent gate, our analysis reduces by $50 \%$ the noise RMS voltage prediction in comparison with previously published works [75-77].


Figure 3.3: Noise RMS voltage vs $V_{d d}$ for different technology nodes. $T=120^{\circ} \mathrm{C} . V_{i n}=V_{O L}$.

When $V_{d d}$ is decreased, the bandwidth of the inverter decreases and so does the noise RMS voltage. This holds until the increase of the noise PSD, due to the increase of the on-resistance of the transistor, makes the noise RMS voltage start to increase again in the subthreshold region.

Finally, to evaluate the failure rate as a function of $V_{d d}$, the ratio $S$ from Eq. (3.5) needs to be calculated. The simulation results of $S$ as the supply voltage is varied are shown in Fig. 3.4 for the different technology nodes. As can be seen in the figure, at the nominal supply voltage the failure rate is very small ( $S \gg 9$ ), so intrinsic noise is not a problem, at least down to 16 nm and without considering variability. In contrast, as $V_{d d}$ decreases, $S$ decreases dramatically when entering the subthreshold region. This was to be expected, as the noise RMS voltage increases in the subthreshold region while $V_{T H}$ decreases, since $V_{m}$ decreases and the output logic levels move away from their ideal values. It is possible to determine a minimum operating voltage that maintains a ratio $S>9$, but to do that, variability effects must be considered in the noise analysis.
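Given a simulated sweep of $S$ versus $V_{d d}$, the minimum operating voltage for the criterion of Eq. (3.5) can be located by interpolation. The sketch below is one possible way to do it, not the procedure used to produce the results of this chapter; the function name and the sample data are hypothetical.

```python
def minimum_operating_voltage(vdd, s, s_min=9.0):
    """Lowest Vdd above which every sampled S stays >= s_min.

    `vdd` must be sorted in increasing order. The crossing is located by
    linear interpolation between the two samples that bracket s_min.
    Returns None if S is below s_min even at the highest sampled Vdd.
    """
    if len(vdd) != len(s) or len(vdd) < 2:
        raise ValueError("need two equally long lists with at least 2 points")
    if s[-1] < s_min:
        return None
    # Walk down from the highest Vdd until S first drops below s_min.
    for i in range(len(vdd) - 1, 0, -1):
        if s[i - 1] < s_min:
            frac = (s_min - s[i - 1]) / (s[i] - s[i - 1])
            return vdd[i - 1] + frac * (vdd[i] - vdd[i - 1])
    return vdd[0]  # criterion met over the whole sampled range


# Hypothetical sweep, only to illustrate the interpolation.
vdd_sweep = [0.10, 0.15, 0.20, 0.25, 0.30]
s_sweep = [2.0, 5.0, 11.0, 20.0, 35.0]
print(f"Vdd_min = {minimum_operating_voltage(vdd_sweep, s_sweep) * 1e3:.0f} mV")
```

Scanning from the highest voltage downward makes the result robust to a non-monotonic $S$ curve: the returned value is the voltage above which the criterion holds throughout the sampled range.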


Figure 3.4: $S=V_{T H} / V_{N}$ vs $V_{d d}$ for different technology nodes. $T=120^{\circ} \mathrm{C}$. $V_{i n}=V_{O L}$.
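
The criterion $S>9$ can be turned into a minimum $V_{d d}$ by locating where a simulated $S$ curve crosses the threshold. The following is a minimal Python sketch of that step, assuming the curve is available as sampled points. The function name and the sample values are hypothetical and are not data read from Fig. 3.4.

```python
def min_vdd_for_ratio(vdd_mv, s_ratio, s_min=9.0):
    """Lowest Vdd (mV) above which S stays >= s_min, by linear interpolation.

    vdd_mv must be sorted in increasing order. Returns None if S is below
    s_min even at the highest supply voltage.
    """
    if len(vdd_mv) != len(s_ratio) or len(vdd_mv) < 2:
        raise ValueError("need two equally long sequences with at least two points")
    if s_ratio[-1] < s_min:
        return None
    # Walk down from the highest Vdd until S first drops below the threshold.
    for i in range(len(vdd_mv) - 1, 0, -1):
        s_lo, s_hi = s_ratio[i - 1], s_ratio[i]
        if s_lo < s_min <= s_hi:
            frac = (s_min - s_lo) / (s_hi - s_lo)
            return vdd_mv[i - 1] + frac * (vdd_mv[i] - vdd_mv[i - 1])
    return vdd_mv[0]  # S >= s_min over the whole sampled range


# Hypothetical samples of S versus Vdd (illustrative only).
VDD_MV = [100, 150, 200, 250]
S_VALUES = [3.0, 6.0, 12.0, 20.0]
V_MIN = min_vdd_for_ratio(VDD_MV, S_VALUES)
```

Scanning from the highest supply voltage downward keeps the result meaningful even if $S$ is not monotonic at very low $V_{d d}$.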

Process variation. To consider variability effects in the intrinsic noise analysis, we used the PTM corner model files. For the time being, corner model files are only available down to the 32 nm technology node.

We use the same noise characterization analysis, but consider the variability effects on the noise PSD, bandwidth, $V_{m}$, $V_{O H}$ and $V_{O L}$. Fig. 3.5 shows the noise RMS voltage simulated for the 32 nm node with an input low logic level, together with the respective corners. It can be seen that the FS corner increases the noise RMS voltage: since the input voltage is low, the noise PSD increases due to the higher on-resistance of the PMOS transistor. In the same way, the SF corner decreases the noise RMS voltage.

Additionally, as pointed out in [33], variability worsens with technology scaling, which reduces the static noise margins. This is even worse in the subthreshold region due to the exponential dependence of the transistor current on the threshold voltage. As a result, the $S$ ratio is strongly diminished by variability effects, which affect both the noise RMS voltage and the static noise margins. This is depicted in Fig. 3.6, which also shows that even in the SF corner, where the noise RMS voltage is reduced, the $S$ ratio decreases too. Consequently, the worst case is the FS corner when the input is a low logic level and the SF corner when the input is a high logic level.

Figure 3.5: Noise RMS voltage vs $V_{d d}$ for 32 nm with corner simulations. $T=120^{\circ} \mathrm{C}$. $V_{i n}=V_{O L}$. Best-case noise margin analysis.


Figure 3.6: $S=V_{T H} / V_{N}$ vs $V_{d d}$ for 32 nm with corner simulations. $T=120^{\circ} \mathrm{C} . V_{i n}=V_{O L}$. Best-case noise margin analysis.

Table 3.2 shows the minimum operating voltage of each technology node due to intrinsic noise in the best-case noise margin scenario. We can see that as technology scales, intrinsic noise increases and thus the minimum operating voltage due to intrinsic noise increases too. Further on, we present a brief discussion of 28 nm FD-SOI. For the time being, we note that the transistor widths selected in this technology were much larger than in 32 nm and 45 nm (Table 3.1), which has an impact on the minimum operating voltage.

Table 3.2: Minimum operating $V_{d d}$ due to intrinsic noise in the best-case noise margin scenario.

| Node | Minimum $V_{d d}[\mathrm{mV}]$ |
| :---: | :---: |
| 28 nm | 108 |
| 32 nm | 184 |
| 45 nm | 154 |
| 65 nm | 114 |
| 90 nm | 108 |
| 130 nm | 101 |

## Worst-case noise margin scenario

In the worst-case noise margin scenario, the static noise margins, and thus $V_{T H}$, are given by the size of the maximum square that can be inscribed between the normal and mirrored voltage transfer characteristics [81]. To calculate the noise RMS voltage we also considered a worst case for the bandwidth of the subsequent inverter. To do that, we selected the bias point where the inverter has a gain equal to -1 and took the bandwidth as the -3 dB frequency of $H(f)=V_{\text {out }} / V_{\text {in }}$. This is a worst case since the bandwidth is set by an on-resistance instead of the output resistance of saturated transistors, which means a lower resistance and a higher bandwidth. This approach is similar to the one used by Natori et al. in [75]. Fig. 3.7 shows the bandwidth in the best-case and worst-case noise margin scenarios for the 32 nm node.

To obtain the minimum $V_{d d}$ due to intrinsic noise in the worst-case noise margin scenario, we considered these definitions for the static noise margins and bandwidth and repeated the noise characterization analysis. Table 3.3 shows the minimum operating voltage of each technology node due to intrinsic noise in this case. We can see the same trend as in the best-case noise margin scenario: as technology scales, intrinsic noise increases and thus the minimum operating voltage due to intrinsic noise increases too. However, in the worst-case noise margin analysis the minimum operating voltage due to intrinsic noise is considerably higher, by roughly $30 \%$ to $50 \%$ (compare Tables 3.2 and 3.3), since the noise RMS voltage increases. This is because we selected a higher bandwidth for the subsequent gate, and because smaller static noise margins are obtained with the worst-case definition.

Figure 3.7: Bandwidth in the Best-Case scenario and in the Worst-Case scenario for the 32 nm process.

Table 3.3: Minimum operating $V_{d d}$ due to intrinsic noise in the worst-case noise margin scenario.

| Node | Minimum $V_{d d}[\mathrm{mV}]$ |
| :---: | :---: |
| 28 nm | 161 |
| 32 nm | 251 |
| 45 nm | 205 |
| 65 nm | 154 |
| 90 nm | 146 |
| 130 nm | 133 |
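
To make the gap between the two scenarios concrete, the relative increase per node can be computed directly from the values in Tables 3.2 and 3.3. The short Python sketch below does only that arithmetic; the variable and function names are illustrative.

```python
# Minimum operating Vdd in mV, copied from Table 3.2 (best case)
# and Table 3.3 (worst case).
BEST_CASE_MV = {"28 nm": 108, "32 nm": 184, "45 nm": 154,
                "65 nm": 114, "90 nm": 108, "130 nm": 101}
WORST_CASE_MV = {"28 nm": 161, "32 nm": 251, "45 nm": 205,
                 "65 nm": 154, "90 nm": 146, "130 nm": 133}


def relative_increase(best_mv, worst_mv):
    """Fractional increase of the worst-case value over the best-case value."""
    return {node: (worst_mv[node] - best_mv[node]) / best_mv[node]
            for node in best_mv}


INCREASE = relative_increase(BEST_CASE_MV, WORST_CASE_MV)
for node, inc in INCREASE.items():
    print(f"{node}: +{100 * inc:.0f} %")
```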

## Discussion

In previous works, other definitions of the minimum operating voltage have been proposed. Bol et al. in [83] used the worst-case definition for static noise margins and defined the minimum operating voltage as the point where these static noise margins are 50 mV. Calhoun et al. in [34] used the same definition for static noise margins but defined the minimum operating voltage as the point where they reach $0.1 \times V_{d d}$. Wang et al. in [25], on the other hand, used a different approach: they defined the minimum operating voltage as the point where the output logic levels are shifted by $10 \%$ from their ideal values, $V_{d d}$ and gnd. Fig. 3.8 shows a comparison between the different criteria for the minimum operating voltage. The figure shows that previously proposed criteria might be optimistic: when using those criteria, intrinsic noise could cause undesirable bit flips. As pointed out in previous works, the minimum operating voltage is greatly increased by variability effects in nanoscale digital CMOS. As we showed in this work, it is further increased by the higher intrinsic noise of gates implemented with smaller transistors in advanced technologies.


Figure 3.8: Comparison between different definitions of minimum operating voltage.

Fig. 3.8 also shows the optimum $V_{d d}$ at which the Minimum Energy Point (MEP) is achieved by a chain of 27 inverters. It can be seen that down to the 65 nm node the optimum $V_{d d}$ is higher than the minimum operating voltage due to intrinsic noise. From the 45 nm node down, however, the MEP might not be reachable and thus the energy consumption could be worsened by intrinsic noise. Of course, as Wang et al. showed in [22], a circuit with a higher activity factor achieves minimum energy consumption at a lower supply voltage, and in our case the activity factor is equal to one. Several solutions have been proposed to handle variability issues in nanoscale technologies and thus operate at lower supply voltages. Kwong et al. in [31] proposed an upsizing technique to reduce variability effects and achieve lower operating voltages when the minimum energy point is below the minimum operating voltage.

Fig. 3.9 shows the results of the presented noise characterization analysis for the worst-case noise margin analysis in the 32 nm node with $W_{p} / W_{n}=4.5$. In this case, the minimum operating voltage due to intrinsic noise is 217 mV. The upsizing technique reduces variability effects and also reduces the noise RMS voltage, since the lower on-resistance reduces the noise PSD. In conclusion, we showed that for technology nodes smaller than 45 nm minimum energy operation could be limited by intrinsic noise, and thus techniques that reduce intrinsic noise and variability must be used.


Figure 3.9: $S=V_{T H} / V_{N}$ vs $V_{d d}$ for 32 nm with corner simulations. $T=120^{\circ} \mathrm{C}$. $V_{i n}=V_{O L}$. $W_{p} / W_{n}=4.5$. Worst-case noise margins analysis.

To support the noise characterization analysis proposed in this work, we performed transient noise simulations of two back-to-back inverters. Fig. 3.10 shows the output of one of the two back-to-back inverters and the same signal after a chain of noiseless inverters, for the 32 nm node in the FS corner with a supply voltage of 150 mV. In this particular case, we ran a 1 ms simulation, which presented 100 bit flips. Additionally, using Eq. (3.4) and the proposed noise characterization analysis, we obtained estimated failure rates of 30 bit flips using the best-case noise margin analysis and 6 k bit flips using the worst-case noise margin analysis. This result suggests that the minimum operating voltage due to intrinsic noise lies between the values given by the worst-case and best-case analyses proposed in this work.

Finally, it is interesting to highlight the advantages that FD-SOI presents with respect to bulk CMOS. As pointed out by Vitale et al. in [39], FD-SOI presents a near-ideal subthreshold slope, which increases the ratio between the on and off currents of the transistor in the subthreshold region. This lowers the on-resistance of the transistor and thus reduces intrinsic noise. Additionally, the better subthreshold slope reduces the variability effects in the subthreshold region and thus increases the noise margins. Because of these two effects, the minimum operating voltage due to intrinsic noise is reduced, as shown in Fig. 3.8. It is also fair to note that, due to the minimum width allowed by the process design kit, the 28 nm transistors are upsized in comparison with the 32 nm ones, where the width of the transistors is smaller.

Figure 3.10: Transient noise simulation. 32 nm node. $V_{d d}=150 \mathrm{mV}$. FS corner.

### 3.2 Minimum Energy Point (MEP) model

As mentioned in Chapter 1, when trying to minimize the energy consumption of digital circuits, there is an optimum $V_{d d}$ that achieves the minimum energy consumption per operation. This is because, as $V_{d d}$ is lowered, the dynamic energy is reduced but the leakage energy is increased. The latter happens because, although the leakage power is lowered, the time needed to complete the operation increases. The point where minimum energy is achieved is called the Minimum Energy Point (MEP).

There are several works that demonstrate the existence of this point. One of the first is [25], where an FFT processor was specially designed for subthreshold operation and measurements were carried out. It achieves the MEP with a supply voltage of 350 mV while consuming $155 \mathrm{~nJ} / \mathrm{FFT}$ at a clock frequency of 10 kHz. The technology used was a 180 nm CMOS process. In [26], a processor was measured in a 130 nm technology. The minimum energy consumption is achieved at 360 mV, which corresponds to $2.6 \mathrm{pJ} /$ inst at a clock frequency of 833 kHz. Then in [28], another processor was designed in a 180 nm technology. In this case, the MEP is achieved at 500 mV, consuming $2.8 \mathrm{pJ} /$ cycle at a clock frequency of 106 kHz. [27] is one of the first works to show an SRAM functioning in the subthreshold regime, with an energy reduction of more than one order of magnitude in comparison with nominal above-threshold operation.

In [84], the authors present a System on Chip (SoC) consisting of a microcontroller with a dc-dc converter to enable subthreshold operation. The MEP occurs at a supply voltage of 500 mV while consuming $27.2 \mathrm{pJ} /$ cycle at a clock frequency of 434 kHz. In this case, a 65 nm CMOS process was used. Another example is presented in [29], where an FFT processor is designed in a 65 nm technology. It achieves the MEP with a supply voltage of 270 mV while consuming $15.8 \mathrm{~nJ} /$ FFT at a clock frequency of 30 MHz. In [85], a Multiply-Accumulate Block (MAC) is designed in a 28 nm FD-SOI technology. It achieves the MEP with a supply voltage of 250 mV and consumes $0.17 \mathrm{pJ} /$ operation at a frequency of 35 MHz.

In conclusion, the existence of the MEP has been extensively demonstrated. Previous works also show that technology scaling improves the performance at this point. They also show that depending on the circuit the optimal supply voltage changes, however, the MEP is usually achieved in the subthreshold region.

### 3.2.1 Classic MEP model

In [86], the authors present one of the first models for the energy per operation consumption near the MEP. Based on the exponential dependence of the drain current in the subthreshold region, a simple model is derived which provides insight into the dependences of the energy consumption. They show an analytical expression for the optimum $V_{d d}$ and the total energy per operation as it can be seen in Eq. (3.8).

$$
\begin{equation*}
E_{T}=V_{d d}^{2}\left(C_{e f f}+W_{e f f} K C_{g} L_{D P} e^{\frac{-V_{d d}}{n U_{T}}}\right) \tag{3.8}
\end{equation*}
$$

Here, $C_{\text {eff }}$ is the effective capacitance of the circuit. This is the mean capacitance that is switched in each clock cycle. Sometimes it is defined as $C_{e f f}=\alpha C_{T}$, where $\alpha$ is called the activity factor and $C_{T}$ the total capacitance of the circuit. The activity factor is the mean fraction of the total capacitance that is switched in a clock cycle. $W_{e f f}$ is the effective width, relative to the characteristic inverter, that contributes to leakage. It can be seen as the ratio between the leakage energy of the circuit and leakage energy of the characteristic inverter of the technology. $K$ is a delay fitting parameter. $C_{g}$ is the load capacitance of the characteristic inverter. $L_{D P}$ is the logic depth of the circuit. This is the number of characteristic inverters that have the same delay as the critical path of the circuit. $U_{T}$ is the thermal voltage and $n$ the subthreshold slope.
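
The trade-off captured by Eq. (3.8) can be explored numerically. The Python sketch below evaluates the equation and locates the minimum with a simple scan over $V_{d d}$. All parameter values (`c_eff`, `w_eff`, `k`, `c_g`, `l_dp`, `n`) are hypothetical and chosen only for illustration; they are not taken from any circuit in this work.

```python
import math


def classic_energy(vdd, c_eff, w_eff, k, c_g, l_dp, n=1.3, u_t=0.0259):
    """Energy per operation of Eq. (3.8), in joules."""
    leakage_cap = w_eff * k * c_g * l_dp * math.exp(-vdd / (n * u_t))
    return vdd ** 2 * (c_eff + leakage_cap)


def scan_minimum(energy_fn, v_lo=0.10, v_hi=0.80, steps=701):
    """Return (vdd, energy) at the lowest sampled energy in [v_lo, v_hi]."""
    grid = [v_lo + i * (v_hi - v_lo) / (steps - 1) for i in range(steps)]
    best = min(grid, key=energy_fn)
    return best, energy_fn(best)


# Hypothetical circuit parameters (assumptions, for illustration only).
PARAMS = dict(c_eff=10e-15, w_eff=100.0, k=1.0, c_g=1e-15, l_dp=50.0)
V_OPT, E_MIN = scan_minimum(lambda v: classic_energy(v, **PARAMS))
print(f"optimum Vdd ~ {V_OPT * 1e3:.0f} mV, energy ~ {E_MIN * 1e15:.2f} fJ")
```

The scan starts at 100 mV on purpose: the expression in Eq. (3.8) tends to zero as $V_{d d}$ goes to zero, so a search that includes very low voltages would miss the interior minimum.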

Eq. (3.8) shows that when considering this simple model the energy per cycle is independent of the $V_{T}$ of the transistor, which means that theoretically by tuning the $V_{T}$ of the transistor, different performances can be obtained while consuming the same energy. Of course, this is true while considering $V_{T}>V_{d d}$. However, this is not the practical case, mainly due to two reasons.

The first reason, studied in depth in [87], is that using different technology flavours (for example general purpose or low power devices) or the global $V_{T}$ selection available in the process (most nanometer technologies have two $V_{T}$ devices in each flavour) has an impact on other characteristics of the device, such as capacitance or subthreshold slope, that change the minimum energy achieved. The authors also show that using body biasing to change the $V_{T}$ of the device has a negative influence on the energy consumption for the same reason.

The second reason is that in this model the same $V_{T}$ and $n$ values are considered for both NMOS and PMOS transistors. Consequently, the impact of the difference in these parameters is not taken into account. In $[1-4,88-90]$ it was shown that the imbalance between the NMOS and PMOS transistor leakage currents has an impact on the leakage energy consumed by subthreshold circuits. However, in [88-90] the authors did not take into account the particular circuit topology and thus did not achieve the optimum imbalance between the NMOS and PMOS transistors that minimizes energy. In the next section, we propose a new model that takes into account circuit topology and input data as well as the differences between the NMOS and PMOS devices, and which demonstrates that the optimum imbalance from the energy point of view changes from circuit to circuit [1-4].

### 3.2.2 Improved MEP model

In this section, we show a simple model for the total energy per operation consumed by CMOS digital circuits operating near the MEP. We used a transistor model valid for subthreshold operation (weak inversion region) since in general the MEP is achieved in this region. Additionally, we took into account the differences between the NMOS and the PMOS transistors and we show that these differences have an important impact on the energy consumed in the MEP.

Eq. (3.9) shows the conventional model for the subthreshold current, where $I_{L}$ is conveniently defined as the OFF current of the transistor. Here $n$ stands for the subthreshold slope factor, $V_{T}$ for the threshold voltage and $U_{T}$ for the thermal voltage. The subscript n or p indicates whether a parameter corresponds to an NMOS or a PMOS transistor.

$$
\begin{equation*}
I_{s u b, n(p)}=I_{o, n(p)} e^{\frac{V_{G S(S G)}-V_{T n(p)}}{n_{n(p)} U_{T}}}=I_{L, n(p)} e^{\frac{V_{G S(S G)}}{n_{n(p)} U_{T}}} \tag{3.9}
\end{equation*}
$$

We used a basic model for the delay of a gate which is shown in Eq. (3.10). Here $C_{g}$ is the load capacitance of the gate and $K$ is a fitting parameter.

$$
\begin{equation*}
t_{d}=\frac{K C_{g} V_{d d}}{\left.I_{s u b}\right|_{V_{G S}=V_{d d}}}=\frac{K C_{g} V_{d d}}{I_{L, n(p)}} e^{\frac{-V_{d d}}{n_{n(p)} U_{T}}} \tag{3.10}
\end{equation*}
$$

The dynamic energy consumed by the circuit per operation can be calculated with Eq. (3.11), where $C_{e f f}$ is the same parameter used in the classic MEP modelling.

$$
\begin{equation*}
E_{D}=V_{d d}{ }^{2} C_{e f f} \tag{3.11}
\end{equation*}
$$

As for the leakage energy, we included in our model the differences between the NMOS and PMOS transistors. Eq. (3.12) shows the proposed model for the leakage energy. $W_{e f f n(p)}$ is an estimation of the average width of the NMOS (PMOS) transistors, relative to the characteristic inverter, that contribute to leakage. It is the same concept as $W_{\text {eff }}$ in the classic model, but distinguishes whether the leakage is contributed by an NMOS or a PMOS device. $\tau_{n(p)}$ is the number of inverter delays in the critical path that depend on an NMOS (PMOS) transistor. It is the same concept as the logic depth $\left(L_{D P}\right)$ in the classic model. $W_{e f f n(p)}$ depends on which transistor (NMOS or PMOS) defines the leakage current, which in turn depends on the circuit architecture and on the values at the inputs. The case of $\tau_{n(p)}$ is similar: it depends on how many node transitions in the critical path are driven by an NMOS (PMOS). Therefore these values depend on the circuit architecture as well as on the input values.

$$
\begin{equation*}
E_{L}=V_{d d}\left(W_{e f f n} I_{L, n}+W_{e f f p} I_{L, p}\right)\left(\tau_{n} t_{d, n}+\tau_{p} t_{d, p}\right) \tag{3.12}
\end{equation*}
$$

For example, Fig. 3.11 shows the parameters $W_{e f f n(p)}$ and $\tau_{n(p)}$ in a simple circuit that consists of a chain of 5 inverters. We can see that $W_{e f f n(p)}$ depends on the inputs of the circuit at a given time. On the other hand, $\tau_{n(p)}$ depends on the input transition that activates the critical path, and thus depends only on the architecture of the circuit and not on the inputs. Further on we continue analysing these dependencies, in particular in a Ripple Carry Adder (RCA) (Section 4.5).

If we consider that the subthreshold slope factors of both transistors are similar, $n \approx n_{p} \approx n_{n}$, then using Eq. (3.10) and Eq. (3.12) the leakage energy can be written as in Eq. (3.13). Eq. (3.12) is the product of the total leakage current and the total delay of the circuit. Each of these has two terms, one that depends on the NMOS parameters and one that depends on the PMOS parameters. In particular, the leakage current terms are directly proportional to the leakage current of each transistor, while the delay terms are inversely proportional to it. $L F$ in Eq. (3.13) reflects these dependencies, which arise from multiplying the total leakage current by the total delay in Eq. (3.12).

$$
E_{L}=V_{d d}^{2} K C_{g} e^{\frac{-V_{d d}}{n U_{T}}} L F
$$

where

$$
\begin{equation*}
L F=\tau_{n} W_{e f f n}+\tau_{p} W_{e f f p}+\tau_{n} W_{e f f p} \frac{I_{L, p}}{I_{L, n}}+\tau_{p} W_{e f f n} \frac{I_{L, n}}{I_{L, p}} \tag{3.13}
\end{equation*}
$$
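
The step from Eq. (3.12) to Eq. (3.13) can be made explicit. As a sketch, substituting Eq. (3.10) for each delay term, with $n_{n} \approx n_{p} \approx n$, gives:

$$
\begin{aligned}
E_{L} &=V_{d d}\left(W_{e f f n} I_{L, n}+W_{e f f p} I_{L, p}\right) K C_{g} V_{d d} e^{\frac{-V_{d d}}{n U_{T}}}\left(\frac{\tau_{n}}{I_{L, n}}+\frac{\tau_{p}}{I_{L, p}}\right) \\
&=V_{d d}^{2} K C_{g} e^{\frac{-V_{d d}}{n U_{T}}}\left(\tau_{n} W_{e f f n}+\tau_{p} W_{e f f p}+\tau_{n} W_{e f f p} \frac{I_{L, p}}{I_{L, n}}+\tau_{p} W_{e f f n} \frac{I_{L, n}}{I_{L, p}}\right)
\end{aligned}
$$

The bracket in the last line is $L F$. The absolute leakage currents cancel and only their ratio remains.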



Figure 3.11: Example of $W_{e f f n(p)}$ and $\tau_{n(p)}$ in a simple chain of inverters.

The model of Eq. (3.12) and Eq. (3.13) is an extension of the simple model applied for analysis of the MEP in several works, e.g. [27,86,88]. Here, the model includes the impact of the different leakage components due to the PMOS and NMOS leakage paths and also the circuit topology through $W_{e f f n}, W_{e f f p}, \tau_{n}$ and $\tau_{p}$.
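
The equivalence between the product form of Eq. (3.12) and the closed form of Eq. (3.13) can also be checked numerically. The Python sketch below does so under the same assumption of a common slope factor $n$. Function names and all numeric values are hypothetical and serve only as an illustration.

```python
import math

U_T = 0.0259  # thermal voltage in volts, assumed close to room temperature


def gate_delay(vdd, i_leak, k, c_g, n):
    """Eq. (3.10): delay of a gate driven by a transistor with OFF current i_leak."""
    return k * c_g * vdd / i_leak * math.exp(-vdd / (n * U_T))


def leakage_energy_direct(vdd, w_n, w_p, tau_n, tau_p, i_ln, i_lp, k, c_g, n):
    """Eq. (3.12): Vdd times total leakage current times total delay."""
    i_total = w_n * i_ln + w_p * i_lp
    t_total = (tau_n * gate_delay(vdd, i_ln, k, c_g, n)
               + tau_p * gate_delay(vdd, i_lp, k, c_g, n))
    return vdd * i_total * t_total


def leakage_factor(w_n, w_p, tau_n, tau_p, imbalance):
    """LF of Eq. (3.13), with imbalance = I_Ln / I_Lp."""
    return (tau_n * w_n + tau_p * w_p
            + tau_n * w_p / imbalance + tau_p * w_n * imbalance)


def leakage_energy_closed(vdd, w_n, w_p, tau_n, tau_p, i_ln, i_lp, k, c_g, n):
    """Closed form of Eq. (3.13)."""
    lf = leakage_factor(w_n, w_p, tau_n, tau_p, i_ln / i_lp)
    return vdd ** 2 * k * c_g * math.exp(-vdd / (n * U_T)) * lf


# Hypothetical operating point (assumed values).
ARGS = dict(vdd=0.3, w_n=3.0, w_p=2.0, tau_n=2.0, tau_p=3.0,
            i_ln=2e-9, i_lp=1e-9, k=1.0, c_g=1e-15, n=1.3)
E_DIRECT = leakage_energy_direct(**ARGS)
E_CLOSED = leakage_energy_closed(**ARGS)
```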

The total energy is obtained by adding Eq. (3.13) and Eq. (3.11), and is shown in Eq. (3.14). Furthermore, the optimum $V_{d d}$ that minimizes the total energy per operation can be obtained by calculating the derivative of the total energy (Eq. (3.15)) and setting it to zero.

$$
E_{T}=V_{d d}^{2}\left(C_{e f f}+K C_{g} e^{\frac{-V_{d d}}{n U_{T}}} L F\right)
$$

where

$$
\begin{equation*}
L F=\tau_{n} W_{e f f n}+\tau_{p} W_{e f f p}+\tau_{n} W_{e f f p} \frac{I_{L, p}}{I_{L, n}}+\tau_{p} W_{e f f n} \frac{I_{L, n}}{I_{L, p}} \tag{3.14}
\end{equation*}
$$

$$
\begin{equation*}
\frac{\partial E_{T}}{\partial V_{d d}}=2 V_{d d} C_{e f f}+2 V_{d d} L F C_{g} K e^{\frac{-V_{d d}}{n U_{T}}}-\frac{V_{d d}^{2} L F C_{g} K}{n U_{T}} e^{\frac{-V_{d d}}{n U_{T}}} \tag{3.15}
\end{equation*}
$$

Eq. (3.16) shows the optimum $V_{d d}$ that minimizes the total energy, where lambert $W$ is the Lambert function, which gives the solution to $x e^{x}=\beta$. Here $\beta$ is negative, and the energy minimum corresponds to the branch with lambert $W(\beta) \leq-1$, which becomes more negative as $\beta$ approaches zero. Two conclusions can be drawn. First, if the activity factor, and thus $C_{e f f}$, is lower, the optimum $V_{d d}$ is higher. Second, minimizing $L F$ brings the optimum $V_{d d}$ to a lower value.

$$
\begin{equation*}
V_{D D_{\text {opt }}}=n U_{T}(2-\text { lambert } W(\beta)) \quad \text { where } \quad \beta=\frac{-2 C_{e f f}}{L F C_{g} K} e^{2} \tag{3.16}
\end{equation*}
$$
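
Eq. (3.16) can be evaluated without a special-function library. The Python sketch below computes the lower real branch of the Lambert function by bisection and checks that the resulting $V_{d d}$ makes the derivative of Eq. (3.15) vanish. The choice of the lower branch (values at or below -1) is an assumption of this sketch: it is the branch that yields $V_{d d}>3 n U_{T}$ and corresponds to the energy minimum, while the other real branch gives a local maximum. All parameter values are hypothetical.

```python
import math


def lambert_w_lower(beta):
    """Lower real branch of the Lambert function for -1/e < beta < 0 (bisection)."""
    if not -1.0 / math.e < beta < 0.0:
        raise ValueError("the lower branch is real only for -1/e < beta < 0")
    hi = -1.0  # w * exp(w) = -1/e < beta at this end
    lo = -2.0
    while lo * math.exp(lo) < beta:  # move left until w * exp(w) >= beta
        lo *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        # w * exp(w) decreases as w grows towards -1 on this branch.
        if mid * math.exp(mid) > beta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def optimum_vdd(c_eff, lf, c_g, k, n, u_t=0.0259):
    """Eq. (3.16), evaluated on the lower branch."""
    beta = -2.0 * c_eff * math.exp(2.0) / (lf * c_g * k)
    return n * u_t * (2.0 - lambert_w_lower(beta))


def d_energy_d_vdd(vdd, c_eff, lf, c_g, k, n, u_t=0.0259):
    """Eq. (3.15)."""
    a = lf * c_g * k * math.exp(-vdd / (n * u_t))
    return 2.0 * vdd * c_eff + 2.0 * vdd * a - vdd ** 2 * a / (n * u_t)


# Hypothetical parameters (assumed values).
V_OPT = optimum_vdd(c_eff=10e-15, lf=5000.0, c_g=1e-15, k=1.0, n=1.3)
RESIDUAL = d_energy_d_vdd(V_OPT, c_eff=10e-15, lf=5000.0, c_g=1e-15, k=1.0, n=1.3)
```

With these assumed values, halving `c_eff` raises the computed optimum and halving `lf` lowers it, in line with the two conclusions stated above.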

From Eq. (3.13) and Eq. (3.11) we can see that the total energy depends on the imbalance between the NMOS and PMOS leakage currents, represented in the equation by $I_{L, n} / I_{L, p}$. To find the optimum imbalance that minimizes the total energy per operation, $L F$ must be minimized. Eq. (3.17) shows the derivative of the total energy with respect to the imbalance and Eq. (3.18) shows the optimum imbalance.

$$
\begin{gather*}
\frac{\partial E_{T}}{\partial \frac{I_{L, n}}{I_{L, p}}}=V_{d d}^{2} K C_{g} e^{\frac{-V_{d d}}{n U_{T}}}\left(W_{e f f n} \tau_{p}-W_{e f f p} \tau_{n} \frac{1}{\left(\frac{I_{L, n}}{I_{L, p}}\right)^{2}}\right)  \tag{3.17}\\
\left(\frac{I_{L, n}}{I_{L, p}}\right)_{o p t}=\sqrt{\frac{W_{e f f p} \tau_{n}}{W_{e f f n} \tau_{p}}} \tag{3.18}
\end{gather*}
$$
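
Substituting the optimum imbalance of Eq. (3.18) back into the expression for $L F$ gives the minimum value of the leakage factor. This short derivation is added as an illustration and follows directly from Eq. (3.13):

$$
L F_{\min }=\tau_{n} W_{e f f n}+\tau_{p} W_{e f f p}+2 \sqrt{\tau_{n} \tau_{p} W_{e f f n} W_{e f f p}}=\left(\sqrt{\tau_{n} W_{e f f n}}+\sqrt{\tau_{p} W_{e f f p}}\right)^{2}
$$

For comparison, balanced leakage currents ($I_{L, n}=I_{L, p}$) give $L F=\left(\tau_{n}+\tau_{p}\right)\left(W_{e f f n}+W_{e f f p}\right)$, which is never smaller than $L F_{\min }$.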

The optimum imbalance depends on the architecture and the state of the circuit through $W_{\text {eff }}$ and $\tau$. Additionally, it is easy to see that if we want to consider the differences between the subthreshold slope of each device, the new optimum imbalance can be calculated by Eq. (3.19). Further on we will discuss when it is important to consider this difference.

$$
\begin{equation*}
\left(\frac{I_{L, n}}{I_{L, p}}\right)_{o p t}=\sqrt{\frac{W_{e f f p} \tau_{n}}{W_{e f f n} \tau_{p}} e^{\frac{V_{d d}}{U_{T}}\left(\frac{1}{n_{p}}-\frac{1}{n_{n}}\right)}} \tag{3.19}
\end{equation*}
$$
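
The optimum of Eq. (3.18) and Eq. (3.19) can be checked numerically. The Python sketch below evaluates the imbalance-dependent part of the leakage energy, obtained from Eq. (3.12) and Eq. (3.10) with separate slope factors, and verifies that it does not decrease when moving away from the computed optimum. Function names and numeric values are hypothetical.

```python
import math

U_T = 0.0259  # thermal voltage in volts, assumed close to room temperature


def optimum_imbalance(w_effn, w_effp, tau_n, tau_p, vdd=0.0, n_n=1.0, n_p=1.0):
    """Optimum I_Ln / I_Lp from Eq. (3.19).

    With equal slope factors the exponential is 1 and Eq. (3.18) is recovered.
    """
    slope_term = math.exp(vdd / U_T * (1.0 / n_p - 1.0 / n_n))
    return math.sqrt(w_effp * tau_n / (w_effn * tau_p) * slope_term)


def leakage_product(imbalance, w_effn, w_effp, tau_n, tau_p, vdd, n_n, n_p):
    """Leakage energy of Eq. (3.12) divided by Vdd^2 * K * C_g."""
    a_n = math.exp(-vdd / (n_n * U_T))
    a_p = math.exp(-vdd / (n_p * U_T))
    return ((w_effn * imbalance + w_effp)
            * (tau_n * a_n / imbalance + tau_p * a_p))


# Hypothetical circuit parameters (assumed values).
CIRCUIT = dict(w_effn=3.0, w_effp=2.0, tau_n=2.0, tau_p=3.0)
R_EQUAL_SLOPES = optimum_imbalance(**CIRCUIT)
R_GENERAL = optimum_imbalance(**CIRCUIT, vdd=0.3, n_n=1.3, n_p=1.4)
```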

### 3.3 Conclusions

In this chapter, we analyzed two aspects of the modeling of subthreshold digital circuits. First, we studied the minimum operating voltage of digital circuits and developed a noise-aware methodology to characterize it. To summarize, there is a minimum operating voltage due to intrinsic noise, and previously proposed definitions of the minimum operating voltage can be optimistic when compared with the intrinsic-noise-aware minimum operating voltage, particularly for processes with channel lengths below 45 nm. We also showed that this minimum operating voltage is higher as technology shrinks, not only due to the increased variability effects pointed out in previous works but also due to an increase in intrinsic noise in gates implemented with smaller transistors in advanced technologies.

We also proposed a model for the MEP which takes into account the differences between the NMOS and PMOS devices. With it, we could see that there is an optimum imbalance between the leakage currents of the devices that minimizes the leakage energy. This optimum imbalance also reduces the optimum $V_{d d}$, which in turn reduces the dynamic energy. The optimum imbalance depends on the circuit topology and the state of the circuit. In the next chapter, we present the techniques proposed to force the circuit to operate at this optimum leakage current imbalance and the energy savings that can be obtained.

## Chapter 4

## Subthreshold energy reduction techniques

In this chapter, we present two techniques developed for energy reduction in sub/near-threshold digital circuits. Both aim to achieve the optimum imbalance presented in the previous chapter, which reduces the energy consumption per operation at the MEP. The first one uses the back plane voltage or bulk voltage to make the circuit work under the optimum imbalance. We refer to this technique as Optimum Back Plane Biasing (OBB) [2,3]. The second technique uses the length of the devices to achieve the optimum imbalance. We call this technique Asymmetric Length Biasing (ALB) [1,4]. To demonstrate the benefits of these techniques, we performed simulations of two simple circuits in a 28 nm FD-SOI technology. We compared these results with classic biasing and sizing techniques used in subthreshold digital design.

In recent years, similar works have been published that use some of these techniques or related ones [33, 90-95]. In [90], an asymmetric back plane scheme is presented for standard cell designs. The authors use this scheme to adjust the rise and fall times of the gates in the subthreshold regime and show that this has an impact on the leakage energy. In [95], the authors also use asymmetric back plane biasing to adjust the switching voltage of an inverter to $V_{d d} / 2$. They show very good energy reductions in a MAC circuit. Additionally, in [33] and [92], the authors propose upsizing the length of the devices to reduce the leakage power and energy of subthreshold digital circuits. In [93,94], the authors also propose using Asymmetric Length Biasing (ALB) to reduce leakage energy and show several adder circuits with important energy reductions. However, none of these works analyses the optimum imbalance described in the previous chapter. In [91], after our publications, the authors arrived at the same equations as the ones developed in the previous chapter and also propose using the same Optimum Back Plane Biasing (OBB) technique to work at the optimum.

The chapter is organized as follows. First, Section 4.1 gives a brief introduction to the technology used. Then, Section 4.2 presents the first energy reduction technique (OBB) and Section 4.3 describes the second one (ALB). Section 4.4 compares the two techniques with classic approaches. Further on, Section 4.5 analyses the impact of these techniques on a Ripple Carry Adder (RCA). Finally, Section 4.6 draws the chapter conclusions.

### 4.1 28 nm Ultra-Thin Body and Box (UTBB) Fully Depleted Silicon on Insulator (FD-SOI) technology

FD-SOI technology has become the preferred alternative for ULP digital systems. In [35], a microprocessor was implemented together with a dc-dc converter in a 28 nm FD-SOI technology. The system achieves a wide range of Dynamic Voltage and Frequency Scaling (DVFS) and a high-efficiency during the voltage conversion. In $[36,37]$, a dual-core ARM Cortex-A9 was manufactured and measured in the same technology. They also show a wide range of DVFS and the body biasing opportunities for low-voltage circuits.

This high energy efficiency is achieved for several reasons. The thin silicon film improves several electrostatic characteristics. The subthreshold slope is improved while the Drain-Induced Barrier Lowering (DIBL) and the parasitic capacitances (such as junction capacitances) are reduced [36,39,40]. Additionally, this technology does not need highly doped or pocket implants to tune the electrostatic characteristics. Consequently, due to the lightly doped body, the variability is greatly decreased, making it suitable for ultra-low voltage circuits [39-41].

An important knob of this technology is the poly biasing technique. This consists of modulating the gate length, which offers circuit designers a wide range of static power optimization. Additionally, in subthreshold circuits, this is an important knob for energy reduction. We will address this discussion further on.

Moreover, the solution proposed by [42] for multi-threshold voltage ($V_{T}$) transistors in the so-called Ultra-Thin Body and Box (UTBB) Fully Depleted Silicon on Insulator (FD-SOI) opens a new degree of freedom by allowing a wide range of Back Plane Biasing (BPB), which can be used for fine tuning the $V_{T}$ of the transistors. The device can be seen as a "double gate" transistor since the back plane voltage is applied through a thin buried oxide. The application of BPB in digital circuits has been proposed to manage the trade-off between leakage current and performance $[37,38,96,97]$.

We used a 28 nm UTBB FD-SOI technology which has two $V_{T}$ flavors: regular $V_{T}$ transistors (RVT) (Fig. 4.1a), where the back planes are implemented with conventionally doped wells (p-well for the NMOS and n-well for the PMOS), and low $V_{T}$ transistors (LVT) (Fig. 4.1b), where the wells are flipped [37]. The oxide of the SOI technology allows an ultra-wide range of Back Plane Biasing (BPB), since the only limitation is set by the diodes between the wells and between the well and the substrate, see Fig. 4.1. Considering the conditions for turning on these diodes, the voltage range that can be applied to each device can be found. Eq. (4.1) shows the range of voltage that can be applied to the RVT devices and Eq. (4.2) the range for the LVT devices. The 3 V restriction is given by the breakdown of the device, but we consider the range only up to 1.5 V since the models are validated up to that voltage.

$$
\begin{gather*}
\text { RVT: } V B P_{p}=V_{d d}-V_{B} \quad \text { and } \quad V B P_{n}=V_{B} \\
V_{B} \in\left[-3 V, 0.3 V+\frac{V_{d d}}{2}\right]  \tag{4.1}\\
\text { LVT: } V B P_{p}=-V_{B} \quad \text { and } \quad V B P_{n}=V_{B} \\
V_{B} \in[-0.3 V, 3 V] \tag{4.2}
\end{gather*}
$$
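
The relations and ranges of Eq. (4.1) and Eq. (4.2) can be captured in a small helper. The Python sketch below returns the NMOS and PMOS back plane voltages for a given $V_{B}$ and rejects values outside the stated range. The function name and the `v_max` parameter are hypothetical; `v_max` defaults to the 3 V breakdown limit and can be set to 1.5 V to stay within the validated model range.

```python
def sbb_back_plane_voltages(v_b, vdd, flavor, v_max=3.0):
    """Return (VBP_n, VBP_p) following Eq. (4.1) for RVT or Eq. (4.2) for LVT."""
    if flavor == "RVT":
        v_lo, v_hi = -v_max, 0.3 + vdd / 2.0
        vbp_p = vdd - v_b
    elif flavor == "LVT":
        v_lo, v_hi = -0.3, v_max
        vbp_p = -v_b
    else:
        raise ValueError("flavor must be 'RVT' or 'LVT'")
    if not v_lo <= v_b <= v_hi:
        raise ValueError(f"V_B = {v_b} V is outside [{v_lo}, {v_hi}] V for {flavor}")
    return v_b, vbp_p


# Example with an assumed supply of 0.4 V and the 1.5 V model limit.
LVT_BIAS = sbb_back_plane_voltages(0.5, 0.4, "LVT", v_max=1.5)
RVT_BIAS = sbb_back_plane_voltages(0.1, 0.4, "RVT", v_max=1.5)
```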


Figure 4.1: Devices in the 28 nm UTBB FD-SOI technology used: (a) RVT devices, (b) LVT devices. [36]

### 4.2 Optimum Back Plane Biasing (OBB)

In this section, we present the Optimum Back Plane Biasing (OBB) scheme proposed. We compare this scheme with classic Symmetric Back Plane Biasing (SBB) to demonstrate the energy reductions that can be obtained.

### 4.2.1 Back Plane Biasing Schemes

Since we worked with a 28 nm UTBB FD-SOI technology we will refer to the back plane voltage, but the same results can be applied to a bulk technology considering the appropriate voltage range for the body voltage.

To adjust the performance of the circuit we can use the back plane voltage of the devices to adjust the $V_{T}$ of the transistor. To maintain the minimum energy per operation, the $V_{T}$ must be adjusted while keeping the optimum imbalance between the leakage currents of the NMOS and PMOS. By doing this we can obtain a wide range of performances with a constant energy consumption, since from Eq. (3.13) we see that the energy only depends on the $V_{T}$ through the imbalance of the leakage currents.

To tune the $V_{T}$ properly we need to find a relationship between the back plane voltage and the imbalance. Eq. (4.3), which is derived from Eq. (3.9), shows the relationship between the imbalance and the transistors' threshold voltages, while Eq. (4.4) shows the model used for the $V_{T}$ modulation through the back plane voltage.

$$
\begin{align*}
& \frac{I_{L, n}}{I_{L, p}}=\frac{I_{o, n}}{I_{o, p}} e^{\frac{-V_{T n}}{n_{n} U_{T}}+\frac{V_{T p}}{n_{p} U_{T}}}  \tag{4.3}\\
& V_{T}=V_{T 0}-\gamma V_{B S}-\eta V_{D S} \tag{4.4}
\end{align*}
$$
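As an illustration of how Eq. (4.3) and Eq. (4.4) are evaluated, the sketch below computes a threshold voltage and the resulting leakage imbalance. The function names are hypothetical, the thermal voltage is assumed to be 25.9 mV (room temperature), threshold voltages are passed as magnitudes, and any numerical parameters supplied by the caller are placeholders rather than values extracted from the technology.

```python
import math

U_T = 0.0259  # assumed thermal voltage near room temperature, in volts


def threshold_voltage(vt0, gamma, eta, v_bs, v_ds):
    """Eq. (4.4): V_T = V_T0 - gamma * V_BS - eta * V_DS."""
    return vt0 - gamma * v_bs - eta * v_ds


def leakage_imbalance(vt_n, vt_p, n_n, n_p, io_ratio=1.0, u_t=U_T):
    """Eq. (4.3): I_L,n / I_L,p, with io_ratio = I_o,n / I_o,p."""
    return io_ratio * math.exp(-vt_n / (n_n * u_t) + vt_p / (n_p * u_t))
```

With identical NMOS and PMOS parameters the imbalance equals `io_ratio`, and raising $V_{Tn}$ by $n_n U_T \ln 10$ lowers the imbalance by one decade.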

With these two relationships we can find the Optimum Back Plane Biasing (OBB) scheme that keeps the imbalance constant at its optimum while the $V_{T}$ is tuned to adjust the performance of the circuit. Eq. (4.5), derived from Eq. (4.3) and Eq. (4.4), shows the relationship that $V_{B P n}$ and $V_{B P p}$ must satisfy to maintain a constant imbalance.

$$
\begin{align*}
& V_{B P p}-V_{d d}=-k_{i m b} V_{B P n}-V_{i m b} \text { where } \\
& k_{i m b}=\frac{\gamma_{n} n_{p}}{\gamma_{p} n_{n}}  \tag{4.5}\\
& V_{i m b}=\frac{V_{T 0 n} n_{p}-V_{T 0 p} n_{n}}{\gamma_{p} n_{n}}+V_{d d} \frac{\eta_{p} n_{n}-\eta_{n} n_{p}}{\gamma_{p} n_{n}}+\frac{n_{n} n_{p}}{\gamma_{p} n_{n}} U_{T} \ln \left(\frac{I_{L p} I_{o n}}{I_{L n} I_{o p}}\right)
\end{align*}
$$
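Eq. (4.5) can be transcribed directly into code. The sketch below follows the equation as printed; the function names are hypothetical, `target_imbalance` stands for $I_{Ln}/I_{Lp}$, `io_ratio` stands for $I_{on}/I_{op}$, and the default thermal voltage of 25.9 mV is an assumption.

```python
import math


def obb_coefficients(vt0_n, vt0_p, gamma_n, gamma_p, eta_n, eta_p,
                     n_n, n_p, vdd, target_imbalance,
                     io_ratio=1.0, u_t=0.0259):
    """Return (k_imb, V_imb) as defined in Eq. (4.5)."""
    denom = gamma_p * n_n
    k_imb = (gamma_n * n_p) / denom
    # ln(I_Lp * I_on / (I_Ln * I_op)) written with the two ratios above
    log_term = math.log(io_ratio / target_imbalance)
    v_imb = ((vt0_n * n_p - vt0_p * n_n) / denom
             + vdd * (eta_p * n_n - eta_n * n_p) / denom
             + (n_n * n_p / denom) * u_t * log_term)
    return k_imb, v_imb


def obb_vbp_p(vbp_n, vdd, k_imb, v_imb):
    """PMOS back plane voltage from Eq. (4.5): V_BPp = V_dd - k_imb * V_BPn - V_imb."""
    return vdd - k_imb * vbp_n - v_imb
```

For fully symmetric device parameters and a target imbalance equal to `io_ratio`, the sketch returns $k_{imb}=1$ and $V_{imb}=0$, so the OBB line reduces to $V_{BPp}-V_{dd}=-V_{BPn}$.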

On the other hand, we have the classic SBB where both the PMOS and NMOS are symmetrically forward biased (FBB) or reverse biased (RBB). Consequently, the back plane voltages are related through Eq. (4.1) when the devices are RVT and through Eq. (4.2) when they are LVT.

Fig. 4.2 summarizes both back plane biasing schemes where $V B$ will be used to adjust the circuit performance as shown in each scheme. In the next two sections we will show simulation results for both back plane biasing schemes and quantify the energy benefits obtained by using the OBB scheme instead of the SBB scheme.

### 4.2.2 Simulation results

We evaluated the impact of the OBB scheme in LVT, RVT, full custom and standard cell designs to fully confirm the results predicted by the model.

(a) Classic symmetric back plane biasing scheme. FBB for $V_{B}>0$, RBB for $V_{B}<0$.

(b) Optimum back plane biasing scheme.

Figure 4.2: Back plane biasing schemes.

We also evaluated the energy benefits of using the OBB scheme in comparison with the SBB. The test circuit was an inverter chain, which is a simple circuit, yet representative of the performance trade-offs of more complex circuits. A 25-inverter chain with an activity factor of 0.5 (i.e., two 25-inverter chains, one switching at the just-in-time operation frequency (maximum frequency) and the other one with a fixed input) was chosen (Fig. 4.3).


Figure 4.3: Chain of inverters with an activity factor of 0.5 .


## Optimum leakage current imbalance with OBB

First, we will show simulation results for the standard SBB scheme where both the PMOS and NMOS are symmetrically forward biased (FBB) or reverse biased (RBB). The test circuit with minimum size ($W_{p}=W_{n}=80\,\mathrm{nm}$ and $L_{p}=L_{n}=30\,\mathrm{nm}$) LVT devices was simulated for different $V_{dd}$ and back plane voltages ($VBP$) (the SBB implies $VBPn=-VBPp$ in LVT devices). Fig. 4.4 shows the simulated energy per operation (normalized to the minimum energy achieved with the OBB) as solid contour lines and the leakage current imbalance of the devices as dashed grey contour lines.


Figure 4.4: $V B P n=-V B P p$. LVT devices. Energy per operation normalized to minimum energy of the OBB as a function of supply voltage and back plane voltage (solid contour). Imbalance between PMOS and NMOS leakage current as a function of supply voltage and back plane voltage (dashed contour).

The optimum $V_{dd}$ value depends directly on the activity factor of the circuit, which changes the dynamic power consumption. More complex circuits with a lower activity factor would achieve the MEP at higher $V_{dd}$ values. Nevertheless, the conclusions of our work remain unchanged in that case. Additionally, it can be seen that the optimum $V_{dd}$ also depends on the leakage current imbalance, as seen in Eq. (3.16). As the imbalance increases, the optimum $V_{dd}$ rises and the energy per operation increases.

Since $k_{imb}$ (the parameter from Eq. (4.5)) is not taken into account, the imbalance is not kept constant. However, as $k_{imb}$ is close to one, the imbalance does not change drastically along the SBB scheme. What is more important is that the imbalance itself is far from the optimum imbalance, which dramatically increases the energy consumption. In this particular circuit, $W_{\text{eff}}$ and $\tau$ are easy to estimate since about half of the inverters leak through a PMOS and the other half through an NMOS. For the inputs used during the simulation we can estimate that $W_{\text{effn}}=26$, $W_{\text{effp}}=24$, $\tau_{n}=12$ and $\tau_{p}=13$, which gives rise to an optimum imbalance close to 0.9.

To show that this optimum imbalance is, as the model predicts, the one that minimizes the total energy, we simulated the energy per operation and leakage current imbalance for different supply voltages but, in this case, we maintained the back plane voltage of the NMOS devices connected to gnd and changed the back plane voltage of the PMOS device. Fig. 4.5 shows the simulated energy per operation (solid contour) and leakage current imbalance (dashed contour).


Figure 4.5: $V B P n=0$. LVT devices. Energy per operation normalized to minimum energy of the OBB as a function of supply voltage and back plane voltage (solid contour). Imbalance between PMOS and NMOS leakage current as a function of supply voltage and back plane voltage (dashed contour).

For each $V_{dd}$ there is only one point that has the optimum imbalance (0.9 in this circuit). From Eq. (4.5), that point is $VBPn=0\,\mathrm{V}$, $VBPp=V_{dd}-V_{imb}$. From the simulations it can be seen that, at the optimum $V_{dd}=200\,\mathrm{mV}$, the optimum back plane voltage is $VBPn=0\,\mathrm{V}$, $VBPp=-0.9\,\mathrm{V}$, and that this point leads to the minimum energy per operation.

To further confirm the existence of this optimum we changed the circuit, thus changing $W_{\text{eff}}$, in order to change the optimum imbalance. To do this we replaced the idle 25-inverter chain with 25 inverters with their inputs at $gnd$. In this way, $W_{effn}$ increased while $W_{effp}$ decreased, moving the optimum imbalance to 0.5. Fig. 4.6 shows the simulated energy per operation and leakage current imbalance with this new circuit. It can be seen that the minimum energy is now achieved with this new optimum imbalance.

Figure 4.6: $VBPn=0$. LVT devices. Energy per operation normalized to minimum energy of the OBB as a function of supply voltage and back plane voltage (solid contour). Imbalance between PMOS and NMOS leakage current as a function of supply voltage and back plane voltage (dashed contour). $W_{\text{effn}}$ and $W_{\text{effp}}$ modified with respect to the previous figures: $W_{\text{effn}}=38$, $W_{\text{effp}}=12$.

In Section 4.2.1 we presented the OBB scheme, through which we can adjust the $V_{T}$, and thus the performance of the circuit, while keeping a constant imbalance (Fig. 4.2b and Eq. (4.5)). To do this we need to carry out two steps. The first one is to obtain the optimum imbalance. To do that we need to estimate the $W_{eff}$ and $\tau$ of the circuit through an analysis of the circuit's architecture and inputs. This can be done with a logic simulation; further on we will show how to do this in an example. In the test circuit this imbalance was close to 0.9 (see Fig. 4.5).

The second step is to obtain $k_{imb}$ and $V_{imb}$ to be able to apply the OBB scheme. To do this we only need to run electrical simulations of one NMOS transistor and one PMOS transistor to find the linear relationship between $VBPn$ and $VBPp$ that keeps the imbalance constant at the optimum.
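The extraction of $k_{imb}$ and $V_{imb}$ from such single-transistor simulations can be sketched as an ordinary least-squares line fit. This is only an illustration under stated assumptions: the function name `fit_obb_line` is hypothetical, and its inputs are assumed to be pairs ($VBPn$, $VBPp$) that were found, by simulation, to give the optimum imbalance at a fixed $V_{dd}$.

```python
def fit_obb_line(vbp_n, vbp_p, vdd):
    """Fit V_BPp = V_dd - k_imb * V_BPn - V_imb (Eq. (4.5)) by least squares.

    vbp_n, vbp_p: equal-length sequences of back plane voltages (in volts)
    that yield the same, optimum, leakage imbalance.
    Returns (k_imb, v_imb).
    """
    count = len(vbp_n)
    if count < 2 or count != len(vbp_p):
        raise ValueError("need at least two matching (VBPn, VBPp) points")
    mean_x = sum(vbp_n) / count
    mean_y = sum(vbp_p) / count
    sxx = sum((x - mean_x) ** 2 for x in vbp_n)
    if sxx == 0.0:
        raise ValueError("VBPn values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(vbp_n, vbp_p))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Compare with Eq. (4.5): slope = -k_imb, intercept = V_dd - V_imb
    return -slope, vdd - intercept
```

A linear fit is used because Eq. (4.5) predicts a straight line; if the simulated points deviate noticeably from one, the $V_T$ model of Eq. (4.4) would no longer be an adequate description over that voltage range.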

To confirm that by using this simple approach we obtain the minimum energy per operation, we simulated the same circuit while varying $V_{d d}, V B P n$ and $V B P p$ and calculated the energy per operation. Fig. 4.7 shows the simulation results. We show 3D surfaces of constant energy per operation as a function of $V_{d d}, V B P n$ and $V B P p$. Additionally, we show with a black solid line the linear relationship between $V B P n$ and $V B P p$ obtained with the two steps mentioned above and with a dashed line the relationship used by the classic SBB. We can conclude that by using this methodology we can achieve the minimum energy per operation which is much lower than the one obtained with the SBB. Finally, another advantage is that we can obtain the OBB without doing any electrical simulation of the complete circuit.


Figure 4.7: Energy per operation normalized to minimum energy as a function of supply voltage, back plane voltage of the NMOS and back plane voltage of the PMOS. LVT devices. The black solid line marks the points predicted by the OBB to be the optimum from the energy point of view and the dashed line marks the points used by the SBB.

## Comparison between SBB and OBB

We evaluated the impact of this OBB scheme with the optimum imbalance in LVT, RVT, full custom and standard cell (SC) designs of the test circuit already presented. Additionally, we simulated the same circuit with the classic SBB to quantify the energy benefits of applying the OBB scheme. Fig. 4.8 shows the energy per operation (normalized to the minimum energy achieved with full custom cell (minimum size transistors) and LVT devices) as a function of the maximum frequency of the chain.

The back plane voltages were limited so that the diodes between the p-well, n-well and p-substrate never turned on. We also limited the maximum back plane voltages to $\pm 1.5\,\mathrm{V}$, although the technology states that $\pm 3\,\mathrm{V}$ is acceptable. Additionally, in each case $k_{imb}$ and $V_{imb}$ were obtained by simulation for the optimum imbalance, so as to satisfy Eq. (4.5) while tuning the $V_{T}$.

Fig. 4.8 highlights that by using the wide voltage range for the back plane, a unique feature available in UTBB FD-SOI technology, the energy per operation of a circuit can be kept minimum and constant over a wide range of performance. From the simulation results we can see that the minimum energy can be reduced by around $30\%$ in the LVT flavour when using the OBB instead of the classic SBB scheme.

Figure 4.8: Minimum energy per operation normalized (to minimum energy obtained with LVT devices, minimum size, OBB scheme) as a function of the maximum frequency of the inverters chain for different $V_{T}$ flavours and sizes. The $V_{dd}$ corresponds for each case to the one of the minimum energy point. 1k Monte Carlo simulation results are included for some points, showing the improvement in variability.

As for the SC, we see a difference between the RVT and the LVT devices. This is because the dimensioning of the LVT SC (the width of the PMOS is bigger than that of the NMOS) leads to a $k_{imb}$ close to 1 and a $V_{imb}$ close to 0 V, making the OBB and SBB nearly the same. Of course, with a custom sizing we can achieve the same frequencies and the same strength but with lower energy consumption by using the OBB scheme.

In [33] it was shown how a moderate increase in the length of the devices can improve the subthreshold slope and DIBL while working in the subthreshold regime. In our circuit, an optimum upsized L value exists that further decreases energy consumption by $25\%$. This is shown in Fig. 4.8, where the simulation results of the test circuit with an optimum upsized $L=50\,\mathrm{nm}$ are depicted. L can also be used to adjust performance with small energy penalties.

In the case of the RVT devices with minimum size transistors, the OBB cannot be applied since the required back plane voltages that balance the leakage currents turn on the diode between the p-substrate and the n-well. However, by using the LVT transistors with an upsized L and OBB we can achieve the same performance as the RVT devices with SBB but with an energy reduction of more than $60\%$ (see Fig. 4.8).

Finally, Fig. 4.8 shows 1k Monte Carlo energy per operation simulation results for three different combinations of devices and BB schemes: LVT with the SBB scheme and minimum sizes, LVT with OBB and minimum sizes, and LVT with OBB and the optimum L. The variability is improved by using the OBB scheme; this is inherent to working at the optimum imbalance, which makes the energy less sensitive to variations in the imbalance. Upsizing L also improves the subthreshold slope factor, which reduces the derivative of the energy with respect to the imbalance (see Eq. (3.17)), and further improves variability due to the increase in the area of the transistor.

The results shown in this section suggest that while working with this technology the wide back plane voltage range can be used effectively to minimize energy and improve variability without jeopardizing performance. Furthermore, new sizing approaches must be analysed to fully benefit from this impressive knob available in the UTBB FD-SOI technology since the equalization of strengths can be achieved by other means besides sizing.

### 4.3 Asymmetric Length Biasing (ALB)

Length biasing was proposed as a knob to mitigate several disadvantages of advanced nanometer technologies, especially for subthreshold digital circuits $[33,36]$. In [33], the authors show how an increase in the length of the devices can improve the subthreshold slope and the DIBL coefficient. Fig. 4.9 shows the subthreshold slope factor $n$ as a function of the length for the NMOS LVT device of the technology used.

Additionally, in [33] it is shown that in the subthreshold regime and for advanced technologies the output capacitance of the digital gates is dominated by fringing capacitances, which are almost independent of the transistor length. This is why a moderate increase in the length of the devices has little impact on the dynamic energy consumption. Finally, the impact of a moderate increase in the length of the devices on the overall cell area is very small due to the almost negligible contribution of the active area to the total area.

We propose to use Asymmetric Length Biasing (ALB) to achieve the optimum leakage current imbalance that minimizes the leakage energy, as shown in the previous chapter. To do this, the length of the NMOS should be increased to change the current of this device and adjust the imbalance to the optimum.



Figure 4.9: NMOS LVT subthreshold slope factor as a function of transistors length.

### 4.3.1 Simulation results

To verify the impact of the ALB approach, the test circuit considered was again a chain of inverters, but the activity factor was changed to 0.1 (i.e., ten 25-inverter chains, one switching at the just-in-time operation frequency and the other nine with a fixed input) (Fig. 4.10). Unless otherwise specified, the devices used were LVT.


Figure 4.10: Chain of inverters with an activity factor of 0.1 .


## Optimum leakage current imbalance with ALB

Fig. 4.11 shows the simulation results for the energy per operation (black solid contour lines) of the test circuit as a function of $V_{dd}$ and the length of the NMOS devices ($Ln$). In this case, the length of the PMOS was kept at the minimum size, so the leakage current of the PMOS devices is fixed. Then, by changing $L_{n}$ we vary the leakage current of the NMOS device and can adjust the leakage current imbalance to the optimum shown in Eq. (3.18).


Figure 4.11: Asymmetric Length Biasing (ALB). Energy per operation (solid black lines) normalized to the minimum energy and leakage current imbalance ( $I_{L, n} / I_{L, p}$ ) (dashed grey lines) as a function of $V_{d d}$ and $L n$ (length of NMOS). LVT devices. $L p=30 \mathrm{~nm}$. $W p=W n=80 \mathrm{~nm} . V B P n=V B P p=0 V$

The energy contours shown in Fig. 4.11 are normalized to the minimum energy achieved. This minimum energy is achieved at a $V_{d d}=250 \mathrm{mV}$ and a $L n=50 \mathrm{~nm}$. In the classic symmetric case, with $L n=L p=30 \mathrm{~nm}$, the minimum energy per operation is achieved at a $V_{d d}=325 \mathrm{mV}$ and it consumes $50 \%$ more energy than the optimum asymmetric case as it is shown in Fig. 4.11.

Additionally, Fig. 4.11 also shows the leakage imbalance between the NMOS and PMOS (dashed grey contour lines) as a function of $V_{dd}$ and the length of the NMOS devices ($Ln$), where we can see that the optimum imbalance is between 0.4 and 0.5. In this simple test circuit, it is easy to see that $W_{\text{effp}} \approx W_{\text{effn}}$ and $\tau_{p} \approx \tau_{n}$. However, since we have an upsized $L_{n}$, the subthreshold slope of the NMOS is different from that of the PMOS, which moves the optimum imbalance away from 1. From simulations, we obtained $n_{n}=1.23$ and $n_{p}=1.46$ for $L_{n}=50\,\mathrm{nm}$ and $L_{p}=30\,\mathrm{nm}$. Then, using Eq. (3.19) and $V_{dd}=275\,\mathrm{mV}$, the predicted optimum imbalance is 0.47.


Another approach would be to use the width of the PMOS device to adjust the imbalance between the leakage currents of the two devices. This is the classic approach when designing above-threshold standard cells to equalize rise and fall times. However, the increase in the width of the devices has a high impact on the output capacitance of the gates.

Fig. 4.12 shows the same energy per operation and leakage imbalance contours as Fig. 4.11 but as a function of the width of the PMOS device ($Wp$). The energy is normalized to the energy obtained with the optimum ALB from Fig. 4.11 ($V_{dd}=250\,\mathrm{mV}$ and $Ln=50\,\mathrm{nm}$). We can see that although the imbalance moves towards the optimum, the energy per operation does not decrease. This is because the dynamic energy increases due to the increase in the output capacitance of the gates. This is why the classic sizing approach for above-threshold digital gates is not appropriate for subthreshold design.

The energy reduction obtained with ALB is due to three factors. First, as it was shown with the model presented in Section 3.2.2, we are working in the optimum imbalance between the leakage currents and thus reducing the leakage energy. Additionally, increasing the length of the NMOS devices improves the subthreshold slope which also reduces the leakage energy (See Eq. (3.13)). Finally, these two factors reduce the optimum $V_{d d}$ (See Eq. (3.16)) and in consequence this reduces the dynamic energy consumed.
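The link between the subthreshold slope factor and the optimum $V_{dd}$ can be illustrated with a simplified, textbook-style energy model. This is an assumption of the sketch, not Eq. (3.13) or Eq. (3.16): the dynamic energy is taken as proportional to $\alpha V_{dd}^{2}$, the leakage energy as proportional to $K V_{dd}^{2} e^{-V_{dd}/(n U_T)}$, a single slope factor $n$ is used for the whole circuit, and `alpha` and `k_leak` are arbitrary illustrative constants. The search range of 150 mV to 500 mV and the values $n=1.23$ and $n=1.46$ are taken from this chapter.

```python
import math


def energy_per_op(vdd, n, alpha=0.1, k_leak=50.0, u_t=0.0259):
    """Normalized energy per operation of the simplified model (arbitrary units)."""
    dynamic = alpha * vdd ** 2
    leakage = k_leak * vdd ** 2 * math.exp(-vdd / (n * u_t))
    return dynamic + leakage


def minimum_energy_point(n, v_min=0.15, v_max=0.50, steps=3501):
    """Grid search for the V_dd (in volts) that minimizes energy_per_op."""
    best_v = v_min
    best_e = energy_per_op(v_min, n)
    for i in range(1, steps):
        v = v_min + i * (v_max - v_min) / (steps - 1)
        e = energy_per_op(v, n)
        if e < best_e:
            best_v, best_e = v, e
    return best_v, best_e


if __name__ == "__main__":
    for n in (1.23, 1.46):
        vdd_opt, e_min = minimum_energy_point(n)
        print(f"n = {n}: optimum Vdd = {vdd_opt * 1e3:.1f} mV, E_min = {e_min:.5f}")
```

In this simplified model the optimum $V_{dd}$ scales with $n$, so the lower slope factor yields both a lower optimum supply voltage and a lower minimum energy, in line with the third factor discussed above.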


Figure 4.12: Energy per operation (solid black lines) normalized to the minimum energy achieved with ALB and leakage current imbalance ( $I_{L, n} / I_{L, p}$ ) (dashed grey lines) as a function of $V_{d d}$ and $W p$ (width of PMOS). LVT devices. $L p=L n=30 \mathrm{~nm} . W n=80 \mathrm{~nm}$. Test circuit 1.


## Asymmetric length biasing with back plane biasing

The comparison made between ALB and the classic minimum sizing is not a totally fair comparison since the performance achieved in each case is different. To evaluate the energy reductions of ALB at the same performance, we used the back plane biasing opportunities available in this technology for threshold voltage tuning. For example, in LVT devices the back plane voltage can be used to reduce the $V_{T}$ of the devices in a wide range. Due to the doping type of the back plane, these voltages ( $V_{B P n}$ and $V_{B P p}$ ) are usually used considering $V_{B P n}=-V_{B P p}=V B$.

Fig. 4.13 shows the energy per operation contour (solid black lines) as a function of $V_{d d}$ and the back plane voltage $(V B)$ for the two cases, with the minimum size devices ( $L n=L p=30 \mathrm{~nm}$ ) and with the ALB ( $L n=50 \mathrm{~nm}$ and $L p=30 \mathrm{~nm}$ ). The energy per operation is normalized to the minimum energy achieved with $L n=50 \mathrm{~nm}$ and $L p=30 \mathrm{~nm}$. We also show performance contours (dashed grey lines) to compare the energy reductions obtained with ALB at the same performance. These values are in MHz.

From Fig. 4.13 we can see the energy reduction obtained with ALB jointly with the expected performance. For example, for low performances ( 1 MHz ), the energy obtained with symmetric lengths can be up to 4 times higher than with an ALB. As performance requirements increase, the energy benefits obtained by ALB decrease. Nevertheless, for example, at 10 MHz and 100 MHz the energy consumed with symmetric lengths is $70 \%$ and $35 \%$ higher than in the ALB case. The energy savings depend on the ratio between the leakage energy and the total energy. For very low performance the total energy is almost all due to leakage energy while for high performance the total energy is almost all dynamic. Since the ALB reduces the leakage energy, the benefits are higher when targeting lower performances. This shows that there is a very wide range of applications that can benefit from ALB.

Finally, we can implement the ALB, but with an upsized length for the PMOS. This improves the subthreshold slope factor, further reducing the energy per operation. Figure 4.14b shows the energy per operation and performance contours as presented before but for $Ln=110\,\mathrm{nm}$ and $Lp=40\,\mathrm{nm}$. In this case, the total energy consumption can be further reduced over a wide range of target performances, but the increase in dynamic energy makes the circuit consume more total energy in high-performance scenarios. Additionally, Fig. 4.14a shows the energy per operation and performance contours as presented before but for symmetric length biasing, $Ln=50\,\mathrm{nm}$ and $Lp=50\,\mathrm{nm}$. In these last two cases, the energy contours are normalized to the minimum energy achieved in Fig. 4.11. It can be noted that if a symmetric upsized approach is used, the energy consumption is still higher than in the ALB case with minimum $Lp$ because the circuit is not working at the optimum imbalance.

In summary, we showed that by using ALB, significant energy reductions can be achieved while maintaining the same performance.



Figure 4.13: Symmetrical minimum sizing vs ALB. Energy per operation (solid black lines) normalized to the minimum energy achieved with ALB (Fig. 4.11) and performance contours (dashed grey lines) (in MHz ) as a function of supply voltage and back plane voltage. LVT devices. $V_{B P n}=-V_{B P p}=V B$. a) $L n=L p=30 \mathrm{~nm}$ b) $L n=50 \mathrm{~nm} L p=30 \mathrm{~nm}$.


Figure 4.14: Symmetrical upsizing vs ALB upsizing. Energy per operation (solid black lines) normalized to the minimum energy achieved with ALB (Fig. 4.11) and performance contours (dashed grey lines) (in MHz) as a function of supply voltage and back plane voltage. LVT devices. $V_{BPn}=-V_{BPp}=VB$. a) $Ln=Lp=50\,\mathrm{nm}$ b) $Ln=110\,\mathrm{nm}$, $Lp=40\,\mathrm{nm}$.

## Variability in asymmetric length biasing

Another important concern while building subthreshold digital circuits with advanced nanometer technologies is the variability. This is because the drain current of the transistors depends exponentially on the threshold voltage, supply voltage and temperature. To assess the impact of ALB on the variability of the circuit, we performed 1000 Monte Carlo simulations of mismatch and process for the energy per operation with each of the sizing approaches discussed in the last section.

Table 4.1 shows the Monte Carlo results for each sizing approach. It lists the supply voltage, the mean energy per operation and the standard deviation obtained from the simulation. For each sizing approach, the simulation results are shown at the optimum $V_{dd}$. Additionally, in some cases, an increased $V_{dd}$ is also shown. We can see that ALB significantly improves the variability, particularly when the optimum $V_{dd}$ considering variability [32] is used, i.e. cases 6 and 7 compared to cases 1 and 3 of Table 4.1.

Table 4.1: Monte Carlo Simulation Results

| Case | Size | $V_{dd}$ (V) | Mean Ene/Op (aJ) | Std Dev (aJ) | Std Dev/Mean |
| :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | $Ln=30\,\mathrm{nm}$, $Lp=30\,\mathrm{nm}$ | 0.325 | 629 | 77.2 | 0.12 |
| 2 | $Ln=50\,\mathrm{nm}$, $Lp=30\,\mathrm{nm}$ | 0.250 | 449 | 54.0 | 0.12 |
| 3 | $Ln=50\,\mathrm{nm}$, $Lp=50\,\mathrm{nm}$ | 0.300 | 494 | 73.6 | 0.15 |
| 4 | $Ln=110\,\mathrm{nm}$, $Lp=40\,\mathrm{nm}$ | 0.225 | 343 | 31.7 | 0.092 |
| 5 | $Ln=30\,\mathrm{nm}$, $Lp=30\,\mathrm{nm}$ | 0.300 | 689 | 116.6 | 0.17 |
| 6 | $Ln=50\,\mathrm{nm}$, $Lp=30\,\mathrm{nm}$ | 0.300 | 419 | 28.2 | 0.067 |
| 7 | $Ln=110\,\mathrm{nm}$, $Lp=40\,\mathrm{nm}$ | 0.300 | 355 | 13.2 | 0.037 |

${}^{1}$ Symmetric Length @ $V_{dd}$ of MEP
${}^{2}$ Asymmetric Length Biasing @ $V_{dd}$ of MEP
${}^{3}$ Upsized Symmetric Length Biasing @ $V_{dd}$ of MEP
${}^{4}$ Upsized Asymmetric Length Biasing @ $V_{dd}$ of MEP
${}^{5}$ Symmetric Length Biasing @ $V_{dd_{3}}$
${}^{6}$ Asymmetric Length Biasing @ $V_{dd_{3}}$
${}^{7}$ Upsized Asymmetric Length Biasing @ $V_{dd_{3}}$
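
The last column of Table 4.1 can be recomputed from the mean and standard deviation columns. The following is a minimal sketch with the rows numbered 1 to 7 in the order in which they appear in the table; the names `TABLE_4_1` and `relative_spread` are hypothetical, and rounding to two significant figures is an assumption chosen to match the precision shown in the table.

```python
# (case, Ln in nm, Lp in nm, Vdd in V, mean energy/op in aJ, std dev in aJ)
TABLE_4_1 = [
    (1, 30, 30, 0.325, 629.0, 77.2),
    (2, 50, 30, 0.250, 449.0, 54.0),
    (3, 50, 50, 0.300, 494.0, 73.6),
    (4, 110, 40, 0.225, 343.0, 31.7),
    (5, 30, 30, 0.300, 689.0, 116.6),
    (6, 50, 30, 0.300, 419.0, 28.2),
    (7, 110, 40, 0.300, 355.0, 13.2),
]


def relative_spread(rows):
    """Return {case: std/mean}, rounded to two significant figures."""
    result = {}
    for case, _ln, _lp, _vdd, mean, std in rows:
        result[case] = float(f"{std / mean:.2g}")
    return result


if __name__ == "__main__":
    for case, ratio in relative_spread(TABLE_4_1).items():
        print(f"case {case}: Std Dev/Mean = {ratio}")
```

The ratio of standard deviation to mean is used because it allows the spread of cases with different mean energies to be compared directly.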

This is due to two reasons. The first one is that the upsized length reduces the variability of the devices since the active area is bigger. This is achieved with very little total area penalty. The second reason is that, because with ALB we are working at the optimum imbalance that minimizes the energy, small variations in the imbalance have little impact on the energy consumption. This is related to the fact that we are considering variations around a local minimum which, as can be seen in Fig. 4.14b, has a wide flat area around it.


## Temperature dependence in asymmetric length biasing

Since we are working in the subthreshold regime, the temperature dependence is stronger due to the exponential temperature dependence of the drain current of the devices. However, here we show that the optimum length is not strongly affected by temperature. The optimum length depends directly on the optimum imbalance, which depends on temperature through the thermal voltage, as shown in Eq. (3.19). From Eq. (3.19) we can note that the extent to which temperature affects the optimum imbalance depends on the difference between the subthreshold slopes of the NMOS and PMOS devices and on the supply voltage. To assess this impact, we repeated the simulation of Fig. 4.11 for different temperatures; the results are shown in Fig. 4.15.

We can see that, even in the extreme cases of $125^{\circ}\mathrm{C}$ and $-40^{\circ}\mathrm{C}$, the optimum length does not change significantly. If we select the optimum for room temperature ($L_{n}=50\,\mathrm{nm}$), the savings obtained with ALB are almost the same across the whole industrial temperature range.


Figure 4.15: Energy contours (aJ) as a function of $V_{d d}$ and the length of the NMOS ( $L n$ ) for different temperatures. $V_{B P p}=V_{B P n}=0 V$. LVT

## Asymmetric length biasing in bulk technologies

It is important to note that this technique can also be applied to conventional bulk technologies. To get a first estimate of the benefits that can be achieved in these technologies, we simulated the same test circuit with Predictive Technology Models [80] from the 90 nm down to the 32 nm node.

Table 4.2 shows the simulation results for each node. These include the $L n_{\text {min }}$, the $L n_{o p}$ (that minimizes the energy per operation), the energy per operation in each case and finally the relationship between these two energies. The $L p$ was maintained at the minimum.

Table 4.2: ALB in bulk process. PTM Simulation Results

| Node | $Ln_{min}$ (nm) | $Ln_{op}$ (nm) | $Wn=Wp$ (nm) | Ene@$Ln_{min}$ (fJ) | Ene@$Ln_{op}$ (fJ) | Ene@$Ln_{min}$ / Ene@$Ln_{op}$ |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 90 nm | 90 | 120 | 135 | 1.139 | 1.039 | 1.096 |
| 65 nm | 65 | 85 | 100 | 0.8852 | 0.7571 | 1.169 |
| 45 nm | 45 | 61 | 67 | 0.6347 | 0.5201 | 1.220 |
| 32 nm | 32 | 42 | 48 | 0.4350 | 0.3816 | 1.328 |

We can see that as we move to a more advanced technology node, the savings obtained with ALB increase. This is in agreement with the discussion given for the dominant output gate capacitances. As the technology node decreases, the fringing capacitances are more dominant in the output capacitances of the digital gates and thus the increase in the length to achieve the optimum imbalance has less impact in dynamic energy.

### 4.4 Comparison OBB and ALB

In this section, we compare the two techniques proposed in this thesis. To do that we selected as test circuit the chain of inverters with an activity factor of 0.1 (i.e., ten 25-inverter chains, one switching at the just-in-time operation frequency and the other nine with a fixed input) (Fig. 4.10). The devices used for the simulations were LVT.

Fig. 4.16 shows the simulated energy per operation for the different schemes. For each frequency the optimum $V_{dd}$ was selected. The simulation was restricted to a $V_{dd}$ between 150 mV and 500 mV. Also, different back plane biasing schemes were used.



Figure 4.16: Energy vs frequency for the different techniques proposed. In each case, the optimum $V_{d d}$ was selected.

The figure shows the Symmetric Back Plane Biasing (SBB) (black), the Optimum Back Plane Biasing (OBB) (red), the Asymmetric Length Biasing (ALB) (blue) and a hybrid approach between OBB and ALB (green). Additionally, each scheme is also presented in an upsized version; these results are shown in the same color but dashed. The width of all the devices was kept at the minimum. With the exception of SBB, all the approaches work at the optimum leakage imbalance proposed in this work.

All the energy profiles show three different regions of operation. For low frequency, the energy per operation starts increasing since the leakage energy is dominant. The second region corresponds with the lower energy per operation of the profile and indicates that the circuit works near the minimum energy point. In the third region, to achieve very high frequency the $V_{d d}$ needs to be increased to reduce the delay since back plane biasing is not enough to adjust the frequency. In this region, the dynamic energy is dominant.

In the first region, at low frequencies, the upsized SBB can, for example, consume up to 2.9 times more energy than the upsized ALB. However, this is not a fair comparison since, if this frequency is the target one, the upsized SBB would be designed to work at a higher frequency, and thus consume less energy, and then a power gating technique would be applied. Nevertheless, since the upsized ALB can achieve this frequency with near minimum energy per operation, just-in-time operation can be used, avoiding the complexity and energy overhead of power gating.

In the second region, the proposed techniques consume less energy than the classic approach. This is because they work at the optimum imbalance and thus minimize energy. Additionally, they maintain minimum energy consumption over a wider range of frequencies, in particular the upsized versions and ALB. This is due to several reasons. First, working at the optimum imbalance reduces the derivative of the total energy with respect to $V_{dd}$, since the LF in Eq. (3.15) is minimized. As a result, changes in $V_{dd}$ made to adjust the frequency have less impact on the energy consumption. Of course, this holds only while working near the MEP. Second, increasing the length of the devices improves the subthreshold slope, which, from Eq. (3.15), has the same effect on the derivative of the energy with respect to $V_{dd}$. Moreover, ALB has a wider range than OBB. This is because in the OBB technique the wide range of back plane biasing is used to adjust the imbalance, so very little is left to adjust performance without energy penalties.

On the other hand, although ALB is the technique that consumes the least energy, it achieves the optimum imbalance through sizing. Thus, if the optimum imbalance changes from circuit to circuit, the sizing of the gates would depend on the circuit architecture. This is impractical, since we want a standard cell library that can be used to synthesize any circuit. This is why a hybrid approach between both techniques, ALB and OBB, seems to be the best option. This means adjusting the optimum imbalance partially with ALB and partially with OBB. In this way, the hybrid approach obtains the benefits of both techniques: the dynamics of OBB and the wider range of ALB.

In this particular circuit, OBB with minimum size is not that beneficial. This is due to two reasons. First, since no device is upsized, there is no improvement in the subthreshold slope, which makes both the optimum $V_{dd}$ and the derivative of the energy with respect to $V_{dd}$ higher. Second, the low activity factor raises the optimum $V_{dd}$, which places the minimum energy point less deep in the subthreshold regime. In consequence, working at the optimum imbalance does not achieve the energy reduction that the other approaches achieve. However, this might not be the case for other activity factors.

Finally, in the third region, the upsized versions start to consume more energy. This is because dynamic energy grows as $V_{dd}$ increases, and at higher $V_{dd}$ the upsized length of the devices increases the output capacitance of the gates. Still, working at the optimum imbalance reduces the leakage energy component until it becomes negligible with respect to dynamic energy. When this happens, at very high frequency, the best approach is the classic minimum-size SBB.

In conclusion, we showed that important energy reductions can be achieved by using the proposed techniques. There is a very wide range of frequencies where the energy can almost be halved.

### 4.5 Ripple Carry Adder (RCA) simulations

In order to confirm the benefits of the energy reduction techniques presented we simulated an 8-bit Ripple Carry Adder (RCA). The architecture selected for the full adder is shown in Fig. 4.17.


Figure 4.17: Architecture of a 1-bit section of the full adder.

In Section 3.2.2 we presented a model for the energy per operation consumed by digital circuits working near the MEP, and we saw that the optimum imbalance between the leakage currents of the NMOS and PMOS devices that minimizes the energy consumption depends on the architecture and the inputs of the circuit. In particular, we want to evaluate how much the optimum imbalance changes for different inputs in an RCA. More importantly, we want to see whether the energy benefits obtained by the two techniques are achieved with different inputs.

From Eq. (3.19), we can see that the only parameters that depend on the inputs are $W_{\text{effp}}$ and $W_{\text{effn}}$. The parameters $\tau_{n}$ and $\tau_{p}$ depend on the critical path and thus do not change with the inputs. $W_{\text{effp}}$ and $W_{\text{effn}}$ depend on the static probability of each node of the circuit and on the gate that is driving the node. By means of a logic simulation, the term $\sqrt{W_{\text{effp}} / W_{\text{effn}}}$ from Eq. (3.19) can be calculated for each possible input. The results are shown in Fig. 4.18.

To assess how much the optimum ALB changes with the input, we selected the three pairs of inputs marked in Fig. 4.18 and simulated the energy contours. To cover the worst cases, we selected one input with the maximum $\sqrt{W_{\text{effp}} / W_{\text{effn}}}$ ($W_{\text{effp}} > W_{\text{effn}}$) and another with the minimum ($W_{\text{effp}} < W_{\text{effn}}$); the last one has $W_{\text{effp}} \approx W_{\text{effn}}$.
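The kind of logic simulation described above can be sketched as follows. This is a rough illustration under strong assumptions, not the procedure used in the thesis: the full adder is a hypothetical nine-NAND2 netlist (the architecture of Fig. 4.17 is not reproduced here), every gate counts as one unit of leaking width, and stack effects and the exact definition of $W_{\text{effp}}$ and $W_{\text{effn}}$ from Section 3.2.2 are ignored. A gate whose output is high is counted as leaking through its NMOS pull-down network, and a gate whose output is low as leaking through its PMOS pull-up network.

```python
import math
from itertools import product


def nand(x, y):
    """Two-input NAND on single bits."""
    return 1 - (x & y)


def full_adder_nodes(a, b, cin):
    """Nine-NAND2 full adder; returns (sum, carry_out, list of all gate outputs)."""
    n1 = nand(a, b)
    n2 = nand(a, n1)
    n3 = nand(b, n1)
    n4 = nand(n2, n3)      # a XOR b
    n5 = nand(n4, cin)
    n6 = nand(n4, n5)
    n7 = nand(cin, n5)
    s = nand(n6, n7)       # a XOR b XOR cin
    cout = nand(n5, n1)    # majority(a, b, cin)
    return s, cout, [n1, n2, n3, n4, n5, n6, n7, s, cout]


def leakage_ratio(a_word, b_word, bits=8):
    """Return (sqrt(leaking PMOS gates / leaking NMOS gates), computed sum)."""
    carry, total, high, low = 0, 0, 0, 0
    for i in range(bits):
        s, carry, nodes = full_adder_nodes((a_word >> i) & 1, (b_word >> i) & 1, carry)
        total |= s << i
        high += sum(nodes)               # output high: NMOS pull-down is off and leaks
        low += len(nodes) - sum(nodes)   # output low: PMOS pull-up is off and leaks
    total |= carry << bits
    return math.sqrt(low / high), total


def extreme_inputs(bits=4):
    """Find the inputs with minimum, maximum and closest-to-one leakage ratio."""
    ratios = {(a, b): leakage_ratio(a, b, bits)[0]
              for a, b in product(range(2 ** bits), repeat=2)}
    lo = min(ratios, key=ratios.get)
    hi = max(ratios, key=ratios.get)
    mid = min(ratios, key=lambda ab: abs(ratios[ab] - 1.0))
    return ratios, lo, hi, mid


ratios, lo, hi, mid = extreme_inputs(bits=4)
print("min:", lo, "max:", hi, "closest to 1:", mid)
```

Calling `extreme_inputs(bits=8)` enumerates all input pairs of an 8-bit adder in the same way, at the cost of a longer run time.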

Figure 4.18: Relationship between the number of NMOS and PMOS devices that impose the leakage versus the different possible inputs of the RCA.

We assessed the impact on both techniques, OBB and ALB. The simulations performed are similar to the ones presented in Fig. 4.5 and Fig. 4.11. In the case of OBB, the energy contours are presented as a function of $V_{dd}$ and $VBP_{p}$, while maintaining $VBP_{n} = 0$ V. Additionally, the widths were kept at the minimum size ($Wn = Wp = 80$ nm) for all the gates of the adder and the lengths were upsized to 50 nm. In the case of ALB, the energy contours are presented as a function of $V_{dd}$ and the length of the NMOS device ($Ln$), while maintaining $VBP_{n} = VBP_{p} = 0$ V. The widths of the transistors of all gates were also kept at the minimum size and the minimum length (30 nm) was used for the PMOS devices. In all the cases, the maximum frequency allowed by the critical path was used.

The results of the simulations are shown in Fig. 4.19 and Fig. 4.20 for OBB and ALB respectively. From simulations, we can see that the optimum imbalance changes with the different inputs. However, in both techniques, we can select a design point where the energy consumption is near the minimum for all the possible inputs.

Figure 4.19: Energy contours (aJ) as a function of $V_{dd}$ and $VBP_{p}$ for three different inputs. $VBP_{n} = 0$ V. LVT devices.

Figure 4.20: Energy contours (aJ) as a function of $V_{dd}$ and the length of the NMOS ($Ln$) for three different inputs. $VBP_{n} = VBP_{p} = 0$ V. LVT devices.

Additionally, depending on the input, the activity factor changes and thus so does the optimum supply voltage that minimizes energy. From Eq. (3.19) we can see that the supply voltage also affects the optimum imbalance. To evaluate this, we simulated three inputs with different activity factors to see how much the optimum imbalance changes. In the first one, the activity factor is very low (LAF): only one of the seven outputs ($S_{i}$) is switching. In the second one, with an intermediate activity factor (MAF), the carry is propagated to the output but none of the $S_{i}$ outputs is switching. Finally, in the third one, with a high activity factor (HAF), all $S_{i}$ are switching and the carry is propagated to the output. The results of the simulations are shown in Fig. 4.21 and Fig. 4.22 for OBB and ALB respectively.

For example, in ALB, the simulations show that the optimum asymmetric length biasing is $Ln = 50$ nm in the case of HAF and MAF and $Ln = 45$ nm in the case of LAF. However, the energy obtained with either of the two lengths is much lower than the energy obtained with a symmetric length ($Ln = 30$ nm). So, whichever of them is selected, the energy reduction in comparison with the symmetric length is almost the same.

The exact energy reductions depend on the $V_{dd}$ selected for the circuit. For example, if we select $V_{dd} = 200$ mV to minimize the energy of the HAF input, which is the one that consumes the most (due to its high dynamic energy), the energy reductions for the other inputs are very high (more than $60 \%$). This is because, in the MAF and LAF cases with $V_{dd} = 200$ mV, the energy consumption is almost entirely due to leakage energy, which is minimized by asymmetric length biasing.

On the other hand, if we select $V_{dd} = 350$ mV to minimize the LAF case, the benefits obtained by asymmetric length biasing for the other inputs are very small because, in these cases and for that $V_{dd}$, the energy consumption is almost entirely dynamic. However, this is not a good decision, since the energy consumed by the HAF input, which is the one that consumes the most, is much higher because it is far from its optimum $V_{dd}$. Similar conclusions can be obtained in the case of OBB.
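The interplay between activity factor, optimum $V_{dd}$ and leakage share discussed above can be illustrated with a minimal normalized model. This is a sketch under assumptions, not Eq. (3.19): the energy is taken as $V_{dd}^{2}\left(\alpha + \kappa\, e^{-V_{dd}/(n U_{T})}\right)$ in units of the switched capacitance, and the values of `KAPPA`, `N_SLOPE` and the three activity factors are hypothetical placeholders that do not correspond to the LAF, MAF and HAF inputs simulated here.

```python
import math

UT = 0.0259       # thermal voltage near 300 K, in volts
N_SLOPE = 1.3     # subthreshold slope factor (assumed)
KAPPA = 100.0     # leakage weight relative to the switched capacitance (assumed)

VDD_GRID = [0.150 + 0.001 * i for i in range(351)]  # 150 mV to 500 mV


def energy_terms(vdd, alpha):
    """Normalized (dynamic, leakage) energy per operation, in units of C * V^2."""
    e_dyn = alpha * vdd ** 2
    e_leak = KAPPA * vdd ** 2 * math.exp(-vdd / (N_SLOPE * UT))
    return e_dyn, e_leak


def optimum_vdd(alpha):
    """Supply voltage on the grid that minimizes the total normalized energy."""
    return min(VDD_GRID, key=lambda v: sum(energy_terms(v, alpha)))


def leakage_share(vdd, alpha):
    """Fraction of the total energy that is due to leakage."""
    e_dyn, e_leak = energy_terms(vdd, alpha)
    return e_leak / (e_dyn + e_leak)


for alpha in (0.5, 0.1, 0.02):  # hypothetical high, medium and low activity
    print(f"alpha = {alpha}: optimum Vdd = {optimum_vdd(alpha) * 1e3:.0f} mV, "
          f"leakage share at 200 mV = {leakage_share(0.2, alpha):.0%}, "
          f"at 350 mV = {leakage_share(0.35, alpha):.0%}")
```

In this toy model a lower activity factor pushes the optimum $V_{dd}$ up, and at a low supply voltage the low-activity case is almost entirely leakage, which is where a leakage reduction technique has the most room to help.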


Figure 4.21: Energy contours (aJ) as a function of $V_{d d}$ and $V B P_{p}$ for three different inputs. $V B P_{n}=0 V$. LVT devices.


Figure 4.22: Energy contours (aJ) as a function of $V_{d d}$ and the length of the NMOS ($Ln$) for three different inputs. $V B P_{n}=V B P_{p}=0 V$. LVT devices.

To summarize, we saw that the input data has an impact on the optimum imbalance, but there is still a design point that obtains very good energy reductions for all the inputs. However, there might exist a circuit where the optimum imbalance changes dramatically with the input data. If this were the case, OBB could be adjusted dynamically for the different inputs. A decoder could be designed to select among different back plane biasing schemes.

### 4.6 Conclusions

In this chapter we showed that, by using the proposed energy reduction techniques, the energy per operation consumed by sub/near threshold digital circuits can be reduced by $50 \%$ over a wide range of target performances. We presented simulation results for two circuits, a chain of inverters and a Ripple Carry Adder (RCA), implemented in a 28 nm FD-SOI technology.

The first technique consists of using the wide range of back plane voltage to adjust the $V_{T}$ of the devices, and thus the leakage current, to make the circuits work at the optimum imbalance presented in Section 3.2.2, which reduces the leakage energy consumption. The second technique consists of using the length of the devices to adjust the imbalance between the leakage currents. This can be done with no penalty in the output capacitances of the gates since, in advanced nanometer technologies and in the subthreshold regime, the dominant capacitances are the fringing ones. This has the additional benefit of improving the device parameters (such as subthreshold slope and DIBL coefficient), which reduces the energy consumption even further.

We addressed the impact of ALB on the variability of the energy consumption of the circuit. We saw that ALB almost halved the variability of the devices, since they are bigger, which allows building more robust and energy-efficient digital circuits. Additionally, since with ALB the circuit works at the optimum imbalance between the leakage currents of the PMOS and NMOS devices, process variations have less impact on the energy consumption.

We also showed how to combine both proposed techniques to obtain the advantages of each and overcome their disadvantages.

Finally, we studied the dependence of the optimum imbalance on the inputs. To do so, we simulated an 8-Bit Ripple Carry Adder (RCA). We saw that for different inputs, with different static probabilities and with different activity factors, a fixed design point can achieve very good energy reductions in spite of the small changes in the optimum imbalance.

## Chapter 5

## Conclusions

In any Ultra Low Power (ULP) system, a key circuit is the dc-dc converter. Due to size constraints, SC dc-dc converters have prevailed as the best option. In the past decade, a lot of research has been carried out to improve the efficiency of these converters. One of the main losses in these converters is due to the driving of the switches that implement the phases of the converter; however, very few works try to reduce it. In this thesis, we made an in-depth analysis of a charge recycling technique called stepwise charging (Section 2.2 and Section 2.3). This technique was proposed for applications where a large capacitance needs to be charged and discharged periodically. We analyzed the limits of this technique when implementing it in a fully integrated manner and with small capacitances, considering aspects such as the consumption of the additional logic blocks needed to implement it. We proposed to use this technique in SC dc-dc converters and demonstrated through measurements that a power reduction of more than $30 \%$ can be obtained in the gate drive of the switches of the converter. We also proposed a special charge recycling technique, suitable for certain architectures of SC dc-dc converters, that also reduces the gate drive losses. It can be applied to converters with PMOS and NMOS switches operating in the same phase of the converter. This technique is much simpler than stepwise charging and can achieve similar power reductions. We fabricated and measured an SC dc-dc converter with this technique and also obtained remarkable power reductions in the gate drive of the switches (Section 2.4 and Section 2.5).

Regarding the modelling of digital circuits operating below or near the threshold voltage, one concern of researchers was the impact of intrinsic noise on the minimum operating voltage as technology scaled. Several works predict boundaries for scaling due to intrinsic noise. In this thesis, we analyzed the impact of intrinsic noise in a more exact way and established the minimum operating voltage that it imposes. We compared this minimum with other definitions of minimum supply voltage and with the MEP. Additionally, we studied how this comparison changes as technology scales. We saw that, although intrinsic noise is eventually going to be a problem, down to the 28 nm node it is still less critical than other low-voltage issues in digital circuits, such as variability (Section 3.1). We also proposed an enhanced model for the MEP which considers the differences between the NMOS and PMOS devices. By doing this, we showed that the energy obtained at the MEP depends on the leakage current imbalance between the NMOS and PMOS devices. We demonstrated that an optimum imbalance exists and that it depends on the architecture of the circuit (Section 3.2).

Finally, we developed two techniques to operate at the mentioned optimum imbalance. The first one consists of a novel strategy for back plane biasing. Instead of a symmetric implementation, where both back plane voltages, PMOS and NMOS, are changed by the same amount to change the circuit performance, part of the voltage range is used to modify the leakage imbalance and make the circuit work at the optimum. By doing this, we obtained energy reductions of up to around $50 \%$ in the digital circuit consumption (Section 4.2). The second technique consists of using the length of the devices to adjust the current imbalance between the NMOS and PMOS devices. This has little impact on the output capacitance of the gates since the fringing capacitances, which dominate in advanced nanometer technologies in subthreshold operation, do not depend on the length of the devices. With this technique, energy reductions of around $50 \%$ can also be obtained with no performance penalties (Section 4.3).

This thesis analyzed key circuits in Ultra Low Power (ULP) systems, proposing novel techniques to obtain energy-efficient structures and implementations of circuits and systems for this kind of application. We made contributions in the design of SC dc-dc converters (Chapter 2), the modelling of subthreshold digital circuits (Chapter 3) and design techniques for reducing the energy consumption of digital circuits working near the MEP (Chapter 4).

### 5.1 Thesis contributions

The main contributions of this thesis are:

- Analysis of the stepwise charging technique.
- Implementation of stepwise charging technique as a charge recycling technique for efficiency improvement in switched capacitor dc-dc converters
- Development of a charge sharing technique for efficiency improvement in switched capacitor dc-dc converters
- Analysis of minimum operating voltage of digital circuits due to intrinsic noise and the impact of technology scaling in this minimum
- Improvement in the modelling of the minimum energy point while considering NMOS and PMOS transistors difference
- Demonstration of an optimum leakage current imbalance between the NMOS and PMOS transistors that minimizes energy consumption in subthreshold operation
- Development of an asymmetrical back plane (bulk) voltage strategy for working in this optimum
- Development of an asymmetrical poly bias strategy for working in the aforementioned optimum
- Analysis of the impact of input data in the optimum imbalance


### 5.2 Future work

The trend towards the Internet of Things (IoT) is making Ultra Low Power (ULP) systems more and more common in the industry. As a result, ultra-low energy and power techniques, such as working in the sub/near threshold regime, will be increasingly adopted in consumer electronics.

In the framework of this thesis, regarding the dc-dc converters and the techniques developed for efficiency improvement, an interesting line of research would be to evaluate the impact of using interleaved converters with the developed techniques. In line with this, another goal would be to design state-of-the-art converters using the proposed techniques.

In the framework of subthreshold digital circuits, several steps need to be accomplished to make the energy reduction techniques easy for the designer to use. Firstly, more analysis of different circuit architectures should be carried out to fully confirm the impact of the architecture and the input data on the optimum leakage current imbalance. To do this, we first need to develop a standard cell library with the different sizing approaches developed during the thesis, which also means characterizing the cells for subthreshold operation. Then, several circuits should be tested through the classic digital flow to compare the impact on the optimum leakage imbalance.

Additionally, tools for assessing the value of $W_{\text{eff}}$ and $\tau$ in different circuits need to be developed to carry out an analysis similar to the one presented for the RCA. Since different syntheses of the same circuit would change the optimum leakage current imbalance, it would be interesting to have tools in which the designer could specify the desired optimum leakage imbalance for the circuit, so that it is taken into account during synthesis. If this were possible, we could have circuits where the optimum imbalance from the energy consumption point of view is also the optimum imbalance from the robustness point of view. For example, an optimum imbalance of 1 would maximize the noise margins of the gates.

Body bias generators were not studied during the thesis, but these circuits should be addressed to develop the Optimum Back Plane Biasing (OBB) technique. Also, dynamic or adaptive body biasing could be used to overcome changes in the optimum imbalance due to different input data. A decoder could be used to select among different Optimum Back Plane Biasing (OBB) settings according to the input data at that moment.

### 5.3 Publications associated with the thesis

### 5.3.1 Journals

- F. Veirano, L. Naviner, and F. Silveira, "Optimum nMOS/pMOS Imbalance for Energy Efficient Digital Circuits," IEEE Transactions on Circuits and Systems: TCAS-I Regular papers, vol. 64, no. 12, pp. 3081-3091, Dec 2017
- ——, "Optimal asymmetrical back plane biasing for energy efficient digital circuits in 28nm UTBB FD-SOI," Integration, the VLSI Journal, 2017
- ——, "Minimum Operating Voltage Due to Intrinsic Noise in Subthreshold Digital Logic in Nanoscale CMOS," Journal of Low Power Electronics, vol. 12, no. 1, pp. 74-81, 2016
- F. Veirano, P. C. Lisboa, P. Pérez-Nicoli, L. Naviner, and F. Silveira, "A Charge Recycling Technique for Efficiency Improvement in Switched Capacitor dc-dc Converters," Microelectronic Journal (Under review), vol. XX, no. X, XXXX


### 5.3.2 Conferences

- F. Veirano, L. Naviner, and F. Silveira, "Pushing Minimum Energy Limits by Optimal Asymmetrical Back Plane Biasing in 28 nm UTBB FD-SOI," in International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS), Sept 2016, pp. 243-249
- F. Veirano, F. Silveira, and L. Naviner, "Asymmetrical Length Biasing for Energy Efficient Digital Circuits," in IEEE Latin American Symposium on Circuits and Systems (LASCAS), Feb 2017
- ——, "Is intrinsic noise a limiting factor for subthreshold digital logic in nanoscale CMOS?" in 2015 International Workshop on CMOS Variability (VARI), Sept 2015, pp. 45-50
- F. Veirano, P. Pérez-Nicoli, P. Castro-Lisboa, and F. Silveira, "Gate drive losses reduction in switched-capacitor DC-DC converters," in 2018 IEEE 9th Latin American Symposium on Circuits & Systems (LASCAS), Feb 2018, pp. 1-4


### 5.4 Publications not associated with the thesis

### 5.4.1 Journals

- P. C. Lisboa, P. Pérez-Nicoli, F. Veirano, and F. Silveira, "General Top/Bottom-Plate Charge Recycling Technique for Integrated Switched Capacitor DC-DC Converters," IEEE Transactions on Circuits and Systems: TCAS-I Regular papers, vol. 63, no. 4, pp. 470-481, April 2016
- P. Pérez-Nicoli, P. C. Lisboa, F. Veirano, and F. Silveira, "A Series-Parallel Switched Capacitor Step-up DC-DC Converter and its Gate-Control Circuits for Over the Supply Rail Switches," Analog Integrated Circuits and Signal Processing, vol. 85, no. 1, pp. 37-45, 2015
- P. Pérez-Nicoli, F. Veirano, P. C. Lisboa, and F. Silveira, "Low-power operational transconductance amplifier with slew-rate enhancement based on non-linear current mirror," Analog Integrated Circuits and Signal Processing, vol. 89, no. 3, pp. 521-529, 2016
- J. P. Oliver, F. Veirano, D. Bouvier, and E. Boemo, "A Low Cost System for Self Measurements of Power Consumption in Field Programmable Gate Arrays," Journal of Low Power Electronics, vol. 13, no. 1, pp. 1-9, 2017


### 5.4.2 Conferences

- F. Veirano, P. Perez, S. Besio, P. Castro, and F. Silveira, "Ultra low power pulse generator based on a ring oscillator with direct path current avoidance," in IEEE Latin American Symposium on Circuits and Systems (LASCAS), 2013, pp. 1-4
- P. Pérez-Nicoli, F. Veirano, P. C. Lisboa, and F. Silveira, "High Slew-Rate OTA with Low Quiescent Current Based on Non-Linear Current Mirror," in IEEE Latin American Symposium on Circuits and Systems (LASCAS), Feb 2015, pp. 1-4
- P. Pérez-Nicoli, F. Veirano, and F. Silveira, "Comparator with self controlled delay for active rectifiers in inductive powering," in IEEE Wireless Power Transfer Conference (WPTC). IEEE, 2018, pp. 1-3
- P. Pérez-Nicoli, F. Veirano, C. Rossi-Aicardi, and P. Aguirre, "Design method for an ultra low power, low offset, symmetric OTA," in 2013 7th Argentine School of Micro-Nanoelectronics, Technology and Applications, Aug 2013, pp. 38-43


## Appendix A

## Convergences of auxiliary capacitors voltage in stepwise charging

In this appendix we study the convergence of the voltages of the auxiliary capacitors that implement the stepwise charging technique. We will see that the voltage in these capacitors converges to the value expressed in Eq. (2.3). We address the case of two auxiliary capacitors, but it can be easily generalized to any number of auxiliary capacitors.

## A.1 Charge transfer between capacitors

First, we analyze the charge transfer between two capacitors $C_{1}$ and $C_{2}$ with initial charge $Q_{1_{i}}$ and $Q_{2_{i}}$ (see Fig. A.1).


Figure A.1: Charge transfer between capacitors.

We will denote by $k$ the ratio between the two capacitors $\left(\frac{C_{1}}{C_{2}}=k\right)$. $V_{1_{i}}$ and $V_{2_{i}}$ are the initial voltages of each capacitor and they can be calculated as

$$
\begin{equation*}
V_{1_{i}}=\frac{Q_{1_{i}}}{C_{1}} \quad \text{and} \quad V_{2_{i}}=\frac{Q_{2_{i}}}{C_{2}} . \tag{A.1}
\end{equation*}
$$

The charge transfer will finish when both capacitances achieve the same voltage and thus, the final charge in each capacitor will be related through

$$
\begin{equation*}
V_{1_{f}}=V_{2_{f}} \Longrightarrow \frac{Q_{1_{f}}}{C_{1}}=\frac{Q_{2_{f}}}{C_{2}} \Longrightarrow Q_{1_{f}}=k \cdot Q_{2_{f}} . \tag{A.2}
\end{equation*}
$$

Additionally, using the charge conservation principle we can obtain the relationship between the initial voltages and the final voltage in each capacitor

$$
\begin{equation*}
V_{1_{f}}=V_{2_{f}}=\frac{V_{1_{i}} \cdot k+V_{2_{i}}}{k+1} . \tag{A.3}
\end{equation*}
$$
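As a quick numerical check of Eqs. (A.2) and (A.3), the following minimal sketch computes the final state both from charge conservation and from the closed-form expression. The function names and the capacitance and charge values are hypothetical and serve only as an illustration.

```python
def share_charge(v1_i, v2_i, k):
    """Final common voltage when C1 = k * C2 share charge, as in Eq. (A.3)."""
    return (k * v1_i + v2_i) / (k + 1)


def final_state(c1, c2, q1_i, q2_i):
    """Final voltage and charges obtained directly from charge conservation."""
    v_f = (q1_i + q2_i) / (c1 + c2)
    return v_f, c1 * v_f, c2 * v_f


# Illustrative values: C1 = 3 pF charged to 1 V, C2 = 1 pF discharged.
c1, c2 = 3e-12, 1e-12
q1_i, q2_i = 3e-12, 0.0
v_f, q1_f, q2_f = final_state(c1, c2, q1_i, q2_i)
print(v_f, share_charge(q1_i / c1, q2_i / c2, c1 / c2))
```

Both routes give the same final voltage, and the final charges satisfy $Q_{1_{f}} = k \cdot Q_{2_{f}}$.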

## A.2 Convergence in two auxiliary capacitors

In this section we consider the case of stepwise charging with two auxiliary capacitors, see Fig. A.2. Each time SwB or SwC is closed and the rest are open, a CRC circuit like the one analyzed in the previous section is formed. Additionally, we consider each auxiliary capacitor to be $k$ times bigger than the main capacitance to be charged and discharged.


Figure A.2: Convergence between 2 capacitors.

We will call a cycle the sequence in which the capacitance $C$ is fully discharged and charged again. From Fig. A.2, we see that a cycle corresponds to closing the switches sequentially in the following order: SwA (while the rest are open), SwB, SwC, SwD, SwC and SwB.

Now, we will see how the voltage in each capacitor changes after one cycle considering they have an initial voltage $V_{1_{i}}$ and $V_{2_{i}}$. The switches are designed so that the time while they are closed is enough so that the initial and final voltages in the capacitors are related through Eq. (A.3).

SwA

$$
V_{1}=V_{1_{i}}, \quad V_{2}=V_{2_{i}}, \quad V_{C}=V_{d d}
$$

SwB

$$
V_{1}=V_{1_{i}}, \quad V_{2}=\frac{k \cdot V_{2_{i}}+V_{d d}}{k+1}, \quad V_{C}=V_{2}
$$

SwC

$$
V_{1}=\frac{k \cdot V_{1_{i}}}{k+1}+\frac{k \cdot V_{2_{i}}+V_{d d}}{(k+1)^{2}}, \quad V_{2}=\frac{k \cdot V_{2_{i}}+V_{d d}}{k+1}, \quad V_{C}=V_{1}
$$

SwD

$$
V_{1}=\frac{k \cdot V_{1_{i}}}{k+1}+\frac{k \cdot V_{2_{i}}+V_{d d}}{(k+1)^{2}}, \quad V_{2}=\frac{k \cdot V_{2_{i}}+V_{d d}}{k+1}, \quad V_{C}=0
$$

SwC

$$
V_{1}=\frac{k^{2} \cdot V_{1_{i}}}{(k+1)^{2}}+\frac{k \cdot\left(k \cdot V_{2_{i}}+V_{d d}\right)}{(k+1)^{3}}, \quad V_{2}=\frac{k \cdot V_{2_{i}}+V_{d d}}{k+1}, \quad V_{C}=V_{1}
$$

The second time that SwC is closed, the final voltage $V_{1}$ corresponds to the initial voltage of the next cycle so it can be expressed as

$$
V_{1_{i+1}}=V_{1}=\frac{k^{2} \cdot V_{1_{i}}}{(k+1)^{2}}+\frac{k \cdot\left(k \cdot V_{2_{i}}+V_{d d}\right)}{(k+1)^{3}} .
$$

SwB

$$
V_{1}=\frac{k^{2} \cdot V_{1_{i}}}{(k+1)^{2}}+\frac{k \cdot\left(k \cdot V_{2_{i}}+V_{d d}\right)}{(k+1)^{3}}, \quad V_{2_{i+1}}=V_{2}=\frac{k\left(k \cdot V_{2_{i}}+V_{d d}\right)}{(k+1)^{2}}+\frac{V_{1_{i+1}}}{k+1}, \quad V_{C}=V_{2}
$$

In conclusion, at the end of cycle $i$, the voltage in each auxiliary capacitor will be

$$
\begin{gather*}
V_{1_{i+1}}=\frac{k^{2} \cdot V_{1_{i}}}{(k+1)^{2}}+\frac{k \cdot\left(k \cdot V_{2_{i}}+V_{d d}\right)}{(k+1)^{3}},  \tag{A.4}\\
V_{2_{i+1}}=\frac{k\left(k \cdot V_{2_{i}}+V_{d d}\right)}{(k+1)^{2}}+\frac{V_{1_{i+1}}}{k+1} . \tag{A.5}
\end{gather*}
$$

We want to find the convergence values of these voltages. Using Eqs. (A.4) and (A.5) and the fact that, when they converge, $V_{1_{i}}=V_{1_{i+1}}$ and $V_{2_{i}}=V_{2_{i+1}}$, we need to find the solution to the following system of equations

$$
\begin{gathered}
V_{1}=\frac{k^{2} \cdot V_{1}}{(k+1)^{2}}+\frac{k \cdot\left(k \cdot V_{2}+V_{d d}\right)}{(k+1)^{3}}, \\
V_{2}=\frac{k\left(k \cdot V_{2}+V_{d d}\right)}{(k+1)^{2}}+\frac{V_{1}}{k+1} .
\end{gathered}
$$
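As a sketch of the intermediate algebra (added here for clarity, not part of the original derivation), multiplying the first equation by $(k+1)^{3}$ and the second by $(k+1)^{2}$, and using $(k+1)^{2}-k^{2}=2k+1$, gives

$$
\begin{gathered}
(2 k+1)(k+1) \cdot V_{1}=k \cdot\left(k \cdot V_{2}+V_{d d}\right), \\
(2 k+1) \cdot V_{2}=k \cdot V_{d d}+(k+1) \cdot V_{1} .
\end{gathered}
$$

Substituting $V_{2}$ from the second relation into the first and using $(2k+1)^{2}-k^{2}=(3k+1)(k+1)$ leaves $(3k+1) \cdot V_{1}=k \cdot V_{d d}$, and $V_{2}$ then follows from the second relation.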

The solution is

$$
\begin{equation*}
V_{1}=\frac{k \cdot V_{d d}}{3 \cdot k+1}, \quad V_{2}=\frac{2 \cdot k \cdot V_{d d}}{3 \cdot k+1} . \tag{A.6}
\end{equation*}
$$

Finally, if $k \gg 1$, the convergence of the voltages in each capacitor $V_{1}$ and $V_{2}$ would be close to $V_{d d} / 3$ and $2 V_{d d} / 3$ respectively. This can be generalized to Eq. (2.3) for more than two auxiliary capacitors.
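The convergence can also be checked numerically by stepping through the switch sequence with the charge sharing relation of Eq. (A.3). The sketch below assumes ideal switches, complete settling in every phase and both auxiliary capacitors equal to $k$ times $C$; the function names and the chosen values of $k$ are illustrative only.

```python
def share(v_aux, v_c, k):
    """Common voltage after an auxiliary capacitor (k * C) shares charge with C."""
    return (k * v_aux + v_c) / (k + 1)


def run_cycle(v1, v2, k, vdd):
    """One cycle SwA, SwB, SwC, SwD, SwC, SwB; returns the new (v1, v2)."""
    v_c = vdd                      # SwA: C is charged to Vdd
    v2 = v_c = share(v2, v_c, k)   # SwB: C shares charge with auxiliary capacitor 2
    v1 = v_c = share(v1, v_c, k)   # SwC: C shares charge with auxiliary capacitor 1
    v_c = 0.0                      # SwD: C is fully discharged
    v1 = v_c = share(v1, v_c, k)   # SwC: C is partially recharged from capacitor 1
    v2 = v_c = share(v2, v_c, k)   # SwB: C is partially recharged from capacitor 2
    return v1, v2


def converge(k, vdd=1.0, cycles=5000):
    """Iterate the cycle from discharged auxiliary capacitors."""
    v1 = v2 = 0.0
    for _ in range(cycles):
        v1, v2 = run_cycle(v1, v2, k, vdd)
    return v1, v2


for k in (10.0, 100.0):
    v1, v2 = converge(k)
    print(f"k = {k:.0f}: V1 = {v1:.4f} (A.6: {k / (3 * k + 1):.4f}), "
          f"V2 = {v2:.4f} (A.6: {2 * k / (3 * k + 1):.4f})")
```

The iterated voltages settle at the values of Eq. (A.6) and approach $V_{d d} / 3$ and $2 V_{d d} / 3$ as $k$ grows.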

## Bibliography

[1] F. Veirano, L. Naviner, and F. Silveira, "Optimum nMOS/pMOS Imbalance for Energy Efficient Digital Circuits," IEEE Transactions on Circuits and Systems: TCAS-I Regular papers, vol. 64, no. 12, pp. 3081-3091, Dec 2017.
[2] ——, "Optimal asymmetrical back plane biasing for energy efficient digital circuits in 28nm UTBB FD-SOI," Integration, the VLSI Journal, 2017.
[3] ——, "Pushing Minimum Energy Limits by Optimal Asymmetrical Back Plane Biasing in 28nm UTBB FD-SOI," in International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS), Sept 2016, pp. 243-249.
[4] F. Veirano, F. Silveira, and L. Naviner, "Asymmetrical Length Biasing for Energy Efficient Digital Circuits," in IEEE Latin American Symposium on Circuits and Systems (LASCAS), Feb 2017.
[5] ——, "Is intrinsic noise a limiting factor for subthreshold digital logic in nanoscale CMOS?" in 2015 International Workshop on CMOS Variability (VARI), Sept 2015, pp. 45-50.
[6] ——, "Minimum Operating Voltage Due to Intrinsic Noise in Subthreshold Digital Logic in Nanoscale CMOS," Journal of Low Power Electronics, vol. 12, no. 1, pp. 74-81, 2016.
[7] F. Veirano, P. Pérez-Nicoli, P. Castro-Lisboa, and F. Silveira, "Gate drive losses reduction in switched-capacitor DC-DC converters," in 2018 IEEE 9th Latin American Symposium on Circuits & Systems (LASCAS), Feb 2018, pp. 1-4.
[8] F. Veirano, P. C. Lisboa, P. Pérez-Nicoli, L. Naviner, and F. Silveira, "A Charge Recycling Technique for Efficiency Improvement in Switched Capacitor dc-dc Converters," Microelectronic Journal (Under review), vol. XX, no. X, XXXX.

[9] P. C. Lisboa, P. Pérez-Nicoli, F. Veirano, and F. Silveira, "General Top/Bottom-Plate Charge Recycling Technique for Integrated Switched Capacitor DC-DC Converters," IEEE Transactions on Circuits and Systems: TCAS-I Regular papers, vol. 63, no. 4, pp. 470-481, April 2016.
[10] F. Veirano, P. Perez, S. Besio, P. Castro, and F. Silveira, "Ultra low power pulse generator based on a ring oscillator with direct path current avoidance," in IEEE Latin American Symposium on Circuits and Systems (LASCAS), 2013, pp. 1-4.
[11] P. Pérez-Nicoli, F. Veirano, P. C. Lisboa, and F. Silveira, "High Slew-Rate OTA with Low Quiescent Current Based on Non-Linear Current Mirror," in IEEE Latin American Symposium on Circuits and Systems (LASCAS), Feb 2015, pp. 1-4.
[12] P. Pérez-Nicoli, P. C. Lisboa, F. Veirano, and F. Silveira, "A Series-Parallel Switched Capacitor Step-up DC-DC Converter and its Gate-Control Circuits for Over the Supply Rail Switches," Analog Integrated Circuits and Signal Processing, vol. 85, no. 1, pp. 37-45, 2015.
[13] P. Pérez-Nicoli, F. Veirano, P. C. Lisboa, and F. Silveira, "Low-power operational transconductance amplifier with slew-rate enhancement based on non-linear current mirror," Analog Integrated Circuits and Signal Processing, vol. 89, no. 3, pp. 521-529, 2016.
[14] P. Pérez-Nicoli, F. Veirano, and F. Silveira, "Comparator with self controlled delay for active rectifiers in inductive powering," in IEEE Wireless Power Transfer Conference (WPTC). IEEE, 2018, pp. 1-3.
[15] P. Pérez-Nicoli, F. Veirano, C. Rossi-Aicardi, and P. Aguirre, "Design method for an ultra low power, low offset, symmetric OTA," in 2013 7th Argentine School of Micro-Nanoelectronics, Technology and Applications, Aug 2013, pp. 38-43.
[16] J. P. Oliver, F. Veirano, D. Bouvier, and E. Boemo, "A Low Cost System for Self Measurements of Power Consumption in Field Programmable Gate Arrays," Journal of Low Power Electronics, vol. 13, no. 1, pp. 1-9, 2017.
[17] G. E. Moore, "Cramming more components onto integrated circuits," Proceedings of the IEEE, vol. 86, no. 1, pp. 82-85, Jan 1998.
[18] G. E. Tellez, A. Farrahi, and M. Sarrafzadeh, "Activity-driven clock design for low power circuits," in Proceedings of IEEE International Conference on Computer Aided Design (ICCAD), Nov 1995, pp. 62-65.
[19] S. Mutoh, T. Douseki, Y. Matsuya, T. Aoki, S. Shigematsu, and J. Yamada, "1-V power supply high-speed digital circuit technology with multithreshold-voltage CMOS," IEEE Journal of Solid-State Circuits, vol. 30, no. 8, pp. 847-854, Aug 1995.
[20] M. Alioto, E. Consoli, and J. M. Rabaey, "'EChO' Reconfigurable Power Management Unit for Energy Reduction in Sleep-Active Transitions," IEEE Journal of Solid-State Circuits, vol. 48, no. 8, pp. 1921-1932, Aug 2013.
[21] M. Seok, S. Hanson, D. Blaauw, and D. Sylvester, "Sleep Mode Analysis and Optimization With Minimal-Sized Power Gating Switch for Ultra-Low $V_{dd}$ Operation," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 20, no. 4, pp. 605-615, April 2012.
[22] A. Wang, A. Chandrakasan, and S. Kosonocky, "Optimal supply and threshold scaling for subthreshold CMOS circuits," Proceedings IEEE Computer Society Annual Symposium on VLSI. New Paradigms for VLSI Systems Design. ISVLSI 2002, pp. 1-5, 2002.
[23] G. Villar-Piqué, H. J. Bergveld, and E. Alarcón, "Survey and Benchmark of Fully Integrated Switching Power Converters: Switched-Capacitor Versus Inductive Approach," IEEE Transactions on Power Electronics, vol. 28, no. 9, pp. 4156-4167, Sep. 2013.
[24] H. P. Le, S. R. Sanders, and E. Alon, "Design Techniques for Fully Integrated Switched-Capacitor DC-DC Converters," IEEE Journal of Solid-State Circuits, vol. 46, no. 9, pp. 2120-2131, Sept 2011.
[25] A. Wang and A. Chandrakasan, "A 180-mV subthreshold FFT processor using a minimum energy design methodology," IEEE Journal of Solid-State Circuits, vol. 40, no. 1, pp. 310-319, 2005.
[26] B. Zhai, L. Nazhandali, J. Olson, A. Reeves, M. Minuth, R. Helfand, S. Pant, D. Blaauw, and T. Austin, "A 2.60pJ/inst subthreshold sensor processor for optimal energy efficiency," in 2006 Symposium on VLSI Circuits. Digest of Technical Papers. Honolulu, HI: IEEE, 2006, pp. 154-155.
[27] B. H. Calhoun and A. P. Chandrakasan, "A 256-kb 65-nm sub-threshold SRAM design for ultra-low-voltage operation," IEEE Journal of Solid-State Circuits, vol. 42, no. 3, pp. 680-688, 2007.
[28] S. Hanson, M. Seok, Y.-S. Lin, Z. Foo, D. Kim, Y. Lee, N. Liu, D. Sylvester, and D. Blaauw, "A low-voltage processor for sensing applications with picowatt standby mode," IEEE Journal of Solid-State Circuits, vol. 44, no. 4, pp. 1145-1155, 2009.

[29] D. Jeon, M. Seok, C. Chakrabarti, D. Blaauw, and D. Sylvester, "A superpipelined energy efficient subthreshold 240 MS/s FFT core in 65 nm CMOS," IEEE Journal of Solid-State Circuits, vol. 47, no. 1, pp. 23-34, 2012.
[30] B. Zhai, S. Hanson, D. Blaauw, and D. Sylvester, "Analysis and mitigation of variability in subthreshold design," in IEEE International Symposium on Low Power Electronics and Design (ISLPED). IEEE, 2005, pp. 20-25.
[31] J. Kwong and A. Chandrakasan, "Variation-Driven Device Sizing for Minimum Energy Sub-threshold Circuits," in Proceedings of the 2006 International Symposium on Low Power Electronics and Design, Oct 2006, pp. 8-13.
[32] M. Slimani, F. Silveira, and P. Matherat, "Variability modeling in near-threshold CMOS digital circuits," Microelectronics Journal, vol. 46, no. 12, pp. 1313-1324, 2015.
[33] D. Bol, R. Ambroise, D. Flandre, and J. Legat, "Interests and limitations of technology scaling for subthreshold logic," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 17, no. 10, pp. 1508-1519, 2009.
[34] B. Calhoun, S. Khanna, R. Mann, and J. Wang, "Sub-threshold circuit design with shrinking CMOS devices," in IEEE International Symposium on Circuits and Systems (ISCAS). IEEE, 2009, pp. 2541-2544.
[35] B. Zimmer, Y. Lee, A. Puggelli, J. Kwak, R. Jevtić, B. Keller, S. Bailey, M. Blagojević, P. F. Chiu, H. P. Le, P. H. Chen, N. Sutardja, R. Avizienis, A. Waterman, B. Richards, P. Flatresse, E. Alon, K. Asanović, and B. Nikolić, "A RISC-V Vector Processor With Simultaneous-Switching Switched-Capacitor DC-DC Converters in 28 nm FDSOI," IEEE Journal of Solid-State Circuits, vol. 51, no. 4, pp. 930-942, April 2016.
[36] D. Jacquet, F. Hasbani, P. Flatresse, R. Wilson, F. Arnaud, G. Cesana, T. D. Gilio, C. Lecocq, T. Roy, A. Chhabra, C. Grover, O. Minez, J. Uginet, G. Durieu, C. Adobati, D. Casalotto, F. Nyer, P. Menut, A. Cathelin, I. Vongsavady, and P. Magarshack, "A 3 GHz Dual Core Processor ARM Cortex TM -A9 in 28 nm UTBB FD-SOI CMOS With Ultra-Wide Voltage Range and Energy Efficiency Optimization," IEEE Journal of Solid-State Circuits, vol. 49, no. 4, pp. 812-826, April 2014.
[37] P. Flatresse, "Process and design solutions for exploiting FD-SOI technology towards energy efficient SOCs," in 2014 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED). IEEE, 2014, pp. 127-130.
[38] R. Taco, I. Levi, M. Lanuzza, and A. Fish, "Low voltage logic circuits exploiting gate level dynamic body biasing in 28nm UTBB FD-SOI," Solid-State Electronics, Dec 2015.
[39] S. A. Vitale, P. W. Wyatt, N. Checka, J. Kedzierski, and C. L. Keast, "FDSOI Process Technology for Subthreshold-Operation Ultralow-Power Electronics," Proceedings of the IEEE, vol. 98, no. 2, pp. 333-342, 2010.
[40] P. Magarshack, P. Flatresse, and G. Cesana, "UTBB FD-SOI: A Process/Design Symbiosis for Breakthrough Energy-efficiency," in Design, Automation & Test in Europe Conference & Exhibition (DATE), 2013. Grenoble, France: IEEE, 2013, pp. 952-957.
[41] J. Mazurier, O. Weber, F. Andrieu, C. Le Royer, O. Faynot, and M. Vinet, "Variability of planar Ultra-Thin Body and Buried oxide (UTBB) FDSOI MOSFETs," in 2014 IEEE International Conference on IC Design & Technology. Austin, TX: IEEE, 2014, pp. 1-4.
[42] J.-P. J. Noel, O. Thomas, M.-A. Jaud, O. Weber, T. Poiroux, C. Fenouillet-Beranger, P. Rivallin, P. Scheiblin, F. Andrieu, M. Vinet, O. Rozeau, F. Boeuf, O. Faynot, and A. Amara, "Multi-UTBB FDSOI Device Architectures for Low-Power CMOS Circuit," IEEE Transactions on Electron Devices, vol. 58, no. 8, pp. 2473-2482, 2011.
[43] M. D. Seeman and S. R. Sanders, "Analysis and Optimization of Switched-Capacitor DC-DC Converters," IEEE Transactions on Power Electronics, vol. 23, no. 2, pp. 841-851, March 2008.
[44] J. De Vos, D. Flandre, and D. Bol, "A Sizing Methodology for On-Chip Switched-Capacitor DC/DC Converters," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 61, no. 5, pp. 1597-1606, 2014.
[45] J. Kwong, Y. K. Ramadass, N. Verma, and A. P. Chandrakasan, "A 65 nm Sub-$V_{t}$ Microcontroller With Integrated SRAM and Switched Capacitor DC-DC Converter," IEEE Journal of Solid-State Circuits, vol. 44, no. 1, pp. 115-126, Jan 2009.
[46] C. L. Seitz, A. H. Frey, S. Mattisson, S. D. Rabin, D. A. Speck, and J. L. A. de Snepscheut, "Hot clock nMOS," in Proceedings of Chapel Hill Conference on VLSI, 1985, pp. 1-17.
[47] J. G. Koller and W. C. Athas, "Adiabatic switching, low energy computing, and the physics of storing and erasing information," in Proceedings of Physics of Computation Workshop, 1992.
[48] W. C. Athas, J. G. Koller, and L. Svensson, "An energy-efficient CMOS line driver using adiabatic switching," in Fourth Great Lakes Symposium on Design Automation of High Performance VLSI Systems. IEEE, 1994, pp. 196-199.

[49] W. C. Athas, L. J. Svensson, J. G. Koller, N. Tzartzanis, and E. Ying-Chin Chou, "Low-power digital systems based on adiabatic-switching principles," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 2, no. 4, pp. 398-407, 1994.
[50] S. G. Younis, "Asymptotically zero energy computing using split-level charge recovery logic," Ph.D. dissertation, Massachusetts Institute of Technology, 1994.
[51] L. Svensson and J. Koller, "Adiabatic charging without inductors," in Proc. of the Int. Workshop on Low-Power Design. Citeseer, 1994, pp. 59-164.
[52] L. Svensson, W. Athas, and J. Koller, "System and method for power-efficient charging and discharging of a capacitive load from a single source," Jan. 25 2011, US Patent RE42,066.
[53] S. Nakata, H. Makino, J. Hosokawa, T. Yoshimura, S. Iwade, and Y. Matsuda, "Energy Efficient Stepwise Charging of a Capacitor Using a DC-DC Converter With Consecutive Changes of its Duty Ratio," IEEE Transactions on Circuits and Systems: TCAS-I Regular papers, vol. 61, no. 7, pp. 2194-2203, July 2014.
[54] D. Chernichenko, A. Kushnerov, and S. Ben-Yaakov, "Adiabatic charging of capacitors by Switched Capacitor Converters with multiple target voltages," in IEEE 27th Convention of Electrical & Electronics Engineers in Israel (IEEEI), 2012, pp. 1-4.
[55] L. Svensson, W. Athas, and R.-C. Wen, "A sub-$CV^{2}$ pad driver with 10 ns transition time," in Proc. IEEE ISLPED. IEEE, 1996, pp. 105-108.
[56] H. Hirano and T. Sumi, "Semiconductor device with means for charge recycling," Dec. 2 1997, US Patent 5,694,445.
[57] W. C. Athas, R. K. Lal, and L. G. Svensson, "Power-efficient, pulsed driving of capacitive loads to controllable voltage levels," Feb. 16 2010, US Patent 7,663,618.
[58] B.-D. Choi and O.-K. Kwon, "Stepwise data driving method and circuits for low-power TFT-LCDs," IEEE Transactions on Consumer Electronics, vol. 46, no. 4, pp. 1155-1160, 2000.
[59] S. Nakata and Y. Kado, "Adiabatic charging register circuit," Apr. 18 2006, US Patent 7,030,672.
[60] A. Bhattacharya and J. Melanson, "Stepped voltage drive for driving capacitive loads," Nov. 5 2013, US Patent 8,575,975.
[61] S. Khanna, K. Craig, Y. Shakhsheer, S. Arrabi, J. Lach, and B. H. Calhoun, "Stepped supply voltage switching for energy constrained systems," in Proc. IEEE ISQED. IEEE, 2011, pp. 1-6.
[62] K. Craig, Y. Shakhsheer, S. Arrabi, S. Khanna, J. Lach, and B. Calhoun, "A 32 b 90 nm Processor Implementing Panoptic DVS Achieving Energy Efficient Operation From Sub-Threshold to High Performance," IEEE Journal of Solid-State Circuits, vol. 49, no. 2, pp. 545-552, Feb 2014.
[63] V. De, P. Hazucha, T. Karnik, S. T. Moon, and G. Schrom, "Stepwise drivers for DC/DC converters," Jul. 1 2008, US Patent 7,394,298.
[64] B.-D. Yang and L.-S. Kim, "A low-power ROM using charge recycling and charge sharing techniques," IEEE Journal of Solid-State Circuits, vol. 38, no. 4, pp. 641-653, 2003.
[65] S. Huda, J. Anderson, and H. Tamura, "Charge recycling for power reduction in FPGA interconnect," in 23rd International Conference on Field Programmable Logic and Applications (FPL). IEEE, 2013, pp. 1-8.
[66] K. Ueda, F. Morishita, S. Okura, L. Okamura, T. Yoshihara, and K. Arimoto, "Low-Power On-Chip Charge-Recycling DC-DC Conversion Circuit and System," IEEE Journal of Solid-State Circuits, vol. 48, no. 11, pp. 2608-2617, Nov 2013.
[67] E. Pakbaznia, F. Fallah, and M. Pedram, "Charge recycling in power-gated CMOS circuits," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 27, no. 10, pp. 1798-1811, 2008.
[68] H. Fujita, "A Resonant Gate-Drive Circuit Capable of High-Frequency and High-Efficiency Operation," IEEE Transactions on Power Electronics, vol. 25, no. 4, pp. 962-969, 2010.
[69] M. Alimadadi, S. Sheikhaei, G. Lemieux, P. Palmer, S. Mirabbasi, and W. Dunford, "A 660 MHz ZVS DC-DC Converter Using Gate-Driver Charge-Recycling in 0.18 μm CMOS with an Integrated Output Filter," in Proc. IEEE Power Electron. Spec. Conf. IEEE, 2008, pp. 140-146.
[70] C. Jia, H. Chen, W. Hao, C. Zhang, and Z. Wang, "A Charge Recycling Method for Step-Down SC Converter in Energy Harvesting Systems," in Proc. IEEE ICCCAS. IEEE, 2009, pp. 720-723.
[71] S. Nakata, R. Honda, H. Makino, S. Mutoh, M. Miyama, and Y. Matsuda, "General Stability of Stepwise Waveform of an Adiabatic Charge Recycling Circuit With Any Circuit Topology," IEEE Transactions on Circuits and Systems: TCAS-I Regular papers, vol. 59, no. 10, pp. 2301-2314, 2012.

[72] Y. Takahashi, T. Sekine, and M. Yokoyama, "Theoretical analysis of power clock generator based on the switched capacitor regulator for adiabatic CMOS logic," in Argentine School of Micro-Nanoelectronics, Technology and Applications, Sept 2008, pp. 17-22.
[73] N. Retdian, S. Takagi, and N. Fujii, "Voltage controlled ring oscillator with wide tuning range and fast voltage swing," in IEEE Asia-Pacific Conference on ASICs, 2002, pp. 201-204.
[74] K.-U. Stein, "Noise-induced error rate as limiting factor for energy per operation in digital ICs," IEEE Journal of Solid-State Circuits, vol. 12, no. 5, pp. 527-530, Oct. 1977.
[75] K. Natori and N. Sano, "Scaling limit of digital circuits due to thermal noise," Journal of Applied Physics, vol. 83, no. 10, pp. 5019-5024, 1998.
[76] L. B. Kish, "End of Moore's law: thermal (noise) death of integration in micro and nano electronics," Physics Letters A, vol. 305, no. 3, pp. 144-149, 2002.
[77] V. Kleeberger and U. Schlichtmann, "Reliability analysis of digital circuits considering intrinsic noise," in 2011 3rd Asia Symposium on Quality Electronic Design (ASQED). IEEE, 2011, pp. 167-173.
[78] R. Swanson and J. Meindl, "Fundamental performance limits of MOS integrated circuits," in IEEE International Solid-State Circuits Conference. Digest of Technical Papers., vol. 18. IEEE, 1975, pp. 110-111.
[79] T. H. Morshed, W. Yang, M. Dunga, X. Xi, J. He, W. Liu, M. Kanyu, X. Jin, J. Ou, M. Chan, and A. Niknejad, "BSIM4.6.4 MOSFET Model: User's Manual," University of California: Berkeley, CA, USA, pp. 10-19, 2009.
[80] W. Zhao and Y. Cao, "New generation of predictive technology model for sub-45 nm early design exploration," IEEE Transactions on Electron Devices, vol. 53, no. 11, pp. 2816-2823, 2006.
[81] J. Lohstroh, E. Seevinck, and J. De Groot, "Worst-case static noise margin criteria for logic circuits and their mathematical equivalence," IEEE Journal of Solid-State Circuits, vol. 18, no. 6, pp. 803-807, 1983.
[82] E. Seevinck, F. List, and J. Lohstroh, "Static-noise margin analysis of MOS SRAM cells," IEEE Journal of Solid-State Circuits, vol. 22, no. 5, pp. 748-754, 1987.
[83] D. Bol, R. Ambroise, D. Flandre, and J.-D. Legat, "Impact of Technology Scaling on Digital Subthreshold Circuits," 2008 IEEE Computer Society Annual Symposium on VLSI, pp. 179-184, 2008.
[84] J. Kwong, Y. Ramadass, N. Verma, M. Koesler, K. Huber, H. Moormann, and A. Chandrakasan, "A 65 nm Sub-$V_{t}$ Microcontroller with Integrated SRAM and Switched-Capacitor DC-DC Converter," in 2008 IEEE International Solid-State Circuits Conference - Digest of Technical Papers, Feb 2008, pp. 318-616.
[85] H. Reyserhove, N. Reynders, and W. Dehaene, "Ultra-low voltage datapath blocks in 28 nm UTBB FD-SOI," in 2014 IEEE Asian Solid-State Circuits Conference (A-SSCC), Nov 2014, pp. 49-52.
[86] B. H. Calhoun, A. Wang, and A. Chandrakasan, "Modeling and sizing for minimum energy operation in subthreshold circuits," IEEE Journal of Solid-State Circuits, vol. 40, no. 9, pp. 1778-1785, 2005.
[87] D. Bol, D. Flandre, and J.-D. Legat, "Technology flavor selection and adaptive techniques for timing-constrained 45 nm subthreshold circuits," in Proceedings of the 14th ACM/IEEE international symposium on Low power electronics and design (ISLPED). San Francisco, CA, USA: IEEE, 2009, p. 21.
[88] S. Hanson, B. Zhai, M. Seok, B. Cline, K. Zhou, M. Singhal, M. Minuth, J. Olson, L. Nazhandali, T. Austin, D. Sylvester, and D. Blaauw, "Exploring variability and performance in a sub-200-mV processor," IEEE Journal of Solid-State Circuits, vol. 43, no. 4, pp. 881-890, 2008.
[89] M. Alioto, "Impact of NMOS/PMOS imbalance in ultra-low voltage CMOS standard cells," in 20th European Conference on Circuit Theory and Design (ECCTD), 2011, pp. 536-539.
[90] G. de Streel and D. Bol, "Impact of back gate biasing schemes on energy and robustness of ULV logic in 28 nm UTBB FDSOI technology," in 2013 IEEE International Symposium on Low Power Electronics and Design (ISLPED). Beijing: IEEE, 2013, pp. 255-260.
[91] Y. Okamura, T. Ishihara, and H. Onodera, "Independent n-well and p-well biasing for minimum leakage energy operation," in 2018 IEEE 24th International Symposium on On-Line Testing And Robust System Design (IOLTS), July 2018, pp. 177-182.
[92] D. S. Truesdell and B. H. Calhoun, "Channel length sizing for power minimization in leakage-dominated digital circuits," in 2018 IEEE SOI-3D-Subthreshold Microelectronics Technology Unified Conference (S3S), Oct 2018, pp. 1-2.
[93] A. A. Vatanjou, E. Låte, T. Ytterdal, and S. Aunet, "Ultra-low voltage adders in 28 nm FDSOI exploring poly-biasing for device sizing," in 2016 IEEE Nordic Circuits and Systems Conference (NORCAS), Nov 2016, pp. 1-4.

[94] A. A. Vatanjou, E. Låte, T. Ytterdal, and S. Aunet, "Ultra-low voltage and energy efficient adders in 28 nm FDSOI exploring poly-biasing for device sizing," Microprocessors and Microsystems, vol. 56, pp. 92-100, 2018.
[95] A. A. Vatanjou, T. Ytterdal, and S. Aunet, "28 nm UTBB-FDSOI energy efficient and variation tolerant custom digital-cell library with application to a subthreshold MAC block," in 2016 MIXDES - 23rd International Conference Mixed Design of Integrated Circuits and Systems, June 2016, pp. 105-110.
[96] B. Pelloux-Prayer, S. Haendler, A. Valentian, and P. Flatresse, "Performance analysis of multi-VT design solutions in 28nm UTBB FD-SOI technology," in IEEE SOI-3D-Subthreshold Microelectronics Technology Unified Conference (S3S). Monterey, CA: IEEE, 2013, pp. 1-2.
[97] P. Flatresse, B. Giraud, J.-P. Noel, B. Pelloux-Prayer, F. Giner, D.-K. Arora, F. Arnaud, N. Planes, J. L. Coz, O. Thomas, S. Engels, G. Cesana, R. Wilson, and P. Urard, "Ultra-Wide Body-Bias Range LDPC Decoder in 28nm UTBB FDSOI Technology," in 2013 IEEE International Solid-State Circuits Conference Digest of Technical Papers. San Francisco, CA: IEEE, 2013, pp. 424-426.

## Glossary

$E_{\text{Act}}$ Energy consumed in active mode of duty-cycled ULP systems.
$P_{\text{Act}}$ Average power in active mode of duty-cycled ULP systems.
$P_{Av}$ Total average power of duty-cycled ULP systems.
$P_{\text{Losses}}$ Total power losses in the dc-dc converter.
$P_{L}$ Power delivered to the load by the dc-dc converter.
$P_{\text{Sleep}}$ Average power in sleep mode of duty-cycled ULP systems.
$T_{Act}$ Time in active mode of duty-cycled ULP systems.
$T_{\text{Task}}$ Time available to perform the specific task in duty-cycled ULP systems.
$U_{T}$ Thermal voltage.
$VBP_{n}$ Back plane voltage of the NMOS.
$VBP_{p}$ Back plane voltage of the PMOS.
$V_{N}$ Noise RMS voltage.
$V_{OH}$ Output logic level high.
$V_{OL}$ Output logic level low.
$V_{T}$ Transistor threshold voltage.
$V_{dd}$ Supply voltage.
$\eta_{dc-dc}$ Dc-dc converter efficiency.
gnd Ground.
$k$ Boltzmann's constant.
$n$ Subthreshold slope factor.

## Acronyms

AIMDs Active Implantable Medical Devices.
ALB Asymmetric Length Biasing.
BPB Back Plane Biasing.
CCO Current Controlled Oscillator.
CS Charge Sharing.
DIBL Drain-Induced Barrier Lowering.
DVFS Dynamic Voltage and Frequency Scaling.
FD-SOI Fully Depleted Silicon on Insulator.
IC Integrated Circuit.
ICs Integrated Circuits.
IoT Internet of Things.
MAC Multiply-Accumulate Block.
MEP Minimum Energy Point.
OBB Optimum Back Plane Biasing.
OTA Operational Transconductance Amplifier.
PM Power Management.
PMU Power Management Unit.
PSD Power Spectral Density.
RCA Ripple Carry Adder.
RFID Radio Frequency Identification.
RMS Root-Mean-Square.
SBB Symmetric Back Plane Biasing.
SC Switched Capacitor.
SC Switched Capacitors.
SoC System on Chip.

ULP Ultra Low Power.
UTBB Ultra-Thin Body and Buried Oxide.
WPTL Wireless Power Transfer Link.
WSNs Wireless Sensor Networks.

## List of Tables

2.1 dc-dc Converter Sizing
2.2 Stepwise Driver Sizing
2.3 SC dc-dc converter design
3.1 Transistors size
3.2 Minimum operating $V_{dd}$ due to intrinsic noise in the best-case noise margin scenario.
3.3 Minimum operating $V_{dd}$ due to intrinsic noise in the worst-case noise margin scenario.
4.1 Monte Carlo Simulation Results
4.2 ALB in bulk process. PTM Simulation Results

## List of Figures

1.1 Duty Cycled ULP Systems
1.2 Simplified block diagram of ULP systems
1.3 Chain of 25 inverters. Activity factor 1. 28 nm FD-SOI technology.
2.1 Series-Parallel step down converter.
2.2 Architecture of stepwise charging.
2.3 Charge and discharge of the main capacitance using stepwise charging. $N=3$
2.4 Energy consumption vs number of auxiliary capacitors ($N$). Comparison estimated (lines) vs simulated (dots). $C=178\,\mathrm{fF}$, $T_{CD}=10\,\mathrm{ns}$, $C_{aux_{i}}=10C$ and, for example, in the case of $N=4$, the size of the auxiliary switches (in Fig. 2.2) are $W_{sw0}=400\,\mathrm{nm}$, $W_{sw1}=500\,\mathrm{nm}$, $W_{sw2}=1.6\,\mu\mathrm{m}$, $W_{sw3}=2.9\,\mu\mathrm{m}$, $W_{sw4}=1.5\,\mu\mathrm{m}$, $W_{sw5}=1.2\,\mu\mathrm{m}$, $L=120\,\mathrm{nm}$
2.5 Control pulses needed to implement the technique. Example $N=2$
2.6 Architecture selected to generate a pulse.
2.7 Logic block using the pulse generator of Fig. 2.6. Example $N=2$.
2.8 Capacitance dependence. $T_{CD}=10\,\mathrm{ns}$
2.9 Capacitance dependence. $C=1.4\,\mathrm{pF}$
2.10 Saving vs capacitance that is charged and discharged ($C$) vs charge and discharge time $T_{CD}$.
2.11 Prototype SC dc-dc converter
2.12 Efficiency measurements of eight prototypes of the dc-dc converter with the recycle technique and without it for different output voltages. Load current $I_{L}=60\,\mu\mathrm{A}$
2.13 Efficiency measurements of the dc-dc converter with the recycle technique and without it for different load currents.
2.14 Saving obtained using the recycle technique in the power consumption of the converter switches as a function of the output voltage.
2.15 Charge sharing technique

2.16 Charge sharing technique example signals. An edge in VCD indicates that the capacitors have to start charging/discharging. VC1 CS and VC2 CS are the voltages in the two capacitors when using the CS technique, and VC1 and VC2 are the voltages without using the technique. The signals shown in the second graph correspond to the nodes with the same name in Fig. 2.15.

2.17 Savings % vs $\mathrm{C1}=\mathrm{C2}=\mathrm{C}$ for different $T_{CD}$.
2.18 SC dc-dc converter architecture.
2.19 Gate drive power consumption vs output voltage. Without CS (dashed line, woCS) and with CS (solid line, wCS).
2.20 Efficiency vs output voltage. Without CS (dashed line, woCS) and with CS (solid line, wCS). $I=300\,\mu\mathrm{A}$.
3.1 Failure rate vs relationship $S=V_{TH}/V_{N}$ using Eq. (3.4).
3.2 Inverters bandwidth characterization setup.
3.3 Noise RMS voltage vs $V_{dd}$ for different technology nodes. $T=120^{\circ}\mathrm{C}$. $V_{in}=V_{OL}$
3.4 $S=V_{TH}/V_{N}$ vs $V_{dd}$ for different technology nodes. $T=120^{\circ}\mathrm{C}$. $V_{in}=V_{OL}$.
3.5 Noise RMS voltage vs $V_{dd}$ for 32 nm with corner simulations. $T=120^{\circ}\mathrm{C}$. $V_{in}=V_{OL}$. Best-case noise margin analysis.
3.6 $S=V_{TH}/V_{N}$ vs $V_{dd}$ for 32 nm with corner simulations. $T=120^{\circ}\mathrm{C}$. $V_{in}=V_{OL}$. Best-case noise margin analysis.
3.7 Bandwidth in the Best-Case scenario and in the Worst-Case scenario for the 32 nm process
3.8 Comparison between different definitions of minimum operating voltage.
3.9 $S=V_{TH}/V_{N}$ vs $V_{dd}$ for 32 nm with corner simulations. $T=120^{\circ}\mathrm{C}$. $V_{in}=V_{OL}$. $W_{p}/W_{n}=4.5$. Worst-case noise margins analysis.
3.10 Transient noise simulation. 32 nm node. $V_{dd}=150\,\mathrm{mV}$. FS corner.
3.11 Example of $W_{effn(p)}$ and $\tau_{n(p)}$ in a simple chain of inverters.
4.1 Devices in the 28 nm UTBB FD-SOI technology used. [36]
4.2 Back plane biasing schemes.
4.3 Chain of inverters with an activity factor of 0.5.
4.4 $VBP_{n}=-VBP_{p}$. LVT devices. Energy per operation normalized to minimum energy of the OBB as a function of supply voltage and back plane voltage (solid contour). Imbalance between PMOS and NMOS leakage current as a function of supply voltage and back plane voltage (dashed contour).
4.5 $VBP_{n}=0$. LVT devices. Energy per operation normalized to minimum energy of the OBB as a function of supply voltage and back plane voltage (solid contour). Imbalance between PMOS and NMOS leakage current as a function of supply voltage and back plane voltage (dashed contour).
4.6 $VBP_{n}=0$. LVT devices. Energy per operation normalized to minimum energy of the OBB as a function of supply voltage and back plane voltage (solid contour). Imbalance between PMOS and NMOS leakage current as a function of supply voltage and back plane voltage (dashed contour). LVT devices modified with respect to previous figures, $W_{effn}$ and $W_{effp}$. $W_{effn}=38$, $W_{effp}=12$.
4.7 Energy per operation normalized to minimum energy as a function of supply voltage, back plane voltage of the NMOS and back plane voltage of the PMOS. LVT devices. The black solid line are the points predicted by the OBB to be the optimum from the energy point of view and the dashed line are the points used by the SBB.
4.8 Minimum energy per operation normalized (to minimum energy obtained with LVT devices, minimum size, OBB scheme) as a function of the maximum frequency of the inverters chain for different $V_{T}$ flavours and sizes. The $V_{dd}$ corresponds for each case to the one of the minimum energy point. 1k Monte Carlo simulation results are included for some points, showing the improvement in variability.
4.9 NMOS LVT subthreshold slope factor as a function of transistors length.
4.10 Chain of inverters with an activity factor of 0.1.
4.11 Asymmetric Length Biasing (ALB). Energy per operation (solid black lines) normalized to the minimum energy and leakage current imbalance ($I_{L,n}/I_{L,p}$) (dashed grey lines) as a function of $V_{dd}$ and $Ln$ (length of NMOS). LVT devices. $Lp=30\,\mathrm{nm}$. $Wp=Wn=80\,\mathrm{nm}$. $VBP_{n}=VBP_{p}=0\,\mathrm{V}$
4.12 Energy per operation (solid black lines) normalized to the minimum energy achieved with ALB and leakage current imbalance ($I_{L,n}/I_{L,p}$) (dashed grey lines) as a function of $V_{dd}$ and $Wp$ (width of PMOS). LVT devices. $Lp=Ln=30\,\mathrm{nm}$. $Wn=80\,\mathrm{nm}$. Test circuit 1.
4.13 Symmetrical minimum sizing vs ALB. Energy per operation (solid black lines) normalized to the minimum energy achieved with ALB (Fig. 4.11) and performance contours (dashed grey lines) (in MHz) as a function of supply voltage and back plane voltage. LVT devices.


4.14 Symmetrical upsizing vs ALB Upsizing. Energy per operation (solid black lines) normalized to the minimum energy achieved with ALB (Fig. 4.11) and performance contours (dashed grey lines) (in MHz) as a function of supply voltage and back plane voltage. LVT devices. $V_{BPn}=-V_{BPp}=VB$. a) $Ln=Lp=50\,\mathrm{nm}$ b) $Ln=110\,\mathrm{nm}$, $Lp=40\,\mathrm{nm}$.
4.15 Energy contours (aJ) as a function of $V_{dd}$ and the length of the NMOS (Ln) for different temperatures. $V_{BPp}=V_{BPn}=0\,\mathrm{V}$. LVT
4.16 Energy vs frequency for the different techniques proposed. In each case, the optimum $V_{dd}$ was selected.
4.17 Architecture of 1-bit section of the full adder.
4.18 Relationship between the number of NMOS and PMOS that are imposing the leaking vs the different possible inputs. RCA
4.19 Energy contours (aJ) as a function of $V_{dd}$ and $VBP_{p}$ for three different inputs. $VBP_{n}=0\,\mathrm{V}$. LVT devices.
4.20 Energy contours (aJ) as a function of $V_{dd}$ and the length of the NMOS (Ln) for three different inputs. $VBP_{n}=VBP_{p}=0\,\mathrm{V}$. LVT devices.
4.21 Energy contours (aJ) as a function of $V_{dd}$ and $VBP_{p}$ for three different inputs. $VBP_{n}=0\,\mathrm{V}$. LVT devices.
4.22 Energy contours (aJ) as a function of $V_{dd}$ and the length of the NMOS (Ln) for three different inputs. $VBP_{n}=VBP_{p}=0\,\mathrm{V}$. LVT devices.
A.1 Charge transfer between capacitors.
A.2 Convergence between 2 capacitors.



[^0]: We will denote by $V_{TH}$ the threshold voltage that the noise must overcome to cause a bit flip with a given probability or frequency. It should not be mistaken for the MOS transistor threshold voltage.

