

### Low-energy High-performance Computing based on Superconducting Technology

#### Nobuyuki Yoshikawa Graduate School of Engineering, Yokohama National University, Japan

The circuit was fabricated using ISTEC standard process (STP2). National Institute of Advanced Industrial Science and Technology partially contributed to the circuit fabrication.

YNU Initiative for Global Arts & Sciences YOKOHAMA National University



#### Question

#### What is this?



### Outline

- Background and motivation
- Present status of superconducting computing
  - Japanese and US projects
- The minimum energy in computation?
  - Landauer's principle
  - Adiabatic computing and reversible computing
- Adiabatic quantum flux parametron (AQFP)
- Reversible QFP (RQFP)
- Summary

#### Background

Estimated power consumption to realize an exa-scale computer

> 100 MW

~ \$million/100 MW per year

K computer (Japan) Peak performance: 10.5 PFLOPS Power consumption: 12.6 MW
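As a rough cross-check of the scale involved, the K computer figures above can be extrapolated to 1 EFLOPS. This is only a back-of-the-envelope sketch: it assumes the energy per floating-point operation stays constant when scaling up, which is an assumption of this example and not a claim from the slide.

```python
# Naive linear extrapolation from the K computer figures quoted above.
# Assumption (not from the slide): energy per FLOP does not improve with scale.
K_PEAK_FLOPS = 10.5e15   # 10.5 PFLOPS peak performance
K_POWER_W = 12.6e6       # 12.6 MW power consumption
EXA_FLOPS = 1e18         # 1 EFLOPS target

energy_per_flop_j = K_POWER_W / K_PEAK_FLOPS
exa_power_w = energy_per_flop_j * EXA_FLOPS

print(f"Energy per FLOP: {energy_per_flop_j * 1e9:.2f} nJ")
print(f"Exa-scale power at constant efficiency: {exa_power_w / 1e6:.0f} MW")
```

At K-computer efficiency the extrapolated power is on the order of a gigawatt, so even the > 100 MW estimate already presumes a substantial efficiency gain.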



#### **1st-ranked computers in recent TOP500**



http://www.top500.org/

Low-power logic devices are in high demand.

### **Energy Consumption in Data Centers**

Explosive increase in electric power of data centers

Approaching 10% of total electric power in nations

#### Facebook Data Center, Lulea, Sweden



Performance: 27-51 PFLOP/s Power 84 MW average (120 MW max)

D.S. Holmes, ISS 2013, Tokyo, Japan.

#### **Energy Consumption in Data Centers**

Electric power consumed in 100 searches on the internet



Electric power consumed in ironing a shirt



#### **End of Moore's Law**

Trend of the clock frequency of high-performance processors



#### **Intrinsic Bit Energy of CMOS Logic**



After International Technology Roadmap for Semiconductors, 2009 Edition.

### Single-Flux-Quantum (SFQ) Circuits

#### SFQ circuits



Switching energy

$$E = \Phi_0 I_c \sim 10^{-19} \text{ J}$$

#### **CMOS** circuits



Switching energy

$$E = QV_{\rm DD} \sim 10^{-16} \text{ J}$$
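The two order-of-magnitude estimates above can be reproduced with representative numbers. The critical current, load capacitance, and supply voltage below are illustrative assumptions chosen for this sketch; they are not values given on the slide.

```python
# Order-of-magnitude check of the SFQ and CMOS switching energies.
PHI_0 = 2.067833848e-15   # magnetic flux quantum h/(2e), in Wb
I_C = 100e-6              # assumed junction critical current: 100 uA
C_LOAD = 0.1e-15          # assumed CMOS load capacitance: 0.1 fF
V_DD = 1.0                # assumed CMOS supply voltage: 1.0 V

e_sfq = PHI_0 * I_C               # E = Phi_0 * I_c
e_cmos = (C_LOAD * V_DD) * V_DD   # E = Q * V_DD with Q = C * V_DD

print(f"SFQ switching energy:  {e_sfq:.2e} J")
print(f"CMOS switching energy: {e_cmos:.2e} J")
print(f"Ratio CMOS / SFQ:      {e_cmos / e_sfq:.0f}")
```

With these assumed values the SFQ energy falls in the 10^-19 J range and the CMOS energy at 10^-16 J, in line with the estimates above.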

#### **Energy-Delay Product of SFQ and CMOS**



#### Rapid Single-Flux-Quantum (RSFQ) Circuits



- Pulse height ~ 400 $\mu$V
- Pulse width ~ 3 ps
- Power ~ nW/gate

K. K. Likharev, V. K. Semenov, *IEEE Trans. Appl. Supercond.* 1, 3–28 (1991).



T Flip-flop operating at up to 770 GHz.

W. Chen *et al.*, *IEEE Trans. Appl. Supercond.* 9, 3212–3215 (1999).



**256-b shift register operating at 12 GHz** O. Mukhanov *et al.*, *IEEE Trans. Appl. Supercond.* 3, 2578–2581 (1993).

#### **Power Consumption in RSFQ Logic**




#### **Energy-Efficient SFQ Circuits**



### **Japanese SFQ Computer Projects**

#### JST-CREST Project (FY2006 - FY2012)

- Leader: Prof. Takagi (Kyoto Univ.)
- "Reconfigurable low-power high-performance processor based on single-flux-quantum circuitry"

#### JST-ALCA SFQ Project (FY2011 – FY2016)

- Leader: Prof. Fujimaki (Nagoya Univ.)
- "Superconductor electronic system combined with optics and spintronics"

#### MEXT-JSPS Project (FY2014 – FY2018)

- Leader: Prof. Yoshikawa (Yokohama National Univ.)
- "Study on Adiabatic Single-Flux-Quantum Circuits Operating in the Thermodynamic Energy Limit"

### Large-Scale Reconfigurable Data-Path (RDP) Architecture for Accelerator



SMAC: streaming memory access controller

ORN: operand routing network

LM: linear memory

#### 4×4 SFQ-RDP



- ✓ Operating Freq. (Design): 40 GHz
- ✓ Number of Pipeline Stages: 64
- ✓ Number of JJ: 28528
- ✓ Area of Die: 14 mm x 5 mm

A. Fujimaki *et al.*, ASC 2012, Portland.



### Achievements in JST-ALCA Project

#### Institute: Nagoya Univ., Yokohama Nat'l Univ., Kyoto Univ., NICT, AIST



#### **LV-RSFQ Microprocessors**



- ✓ Reduced static & dynamic energy consumption
- ✓ Simplified layout design
- ✓ Interoperability with conventional RSFQ





Executes all instructions stored in the memory at around 50 GHz.





## 100-GHz bit-serial microprocessor

#### Programmable device using magnetic material





Look-up table based on dual-rail SFQ and ferromagnet. See EPo1B-05.



### RSFQ Microprocessor Development in Japan



### Energy-efficient SFQ Processor/FPGA for Data Center



### **US Computer Projects**

- Cryogenic Computing Complexity Program (C3)
- Sponsored by IARPA
- To demonstrate fully functional cryogenic computer
  - 64-bit processors
  - cryogenic RAM
  - > 2 GHz
  - < 1 nJ / FLOP
- 5 years







#### HYPRES Integrated Memory Process (IMP) (for C3 project and beyond)



#### First-of-its-kind "Digital+" fabrication process: 150 mm wafer process integrating SFQ circuits, nTrons and MRAM devices



### The first superconducting parallel 8-bit RISC CPU





NORTHROP GRUMMAN

- Realizable in Lincoln 5ee processes
- Clock is 3.5 GHz
- 10 RQL clocks per instruction
- Can run benchmark algorithms
  - ✓ Fletcher Checksum
  - ✓ Greatest Common Divisor
  - ✓ Integer Divide
  - ✓ Bit Matrix Transpose
  - ✓ Least Common Multiple
  - ✓ Reverse Add

This RQL design is a practical step towards 32-bit SIMD

### Resonator Clock Network Powered RQL is Successful in the Test Lab





#### Outline



- Background and motivation
- Present status of superconducting computing
  - Japanese and US projects
- The minimum energy in computation?
  - Landauer's principle
  - Adiabatic computing and reversible computing
- Adiabatic quantum flux parametron (AQFP)
- Reversible QFP (RQFP)
- Summary

#### **Comparison of Energy-Delay Product**



#### **Minimum Energy in Computation?**



Figure 2 Time sequence of potentials starting at A (for a particle known to be near q = 0) and changing continuously to the deep bistable wells at F.

Adiabatic change of the energy potential of logic: Single well → Double well

Minimum energy dissipation when the "entropy" of information decreases: $\sim k_B T \ln 2$

R. W. Keyes, R. Landauer, IBM Journal of Research and Development, 14, 152 (1970).

### Landauer's Principle

- Equivalence between thermodynamic entropy and information entropy
- For computation reducing the information entropy, the minimum bit energy,  $E_{bit} = k_B T \ln 2$ , is consumed.
- For computation conserving the information entropy, there is no minimum limit of bit energy in computation.
- In erasure of results in computation, the bit energy is consumed.

R. Landauer, *IBM Journal of Research and Development* 5, 183 (1961).

C. H. Bennett, *IBM Journal of Research and Development* 17, 525 (1973).

### **Entropy in Information Theory**

The decrease of the information entropy when the logical bit information is lost:

$$S = k_{\rm B} \ln 2$$

Helmholtz free energy

$$F = U - TS$$

The change of the thermodynamic energy

$$k_{\rm B}T\ln 2$$
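To put a number on this bound, the sketch below evaluates $k_{\rm B}T\ln 2$ at room temperature and at liquid-helium temperature. The two temperatures and the function name `landauer_limit` are choices made for this illustration.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def landauer_limit(temperature_k):
    """Minimum energy (J) dissipated when one bit is erased at temperature T."""
    return K_B * temperature_k * math.log(2)


for temperature in (300.0, 4.2):   # room temperature and liquid helium
    energy_j = landauer_limit(temperature)
    print(f"T = {temperature:5.1f} K: k_B T ln 2 = {energy_j:.3e} J")
```

The bound scales linearly with temperature, so operation at 4.2 K lowers it by roughly a factor of 70 relative to 300 K.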

#### **Parametric Quantron**

The minimum energy in computation was discussed based on Parametric Quantron.

The minimum energy in computation

$$E_{bit} = \frac{k_B T}{\omega_c \tau \ln[\omega_A \tau p]} \propto f_{clock}$$



Fig. 1. Parametric quantron (a) and its potential energy U as a function of coordinate x (normalized magnetic flux  $\Phi$ ) at various values of parameters  $\lambda = (2\pi/\Phi_0)I_ML$  and  $f = (2\pi/\Phi_0)\Phi_e - \pi$  (b)-(d). Cross denotes a Josephson junction with the critical current  $I_M$  controlled by current  $I_{c}$ .

K. K. Likharev, *IEEE Trans. Magn.* MAG-13, 242 (1977).

K. K. Likharev, *Int. J. Theor. Phys.* 21, 311 (1982).

#### Verification of Landauer's Principle Using Small Beads





A. Bérut *et al.*, *Nature* 483, 187 (2012).



Figure 3 | Erasure rate and approach to the Landauer limit. a, Success rate of the erasure cycle as a function of the maximum tilt amplitude, $F_{\text{max}}$, for constant $F_{\text{max}}\tau$. b, Heat distribution P(Q) for transition $0 \rightarrow 1$ with $\tau = 25$ s and $F_{\text{max}} = 1.89 \times 10^{-14}$ N. The solid vertical line indicates the mean dissipated heat, $\langle Q \rangle$, and the dashed vertical line marks the Landauer limit, $\langle Q \rangle_{\text{Landauer}}$. c, Mean dissipated heat for an erasure cycle as a function of protocol duration, $\tau$, measured for three different success rates, r: plus signs, $r \ge 0.90$; crosses, $r \ge 0.85$; circles, $r \ge 0.75$. The horizontal dashed line is the Landauer limit. The continuous line is the fit with the function $[A\exp(-t/\tau_{\rm K}) + 1]B/\tau$, where $\tau_{\rm K}$ is the Kramers time for the low barrier (Methods). Error bars, 1 s.d.

#### **Reversible Computing**

- The entropy is conserved during computation.
- No minimum energy dissipation
- Logically reversible

#### Does logically reversible computing consume no energy?

$$x = c$$

$$y = cp + \overline{c}q$$

$$z = \overline{c}p + cq$$

| Input c | Input p | Input q | Output x | Output y | Output z |
|---------|---------|---------|----------|----------|----------|
| 0     | 0 | 0 | 0      | 0 | 0 |
| 0     | 0 | 1 | 0      | 1 | 0 |
| 0     | 1 | 0 | 0      | 0 | 1 |
| 0     | 1 | 1 | 0      | 1 | 1 |
| 1     | 0 | 0 | 1      | 0 | 0 |
| 1     | 0 | 1 | 1      | 0 | 1 |
| 1     | 1 | 0 | 1      | 1 | 0 |
| 1     | 1 | 1 | 1      | 1 | 1 |


E. Fredkin and T. Toffoli, *Int. J. Theor. Phys.* **21**, 219-253 (1982).
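The logical reversibility of this gate can be checked exhaustively. The sketch below implements the three equations above as written (with this convention, p and q are swapped when c = 0) and tests that the mapping is one-to-one. The function and variable names are choices made for this illustration.

```python
from itertools import product


def fredkin(c, p, q):
    """Controlled swap, following the equations above."""
    x = c
    y = (c & p) | ((1 - c) & q)   # y = c p + (not c) q
    z = ((1 - c) & p) | (c & q)   # z = (not c) p + c q
    return x, y, z


inputs = list(product((0, 1), repeat=3))
outputs = [fredkin(*bits) for bits in inputs]

# Logically reversible: every output pattern occurs exactly once.
is_bijection = len(set(outputs)) == len(inputs)
# Applying the gate twice restores the inputs.
is_self_inverse = all(fredkin(*fredkin(*bits)) == bits for bits in inputs)
# The number of 1s is conserved between input and output.
conserves_ones = all(sum(fredkin(*bits)) == sum(bits) for bits in inputs)

print(is_bijection, is_self_inverse, conserves_ones)
```

Because the mapping is a bijection, the inputs can always be recovered from the outputs, so no information entropy is lost in the logical operation.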

#### **Example of a CNOT Gate**



Logically reversible circuits can be made by using conventional logic.

Physical reversibility is required for reversible computing.

CNOT gate



| Input c | Input p | Output x | Output y |
|---------|---------|----------|----------|
| 0       | 0       | 0        | 0        |
| 0       | 1       | 0        | 1        |
| 1       | 0       | 1        | 1        |
| 1       | 1       | 1        | 0        |
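One way to see what "logically reversible" means in terms of information entropy is to compare the CNOT gate with an ordinary NAND gate. The sketch below computes the Shannon entropy of the outputs for uniformly random inputs; the uniform input distribution and the helper name `output_entropy_bits` are assumptions of this example. It addresses logical reversibility only and says nothing about the physical reversibility of a given circuit.

```python
import math
from collections import Counter
from itertools import product


def cnot(c, p):
    """CNOT: x = c, y = c XOR p."""
    return c, c ^ p


def nand(a, b):
    """Conventional two-input NAND with a single output."""
    return (1 - (a & b),)


def output_entropy_bits(gate, n_inputs):
    """Shannon entropy (bits) of the gate output for uniform random inputs."""
    total = 2 ** n_inputs
    counts = Counter(gate(*bits) for bits in product((0, 1), repeat=n_inputs))
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


print(f"Input entropy:       {2.0:.3f} bits")
print(f"CNOT output entropy: {output_entropy_bits(cnot, 2):.3f} bits")
print(f"NAND output entropy: {output_entropy_bits(nand, 2):.3f} bits")
```

The CNOT gate keeps the full 2 bits of input entropy, while the NAND gate discards more than 1 bit per operation, which is the situation where Landauer's bound applies.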

#### **Adiabatic and Reversible Computing**

#### Adiabatic computing



- Potential of the system is changed adiabatically
- No nonadiabatic energy dissipation

$$E_{bit} \propto f_{clock}$$

#### **Reversible computing**



- Input data can be calculated from output data.
- Number of inputs = number of outputs
- No change in information entropy

### Outline

- Background and motivation
- Present status of superconducting computing
  - Japanese and US projects
- The minimum energy in computation?
  - Landauer's principle
  - Adiabatic computing and reversible computing
- Adiabatic quantum flux parametron (AQFP)
- Reversible QFP (RQFP)
- Summary

#### **Energy Potential of RSFQ Circuits**






Adiabatic operation of the system is required for energy-efficient computing.

#### Operation Principle of Quantum Flux Parametron (QFP)



E. Goto, Proc. 1st RIKEN Symp. Josephson Electronics, 1984.

#### **Potential Energy of QFP**



M. Hosoya *et al.*, *IEEE Trans. Appl. Supercond.* 1, 77–89 (1991).

#### **Non-adiabatic and Adiabatic QFP**



#### **Evolution of Phase Differences at 4.2 K**



#### Bit Energy vs. Clock Period of AQFP



When the rise time is 1000 ps, $E_{\rm bit} = 0.023 I_{\rm c} \Phi_0$ (~ $20k_{\rm B}T$).

#### → 1/1000 of RSFQ

N. Takeuchi *et al.*, *Supercond. Sci. Technol.* 26, 035010 (2013).

#### Bit Energy Measurement of AQFP using a **Superconducting Resonator**





N. Takeuchi *et al.*, *Appl. Phys. Lett.* 102, 052602 (2013).



### **AQFP Logic Family**

**AQFP** buffer

![](_page_44_Figure_3.jpeg)

Layout of AQFP buffer

![](_page_44_Figure_5.jpeg)

#### **Majority gate**

![](_page_44_Figure_7.jpeg)

$$x = \mathrm{MAJ}(a, b, c) = ab + bc + ca$$

#### **NAND gate**

![](_page_44_Figure_10.jpeg)

$$x = \mathrm{MAJ}(\overline{a}, 1, \overline{b}) = \overline{ab}$$
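The NAND construction from a majority gate with one input tied to a constant 1 can be verified over all input combinations. This is a purely logical sketch; the function names are choices made for this illustration and it does not model the AQFP circuit itself.

```python
from itertools import product


def maj(a, b, c):
    """Three-input majority: 1 when at least two inputs are 1."""
    return 1 if a + b + c >= 2 else 0


def inv(a):
    """Logical inversion of a single bit."""
    return 1 - a


def nand_from_maj(a, b):
    """NAND built as MAJ(not a, 1, not b), with the middle input held at 1."""
    return maj(inv(a), 1, inv(b))


# Check MAJ against the sum-of-products form ab + bc + ca.
maj_ok = all(
    maj(a, b, c) == ((a & b) | (b & c) | (c & a))
    for a, b, c in product((0, 1), repeat=3)
)
# Check the majority-based NAND against the direct definition.
nand_ok = all(
    nand_from_maj(a, b) == 1 - (a & b) for a, b in product((0, 1), repeat=2)
)

print(maj_ok, nand_ok)
```

With the constant input at 1 the majority gate reduces to an OR of the inverted inputs, which is the NAND function by De Morgan's law.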

### Demonstration of AQFP 8-bit Carry-Look-Ahead Adder

- Energy per operation: 12 aJ @ 5 GHz (cf. bit energy of a single RSFQ gate: ~10 aJ)
- Designed clock frequency: 5 GHz
- Junction number: 1152 (β<sub>c</sub> = 5.0)
- Circuit area: 2.7 x 1.7 mm<sup>2</sup>

![](_page_45_Figure_3.jpeg)

The circuits were fabricated using AIST standard process (STP2).

N. Takeuchi et al., J. Appl. Phys. 117, 173912 (2015).

### Comparison of Energy Consumption of CMOS and AQFP

|                      | CMOS                                                                                    | AQFP                                                                                                                                                                                             |
|----------------------|-----------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Device<br>parameters | Technology: 45 nm<br>Supply voltage: V <sub>DD</sub> = 1.0 V<br>Frequency: f = 2.85 GHz | Technology: 2 $\mu$ m (J <sub>c</sub> = 2.5 kA/cm <sup>2</sup> )<br>Critical current: I <sub>c</sub> = 50 $\mu$ A,<br>Frequency: f = 5 GHz<br>Inductance: ( $\beta_L$ , $\beta_q$ ) = (0.4, 1.6) |
| Energy/bit           | ~1 fJ (~10 <sup>-15</sup> J)                                                            | 10 zJ (~10 <sup>-20</sup> J)                                                                                                                                                                     |

- Energy consumption of AQFP is five orders of magnitude lower than state-of-the-art CMOS devices.
- Further energy reduction is possible by using unshunted junctions.
- Further energy reduction is possible by using high-J<sub>c</sub> process.
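The "five orders of magnitude" figure follows directly from the two energy-per-bit entries in the table. The sketch below redoes that arithmetic and, as a rough consistency check, divides the 12 aJ per operation of the 8-bit adder by its 1152 junctions. The per-junction average is a calculation made for this example, not a figure quoted on the slides.

```python
import math

E_CMOS_J = 1e-15         # ~1 fJ per bit (table above)
E_AQFP_J = 10e-21        # 10 zJ per bit (table above)
E_ADDER_J = 12e-18       # 12 aJ per operation of the 8-bit adder
ADDER_JUNCTIONS = 1152   # junction count of the 8-bit adder

orders_of_magnitude = math.log10(E_CMOS_J / E_AQFP_J)
energy_per_junction_j = E_ADDER_J / ADDER_JUNCTIONS

print(f"CMOS / AQFP energy ratio: 10^{orders_of_magnitude:.1f}")
print(f"Adder energy per junction: {energy_per_junction_j * 1e21:.1f} zJ")
```

The per-junction average of the adder comes out near 10 zJ, the same order as the AQFP energy-per-bit entry in the table.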

### Outline

- Background and motivation
- Present status of superconducting computing
  - Japanese and US projects
- The minimum energy in computation?
  - Landauer's principle
  - Adiabatic computing and reversible computing
- Adiabatic quantum flux parametron (AQFP)
- Reversible QFP (RQFP)
- Summary

### Switching energy of AQFP

![](_page_48_Figure_2.jpeg)

N. Takeuchi et al., Phys. Rev. Appl. 4, 034007 (2015).

#### **Reversible AQFP (RQFP)**

#### **Reversible majority QFP gate**

![](_page_49_Figure_3.jpeg)

A logically and physically reversible gate can be achieved by using MAJ and SPL gates.

N. Takeuchi *et al.*, *Scientific Reports* 4, 6354 (2014).

#### **Physical Reversibility of RQFP**

![](_page_50_Figure_2.jpeg)

#### **Energy Dissipation of RQFP Gate**

![](_page_51_Figure_2.jpeg)

N. Takeuchi *et al.*, *Scientific Reports* 4, 6354 (2014).

#### **Demonstration of Physical Reversibility**

![](_page_52_Figure_2.jpeg)


### Energy-Delay Products of Superconducting Logic

![](_page_53_Figure_2.jpeg)

#### Summary

- Current research activities in Japan and US were reviewed.
- Adiabatic quantum flux parametron (AQFP) is an extremely energy-efficient logic.
  - ~10 zJ/bit @5 GHz
  - Three orders of magnitude smaller than energy-efficient SFQ logic
  - Five orders of magnitude smaller than CMOS logic (~1 fJ/bit)
- Sub-k<sub>B</sub>T bit-energy operation is possible using AQFP gate with high-Q junctions.
  - ~10 yJ/bit (~ 0.2 k<sub>B</sub>T) @100 MHz
- Reversible logic can be realized based on AQFP.
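The bit energies quoted in this summary can be expressed in units of $k_{\rm B}T$ to compare them with the Landauer bound $k_{\rm B}T\ln 2$. The sketch below assumes an operating temperature of 4.2 K; the helper name `in_units_of_kbt` is a choice made for this illustration.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant in J/K (exact SI value)
TEMPERATURE_K = 4.2     # assumed operating temperature (liquid helium)


def in_units_of_kbt(energy_j):
    """Express an energy in multiples of k_B * T at the assumed temperature."""
    return energy_j / (K_B * TEMPERATURE_K)


bit_energies_j = {
    "AQFP @ 5 GHz (10 zJ)": 10e-21,
    "AQFP @ 100 MHz (10 yJ)": 10e-24,
}

print(f"Landauer bound: {math.log(2):.3f} k_B T")
for label, energy_j in bit_energies_j.items():
    print(f"{label}: {in_units_of_kbt(energy_j):.3g} k_B T")
```

At 4.2 K, 10 yJ corresponds to roughly 0.2 k<sub>B</sub>T, which lies below k<sub>B</sub>T ln 2. By Landauer's principle such a bit energy is only permitted for operations that conserve information entropy, which is the case reversible logic is meant to address.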

#### Conclusions

- Superconducting logic is the only technology that can break through the thermal limit in computation.
- There is still considerable room for improving the energy efficiency of computation using superconducting circuits.

### **My Special Thanks to**

Dr. O. Mukhanov, Dr. Q. Herr, Prof. A. Fujimaki, Prof. M. Tanaka

Prof. N. Takeuchi, Prof. C. Ayala, Prof. Y. Yamanashi, Prof. T. Ortlepp, Prof. C. Fourie, Mr. F. China, Mr. N. Tsuji, Mr. M. Narama, Mr. Y. Murai, Ms. Q. Xu and Mr. K. Fang.