### **Bio Inspired Fault Tolerance in VLSI Systems – A Survey**

Ancy C P

M.E student in VLSI Design, Department of Electronics and Communication Engineering, NGCE, Anna University Chennai, Tamilnadu, India

**Abstract:** In a digital age in which everything from the smallest to the largest device relies on electronics, we cannot afford unreliable products, however small the application. Yet everything humans make is prone to error, however much care and attention it is given. In some applications the cost of an error is too high to bear, whether in human lives or in severe economic loss. In harsh [space] and hostile [nuclear power plant] environments, we need reliable devices that can survive conditions beyond the reach of the human eye and hand. This need has led to fault-tolerant devices inspired by nature. Nature is home to millions of organisms whose systems are more complex than any man-made device. They survive the most extreme conditions that nature throws at them, and their day-to-day activities continue unhindered. This survey examines bio-inspired fault-tolerant VLSI systems in the hope of supporting future innovation and research in space and other applications.

Key Words: Bio-inspired electronics, Fault tolerance, Prokaryotic cells, Reliability, Unicellular organisms.

### I. Introduction

As science has reached the surface of Mars and even contemplates making a second home somewhere in space, the complexity of electronic devices has grown along with it. Everything humans make is complicated and prone to errors, no matter how carefully it is designed.

Since the beginning of recorded history, man has believed in the influence of heavenly bodies on life on earth. Machines, electronics included, are considered scientific objects whose fate is controlled by man. So, despite knowing the exact date and time of its manufacture, we do not draft a horoscope for a machine. Lately, however, we have started noticing certain behaviors in state-of-the-art electronic circuits whose causes are traced to sources external to the circuit, including celestial bodies beyond our earth. The Single Event Upset [SEU] [1], as this non-permanent error is termed, can seriously disrupt electronic devices. A review of the ideas proposed for building systems that can handle such errors and survive harsh environments leads to the term 'fault tolerance' [2]. It can be defined as:

"Fault tolerance: the capability of systems to function despite one or more critical failures, by use of redundant circuits or functions and/or reconfigurable elements" --- NASA Thesaurus.

Despite groundbreaking advances in most engineering and scientific disciplines during the past decades, reliability engineering has not seen significant breakthroughs or noticeable advances [3]. To guarantee failure-free system operation, engineers still apply the same established principles for ensuring continued proper operation in the presence of a fault. These are either graceful degradation, where the quality of system operation decreases in proportion to the severity of the failure, or redundancy. Triple modular redundancy [TMR] is still one of the most prevalent techniques used at the system level [4]: a system is replicated three times so that operation remains failure-free in the presence of a single fault. But replicating entire systems carries a very high cost.
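The TMR idea can be illustrated with a minimal Python sketch. It assumes a bitwise 2-of-3 majority voter, which is the usual way three replicas are combined but is not spelled out in the text above. The names `majority_vote`, `tmr` and `increment` are hypothetical, and the protected module is a stand-in 8-bit incrementer.

```python
from typing import Callable, Optional


def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three replica outputs."""
    return (a & b) | (b & c) | (a & c)


def tmr(module: Callable[[int], int], x: int,
        faulty_replica: Optional[int] = None, flip_mask: int = 0) -> int:
    """Run three copies of `module` and vote on the result.

    `faulty_replica` and `flip_mask` are illustrative fault-injection
    parameters: they corrupt the output of one replica before voting.
    """
    outputs = [module(x) for _ in range(3)]
    if faulty_replica is not None:
        outputs[faulty_replica] ^= flip_mask  # inject a fault into one copy
    return majority_vote(*outputs)


def increment(x: int) -> int:
    """Hypothetical 8-bit module under protection."""
    return (x + 1) & 0xFF


fault_free = tmr(increment, 41)
one_fault = tmr(increment, 41, faulty_replica=1, flip_mask=0x0F)
```

With a single corrupted replica the voted output equals the fault-free output. The sketch also makes the cost visible: the module runs three times for every result.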

Nearly 90% of system crashes are attributed to 'soft' transient faults [5], in which only the memory content of the system mutates, which in turn causes the system to malfunction. These errors are induced by temporary environmental conditions, such as cosmic rays and electromagnetic interference. Such errors are most likely to occur in critical applications, where an error cannot be afforded. Engineers have tried methods such as TMR and DMR [double modular redundancy], but these increase the complexity and cost of the system. When it comes to innovation, invention and transformation, nature can be an excellent place to start. It has historically been a source of inspiration for many disciplines (witness man's early attempts to fly using contraptions that imitate birds), and in recent times it has become a growing driver in the design and creation of electronic devices. VLSI circuits are used extensively in almost all applications because of their small size, low cost, reliability and very low power consumption. Many bio-inspired changes have been made to VLSI circuits to make them fault tolerant, and these are surveyed in this paper. The rest of this paper is organized as follows. Section II gives a brief account of VLSI circuits. Section III discusses the causes of errors in VLSI circuits that critical applications cannot afford. Section IV gives a brief outline of bio-inspired fault tolerance. Section V explains the bio-inspired fault-tolerant techniques used to date. Section VI discusses the proposed method and, finally, conclusions are drawn in Section VII.

### II. Importance Of VLSI Circuits

Very Large Scale Integration [VLSI] is the process of creating an integrated circuit [IC] by combining thousands of transistors on a single chip. VLSI began in the 1970s when complex semiconductor and communication technologies were being developed [6]. The microprocessor is a VLSI device. Before the introduction of VLSI technology, most ICs could perform only a limited set of functions. An electronic circuit might consist of a CPU, RAM, ROM and other glue logic; VLSI lets IC makers put all of these on one chip. Circuits that would once have taken up a full board now fit in a space a few millimeters across. VLSI circuits are everywhere: in cars, computers, digital cameras, cell phones and more. As a side effect of these advances, there has been a dramatic proliferation of tools for designing VLSI circuits. Alongside this, following Moore's law [7], the capability of an IC has increased exponentially over the years in terms of computation power, utilization of available area and yield. The combined effect of these two advances is that designers can now put diverse functionality into ICs, opening up new frontiers.

But as capability grew, so did demand. More complex and life-critical applications began to demand VLSI circuits that are more reliable and efficient while occupying very little space and consuming little power. Although the biggest asset of VLSI circuits is their small size, meeting the reliability these applications demand became difficult, and the circuits became highly prone to soft errors, which are caused mainly by cosmic radiation. When one approach fails, it is natural to look for an alternative, but scientists found that nothing can replace VLSI circuits. The only available solution was to make them fault tolerant, so that they can function even in the presence of faults.

### III. Errors In VLSI Circuits

Failures in VLSI systems can result from various types of faults, which can be classified as either soft (transient) or permanent (hardware) faults. Transient faults are induced by temporary environmental conditions, such as cosmic rays and electromagnetic interference, and can, for example, cause information mutation in memory elements [8]. Permanent faults are the result of irreversible device and circuit changes, such as the following:

- a. Electromigration, which causes thinning and eventual open circuit of metal tracks.
- b. Hot carrier effect, which causes a shift in the device threshold voltage and its transconductance.
- c. Time-dependent dielectric breakdown, which causes a gate-oxide-to-substrate short circuit.
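The practical difference between the two fault classes can be sketched in Python. This is an illustrative model only: the transient fault is a one-off bit flip, and the permanent fault is represented by a stuck-at value, a common abstraction that stands in here for the physical mechanisms listed above. The class `MemoryCell` and the helper `survives_rewrite` are hypothetical names.

```python
class MemoryCell:
    """One-bit storage cell with optional injected faults (illustrative model)."""

    def __init__(self) -> None:
        self.value = 0
        # None means healthy; 0 or 1 models a permanent stuck-at fault (assumption).
        self.stuck_at = None

    def write(self, bit: int) -> None:
        self.value = bit & 1

    def read(self) -> int:
        return self.value if self.stuck_at is None else self.stuck_at

    def upset(self) -> None:
        """Transient fault: flip the stored bit once, as an SEU would."""
        self.value ^= 1


def survives_rewrite(cell: MemoryCell, bit: int) -> bool:
    """Rewrite the cell and report whether it now holds the intended bit."""
    cell.write(bit)
    return cell.read() == bit


soft = MemoryCell()
soft.write(1)
soft.upset()                      # stored content mutates, hardware is intact
soft_recovers = survives_rewrite(soft, 1)

hard = MemoryCell()
hard.stuck_at = 0                 # irreversible change in the device
hard_recovers = survives_rewrite(hard, 1)
```

In this model a rewrite clears the transient fault but not the permanent one, which is why permanent faults call for spare hardware or reconfiguration rather than a simple refresh.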

### IV. Bio-Inspired Fault Tolerance

Nature offers us some remarkable examples of how to deal with complexity and its associated unreliability. For example, the human body is one of the most complex systems known. Local failures are common, but the overall function of the organism is highly reliable because of the self-diagnosis and self-healing mechanisms that work ceaselessly throughout our bodies. These mechanisms are the result of millions of years of genetic evolution.

During the past few years, work on bio-inspired systems has generated some remarkable results. Genetic algorithms, neural networks, artificial brains and evolvable hardware are only a few of them. What attracts scientists and engineers to nature is the set of characteristics that biological organisms possess. These characteristics include evolvability, multicellular structure, auto-regulation and learning, which allow organisms to adapt to changes in their living environment.

A recent approach to fault tolerance borrows from nature the main principles that make living things so resilient to faults. Mechanisms such as self-diagnosis, self-healing, reproduction and adaptation are being transported to the arena of electronics. All these characteristics seem to be a natural consequence of the massively parallel arrays of cells that constitute every living being. The following section discusses the various bio-inspired fault-tolerant techniques used in VLSI circuits.

### V. Bio-Inspired Fault Tolerant Techniques

### 5.1 A Phylogenetic, Ontogenetic and Epigenetic View of Bio-inspired Hardware Systems [9]

Life on earth can be viewed as organized along three levels: the phylogenetic, ontogenetic and epigenetic levels. The phylogenetic level concerns the temporal evolution of the genetic programs within individuals. The ontogenetic level concerns the developmental process of a single multicellular organism. The epigenetic level concerns the learning process during an individual organism's lifetime. By analogy with nature, the space of bio-inspired hardware systems can be partitioned along these three axes, giving rise to the POE model [Fig 1]. The phylogenetic axis is also referred to as 'evolvable hardware'. Its main motivation is to accomplish difficult tasks, possibly involving real-time behavior in a complex, dynamic environment. The ontogenetic axis is also referred to as 'replicating and regenerating hardware'. It involves the development of a single individual from its own genetic material and is considered orthogonal to phylogeny. The main process along this axis is growth, or construction. The epigenetic axis is termed 'learning hardware'. It involves learning through interaction with the environment. Drawing on all of this, engineers have implemented VLSI systems that can evolve, replicate and learn from their own working environment, which is the main advantage of this model. Its disadvantage is that a single inversion in the genome can destroy the entire system.



### 5.2 Embryonics: A New Methodology for Designing Field-Programmable Gate Arrays with Self-Repair and Self-Replicating Properties [10]

The growth and operation of all living beings are directed by a chemical program, the DNA string or genome. This process is the source of inspiration for the Embryonics project, whose objective is the conception of VLSI circuits endowed with two properties associated with the living world: self-repair and self-replication. The project begins by showing that any logic system can be represented by an ordered binary decision diagram [OBDD] and then embedded into a fine-grained FPGA whose basic cell is a multiplexer with programmable connections. The cellular array thus obtained is perfectly homogeneous. The function of each cell is defined by a gene, and all the genes in the array, each associated with a pair of coordinates, make up the blueprint [genome] of the artificial organism. A human being consists of approximately 60 trillion cells. At each instant, in each of these 60 trillion cells, the genome, a ribbon of 2 billion characters, is decoded to produce the proteins needed for the survival of the organism. This genome contains the instructions for both the construction and the operation of the organism. The parallel execution of 60 trillion genomes in as many cells occurs ceaselessly from the conception to the death of the organism. Faults are rare and, in the majority of cases, successfully detected and repaired. This process is remarkable for its complexity and precision. By adopting certain features of cellular organization and transposing them to the 2D world of integrated circuits on silicon, properties unique to the living world, such as self-replication and self-repair, are achieved. These two properties are particularly desirable for complex artificial systems meant for hostile [nuclear plants] and inaccessible [space] environments. Self-replication allows the complete reconstruction of the original device in case of a major fault. Self-repair allows a partial reconstruction in case of a minor fault. These two properties are the main advantages of the Embryonics project. Its main disadvantage is the absence of a demonstrated mechanism for detecting and locating faults.
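The combination of a multiplexer cell, a genome indexed by coordinates and self-repair by reassignment can be sketched in Python. This is a one-dimensional toy under stated assumptions, not the FPGA architecture of [10]: every cell is assumed to hold a copy of the whole genome, a cell takes as its coordinate the number of healthy cells to its left, and fault detection is assumed to be given. The names `GENOME`, `Cell`, `differentiate` and `evaluate` are hypothetical. The example genome is the OBDD of `a XOR b`.

```python
# Each gene configures one multiplexer: (select input, branch if 0, branch if 1).
# A branch is either a constant or a reference to another logical coordinate.
GENOME = [
    ("a", ("cell", 1), ("cell", 2)),      # coordinate 0: root, tests a
    ("b", ("const", 0), ("const", 1)),    # coordinate 1: a = 0, output b
    ("b", ("const", 1), ("const", 0)),    # coordinate 2: a = 1, output not b
]


class Cell:
    def __init__(self, genome):
        self.genome = genome   # assumption: every cell stores the full genome
        self.gene = None       # gene currently expressed; None means spare
        self.faulty = False


def differentiate(cells):
    """Assign logical coordinates to healthy cells, skipping faulty ones."""
    placement = {}
    coord = 0
    for phys, cell in enumerate(cells):
        cell.gene = None
        if cell.faulty or coord >= len(GENOME):
            continue
        cell.gene = cell.genome[coord]
        placement[coord] = phys
        coord += 1
    if coord < len(GENOME):
        raise RuntimeError("not enough healthy cells to express the genome")
    return placement


def evaluate(cells, placement, inputs, coord=0):
    """Walk the multiplexer tree starting at the given logical coordinate."""
    select, low, high = cells[placement[coord]].gene
    kind, value = high if inputs[select] else low
    if kind == "const":
        return value
    return evaluate(cells, placement, inputs, value)


cells = [Cell(GENOME) for _ in range(5)]      # three working cells, two spares
before = differentiate(cells)
cells[1].faulty = True                        # fault assumed already detected
after = differentiate(cells)                  # self-repair: coordinates shift
result = evaluate(cells, after, {"a": 1, "b": 0})
```

After the fault, the genes formerly expressed by physical cells 1 and 2 move to cells 2 and 3, and the array computes the same function. When no spare remains, `differentiate` raises an error; in the scheme described above, that situation would call for self-replication instead.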

### 5.3 A Hierarchical Self-Repairing Architecture for Fast Fault Recovery of Digital Systems Inspired From Paralogous Gene Regulatory Circuits [11]

As applications grow more demanding, so does the complexity of the digital systems they use, which has confirmed the need for digital systems to be self-repairing. Currently available self-repairing systems have limitations, such as the storage overhead required to prepare all possible rewiring strategies and the temporal incorrectness caused by prolonged repair time. The proposed three-layer architecture provides self-repair with fast fault recovery and an efficient use of limited resources, and it can be readily applied to real complex digital systems such as fly-by-wire systems, deep space probes, satellites and nuclear reactor control systems. The first layer is the working layer, which employs a hybrid scheme that uses both redundant and empty cells together with a newly devised self-test. The second layer is the control layer, in which an ordered assignment control is proposed. The order of working priority of each processor that controls a normal cell in the working layer is predetermined. A faulty processor is detected by a majority decision among neighboring control processors and corrected by rearranging the order of working priority. The third and final layer, the interface layer, connects to an external PC for reprogramming. Through this fault recovery mechanism, the system can keep functioning normally in noisy environments. The advantages of this model are fast fault recovery, use of very limited resources and increased robustness. The disadvantage is that an external PC is required for partial reconfiguration of a cell.
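The control-layer behavior described above (majority decision among neighbors, then rearranging the working priority) can be loosely illustrated in Python. The data layout and the names `is_faulty`, `reassign` and `controllers` are assumptions made for this sketch and are not taken from [11].

```python
def is_faulty(votes):
    """Majority decision: True if more than half of the neighbours report a fault."""
    return sum(votes) * 2 > len(votes)


def reassign(priority, neighbour_votes):
    """Return the working-priority order with processors voted faulty removed."""
    return [p for p in priority if not is_faulty(neighbour_votes.get(p, []))]


def controllers(priority, n_cells):
    """The first n_cells processors in priority order each control one working cell."""
    return dict(enumerate(priority[:n_cells]))


# Hypothetical predetermined order: four control processors, three working cells.
priority = ["P0", "P1", "P2", "P3"]
votes = {
    "P1": [True, True, False],    # two of three neighbours report P1 as faulty
    "P2": [True, False, False],   # a single report is outvoted
}

new_priority = reassign(priority, votes)
assignment = controllers(new_priority, 3)
```

Here `P1` is dropped, and `P2` and `P3` move up to take over the second and third working cells, while `P2` keeps its place despite one dissenting report. Because the order is predetermined, no table of rewiring strategies has to be stored, which is the storage saving the paragraph above refers to.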

### 5.4 Self-Repairing Digital System With Unified Recovery Process Inspired by Endocrine Cellular Communication

Self-repairing digital systems have emerged as the most promising alternative for fault-tolerant systems. However, existing systems of this kind are impractical because of their complex routing process, their loss of efficiency as circuit size increases and their under-utilization of normal hardware. A new system is proposed that improves routing, lowers the hardware overhead as the circuit size increases and reduces the hardware left unutilized for fault recovery. Of the forms of cell-to-cell communication, endocrine cellular communication [Fig. 2] is the most interesting. The signaling endocrine cell releases a hormone, which flows through the blood and reaches the target cell. Although blood contains various hormones, the receptor on the target cell receives only the selected hormone. The specific characteristic of endocrine cells that inspired the digital system is that a cell secretes a hormone only if it receives a hormone from another endocrine cell.



Fig 2: Special endocrine cellular communication (a) Normal state (b) recovery state

### VI. Proposed Method

Living organisms are complex systems. Failures are local, i.e., they occur only in particular groups of cells and not in the organism as a whole, so repair is also carried out at the local [cell] level. Engineers have been trying to integrate ideas inspired by nature into modern silicon technology. Although proposals inspired by multicellular organisms demonstrated feasibility, the resulting systems were very complex. A new methodology is therefore proposed, inspired by the characteristics, morphology and behavior of simpler prokaryotic bacteria and bacterial communities. Such simple unicellular organisms could help to build simpler, cost-effective systems with improved reliability. Electronic devices have a non-zero probability of failure, and any system built from them inherits this unreliability. This is where the simpler prokaryotic bacteria prove useful. They are simple unicellular forms of life and need fewer genes in their genome in order to function. Their DNA has a duplex structure: two strands, one the complement of the other, form a double helix. Damage to one strand can be detected by enzymes, and the undamaged strand is then used to repair the other. Another important characteristic of prokaryotes is Horizontal Gene Transfer [HGT], the ability to learn from and record their environmental experience and to transfer the resulting changes in their genetic material to other cells. This process inherently adds valuable features such as adaptability, evolvability and resistance against environmental attacks, and it allows newly recruited cells to be endowed with the required genetic properties. The prokaryote bio-inspired model refers to a community of prokaryote-inspired electronic cells; hence it is called Unitronics, from 'unicellular electronics'.
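The duplex-strand repair idea can be sketched for a configuration word in Python. The word is stored as a strand plus its bitwise complement, so a mismatch between the two reveals damage. To decide which strand is damaged (the role played by enzymes in the cell), this sketch assumes one parity bit per strand. That parity check, the 8-bit width and the name `DuplexWord` are assumptions made here and are not taken from the Unitronics proposal.

```python
WIDTH = 8                      # illustrative word width (assumption)
MASK = (1 << WIDTH) - 1


def parity(word: int) -> int:
    """Return 1 if the word has an odd number of set bits, else 0."""
    return bin(word).count("1") & 1


class DuplexWord:
    """A configuration word stored as two complementary strands."""

    def __init__(self, value: int) -> None:
        self.strand = value & MASK
        self.complement = ~value & MASK
        # Assumed damage-localisation aid: one parity bit per strand.
        self.strand_parity = parity(self.strand)
        self.complement_parity = parity(self.complement)

    def damaged(self) -> bool:
        """Intact strands differ in every bit position."""
        return (self.strand ^ self.complement) != MASK

    def repair(self) -> int:
        """Rebuild the damaged strand from the intact one; return the value."""
        if not self.damaged():
            return self.strand
        if parity(self.strand) != self.strand_parity:
            self.strand = ~self.complement & MASK
        elif parity(self.complement) != self.complement_parity:
            self.complement = ~self.strand & MASK
        else:
            raise RuntimeError("damage detected but could not be localised")
        return self.strand


word = DuplexWord(0b10110010)
word.strand ^= 0b00001000          # single-bit upset in one strand
was_damaged = word.damaged()
restored = word.repair()
```

A single flipped bit is detected by the strand mismatch, localised by parity and repaired from the complement. Parity localises only an odd number of flipped bits per strand, so this sketch covers the single-upset case and reports an error otherwise.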

### VII. Conclusion

The sections above have surveyed various bio-inspired fault-tolerant techniques, which I hope will help researchers and engineers in further developing VLSI systems, because nature is a source of ideas that never runs dry. In an era in which technology is advancing rapidly, fault-tolerant systems are a necessity.

#### References

- [1]. F. Wang and V. D. Agrawal, "Single event upset: An embedded tutorial," in Proc. IEEE 21st Int. Conf. VLSI Design, Jan. 2008, pp. 429-434. http://www.eng.auburn.edu/~agrawvd/TALKS/tutorial\_6pg.pdf
- [2]. Aerospace Science and Technology Dictionary, F section. http://www.hq.nasa.gov/office/hqlibrary/aerospacedictionary/aerodictall/f.html
- [3]. Karama Kanoun, Mohamed Kaâniche, Christian Béounes, Jean-Claude Laprie, and Jean Arlat, "Reliability Growth of Fault-Tolerant Software," IEEE Transactions on Reliability, vol. 42, no. 2, June 1993.
- [4]. Shubham C. Anjankar and Mahesh T. Kolte, "Fault Tolerant and Correction System Using Triple Modular Redundancy," International Journal of Emerging Engineering Research and Technology, vol. 2, no. 2, May 2014, pp. 187-191. http://www.ijeert.org/pdf/v2-i2/31.pdf
- [5]. Ting-Ting Y. Lin and Daniel P. Siewiorek, "Error Log Analysis: Statistical Modeling and Heuristic Trend Analysis," IEEE Transactions on Reliability, vol. 39, no. 4, Oct. 1990.
- [6]. Neil H. E. Weste and David Money Harris, "CMOS VLSI Design: A Circuits and Systems Perspective," Fourth Edition, Addison-Wesley.
- [7]. Robert R. Schaller, "Moore's law: Past, present and future," IEEE Spectrum, June 1997. http://www.mae.ncsu.edu/zhu/courses/mae536/Reading/Moores\_Law.pdf
- [8]. F. Sturesson, "Single Event Effects (SEE) Mechanism and Effects," TEC-QEC, based on RADECS Short Course 2003 by S. Duzellier, EPFL Space Center, 9 June 2009. http://space.epfl.ch/webdav/site/space/shared/industry\_media/07%20SEE%20Effect%20F.Sturesson.pdf
- [9]. Moshe Sipper, Eduardo Sanchez, Daniel Mange, Marco Tomassini, Andrés Pérez-Uribe, and André Stauffer, "A Phylogenetic, Ontogenetic, and Epigenetic View of Bio-Inspired Hardware Systems," IEEE Transactions on Evolutionary Computation, vol. 1, no. 1, April 1997.
- [10]. Daniel Mange, Eduardo Sanchez, André Stauffer, Gianluca Tempesti, Pierre Marchal, and Christian Piguet, "Embryonics: A New Methodology for Designing Field-Programmable Gate Arrays with Self-Repair and Self-Replicating Properties," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 6, no. 3, September 1998.
- [11]. Sokehwan Kim, Hyunho Chu, Isaak Yang, Sanghoon Hong, Sung Hoon Jung, and Kwang-Hyun Cho, "A Hierarchical Self-Repairing Architecture for Fast Fault Recovery of Digital Systems Inspired From Paralogous Gene Regulatory Circuits," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 20, no. 12, December 2012.
- [12]. Isaak Yang, Sung Hoon Jung, and Kwang-Hyun Cho, "Self-Repairing Digital System With Unified Recovery Process Inspired by Endocrine Cellular Communication," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 21, no. 6, June 2013.