Energy-efficient computing using faulty hardware
Today, energy consumption has become the main factor limiting the quality and number of tasks that computer systems can perform. This is true for data center applications, and even more so for the embedded devices that play an increasingly important role in our lives. Unfortunately, the energy efficiency of standard CMOS integrated circuits has been improving only slowly, so new digital system design approaches are needed to keep making progress.
Some energy gains can be made by moving tasks that were previously performed by general-purpose processors onto dedicated hardware, but for tasks that already have dedicated hardware implementations, improvements can only come from changing the way digital circuits operate.
One promising approach based on standard CMOS circuits is to allow circuit modules to be occasionally faulty. Faulty circuits consume less energy because they make it possible to remove the various safety margins used in standard designs. This is particularly interesting when those margins would otherwise have to be large, for example when operating CMOS circuits in the near-threshold regime.
The question then becomes: how can we build reliable computing systems out of unreliable components? For telecommunication and signal processing systems, a promising approach is to expose the circuit faults directly to the algorithms, and use either an algorithm’s natural redundancy or carefully designed additional redundancy to preserve the same quality of results when operating on faulty hardware.
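To make the idea of designed additional redundancy concrete, here is a minimal sketch (my own illustration, not a method described above) of its simplest textbook form: running an unreliable operation three times and taking a bitwise majority vote. The names `faulty_add`, `P_FLIP`, and `WIDTH` are assumptions for the example; practical energy-efficient designs aim for much cheaper, algorithm-level redundancy, but the fault-masking principle is the same.

```python
# Minimal sketch: masking faults in an unreliable operation with triple
# modular redundancy. We model a "faulty adder" whose output bits each flip
# with probability P_FLIP, and recover reliability by running it three times
# and taking a bitwise majority vote. The per-bit error rate drops from
# roughly p to roughly 3*p^2.
import random

P_FLIP = 0.01      # assumed per-bit fault probability of the faulty unit
WIDTH = 16         # assumed word width

def faulty_add(a, b):
    """Add two WIDTH-bit integers on 'faulty hardware': each output bit
    may flip independently with probability P_FLIP."""
    result = (a + b) & ((1 << WIDTH) - 1)
    for bit in range(WIDTH):
        if random.random() < P_FLIP:
            result ^= 1 << bit
    return result

def redundant_add(a, b):
    """Run the faulty adder three times and take a bitwise majority vote."""
    r1, r2, r3 = faulty_add(a, b), faulty_add(a, b), faulty_add(a, b)
    return (r1 & r2) | (r1 & r3) | (r2 & r3)

if __name__ == "__main__":
    trials = 100_000
    plain_errors = redundant_errors = 0
    for _ in range(trials):
        a, b = random.getrandbits(WIDTH), random.getrandbits(WIDTH)
        exact = (a + b) & ((1 << WIDTH) - 1)
        plain_errors += faulty_add(a, b) != exact
        redundant_errors += redundant_add(a, b) != exact
    print(f"word error rate, single unit : {plain_errors / trials:.4f}")
    print(f"word error rate, 3x + voting : {redundant_errors / trials:.4f}")
```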
Using redundancy for energy efficiency in various contexts
- LDPC Decoders
- Deep neural networks
- c-partite associative memories: This associative memory structure, invented by Gripon and Berrou, can be shown to be very robust to faulty hardware. It also has an efficient hardware implementation, making it attractive as a specialized cache memory, for instance for network routing (see the sketch after this list).
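As a rough illustration of how such a clique-based memory works, here is a minimal Python sketch of a c-partite (Gripon-Berrou style) associative memory, assuming a simple winner-take-all retrieval rule. The class and parameter names (CliqueMemory, C, B) are invented for this example and do not come from the original papers. Each stored message forms a clique of connections across clusters, so the information is held redundantly in many connections, which is why a few corrupted connections rarely change the retrieved result.

```python
# Minimal sketch of a c-partite associative memory. Messages are split into
# C sub-symbols, each taking one of B values; storing a message creates a
# clique of binary connections between the chosen unit in each of the C
# clusters. Retrieval from a partial message activates the known units and
# picks, in each unknown cluster, the unit supported by the most known units.
import itertools

C = 4   # number of clusters (sub-symbols per message) -- assumed value
B = 64  # units per cluster (values a sub-symbol can take) -- assumed value

class CliqueMemory:
    def __init__(self):
        # Binary connections between units of different clusters:
        # ((c1, u1), (c2, u2)) in edges means the connection exists.
        self.edges = set()

    def store(self, message):
        """message: tuple of C sub-symbols, each in range(B)."""
        units = [(c, v) for c, v in enumerate(message)]
        for a, b in itertools.combinations(units, 2):
            self.edges.add((a, b))
            self.edges.add((b, a))

    def retrieve(self, partial):
        """partial: list of C entries, each a sub-symbol in range(B) or None.
        Returns a completed message (winner-take-all per unknown cluster)."""
        known = [(c, v) for c, v in enumerate(partial) if v is not None]
        result = list(partial)
        for c, v in enumerate(partial):
            if v is not None:
                continue
            # Score each candidate unit by how many known units connect to it.
            scores = [sum(((kc, kv), (c, u)) in self.edges for kc, kv in known)
                      for u in range(B)]
            result[c] = max(range(B), key=lambda u: scores[u])
        return tuple(result)

if __name__ == "__main__":
    mem = CliqueMemory()
    mem.store((3, 17, 42, 8))
    mem.store((5, 17, 9, 60))
    # Recover the first stored message from two known sub-symbols.
    print(mem.retrieve([3, None, 42, None]))   # expected (3, 17, 42, 8)
```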