Sat. Feb 4th, 2023

Miguel Navarro/Getty Images

As the world’s largest companies pursue autonomous cars, they’re essentially spending billions of dollars getting machines to do what the average two-year-old can do without thinking: identify what they see. Of course, toddlers still have the advantage in some ways. In an infamous incident last year, a driver died in a Tesla sedan; he wasn’t paying attention when the vehicle’s camera mistook the white side of a nearby truck for the bright sky.

The extent to which these companies have been successful so far is due to a long-dormant form of computation that models certain aspects of the brain. However, this form of calculation pushes current hardware to its limits, since modern computers work very differently from the gray matter in our heads. So while programmers are creating “neural network” software to run on regular computer chips, engineers are also designing “neuromorphic” hardware that can imitate the brain more efficiently. Unfortunately, a type of neural net that has become the standard in image recognition and other tasks, something called a convolutional neural net, or CNN, has resisted replication in neuromorphic hardware.

That is, until recently.

IBM scientists reported in the Proceedings of the National Academy of Sciences that they have modified CNNs to run on their TrueNorth chip; other research groups have also reported progress on the problem. The TrueNorth system matches the accuracy of today’s best image and speech recognition systems while using a small fraction of the energy and working many times faster. Combining convolutional nets with neuromorphic chips could create more than just a mouthful of jargon; it could lead to smarter cars and to cell phones that efficiently understand our verbal commands.

Letting technology learn like people

Traditionally, programming a computer required writing step-by-step instructions. To teach a computer to recognize a dog, for example, you might list a set of rules to guide its judgment. Check whether it is an animal. Check whether it has four legs. Check whether it is bigger than a cat and smaller than a horse. Check whether it barks. And so on. But good judgment requires flexibility. What if the computer encounters a small dog that doesn’t bark and has only three legs? You could add more rules, but it’s inefficient and impractical to list endless rules, and to repeat the process for every type of decision a computer needs to make.
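The brittleness of this rule-listing approach can be sketched in a few lines of Python. This is a hypothetical toy checker invented for illustration, not code from any real system:

```python
# A hypothetical rule-based "dog detector": every edge case demands a new rule.
def is_dog(animal: dict) -> bool:
    if not animal.get("is_animal"):
        return False
    if animal.get("legs") != 4:            # fails for a three-legged dog
        return False
    if not animal.get("barks"):            # fails for a quiet dog
        return False
    return animal.get("size_cm", 0) > 20   # "bigger than a cat" reduced to an arbitrary cutoff

print(is_dog({"is_animal": True, "legs": 4, "barks": True, "size_cm": 60}))   # True
print(is_dog({"is_animal": True, "legs": 3, "barks": False, "size_cm": 60}))  # False, though it may well be a dog
```

Every exception forces yet another hand-written rule, and the list never ends.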

People learn differently. A child can tell dogs and cats apart, walk upright, and speak fluently without receiving a single explicit instruction for these tasks – we learn from experience. So computer scientists have long tried to capture some of that magic by modeling software on the brain.

The brain contains about 86 billion neurons, cells that can each connect with thousands of other neurons through intricate branching. A neuron receives signals from many other neurons, and when the stimulation reaches a certain threshold, it “fires,” sending its own signal to surrounding neurons. A brain learns in part by adjusting the strengths of the connections between neurons, called synapses. When an activity pattern is repeated, through practice for example, the contributing connections become stronger and the lesson or skill becomes etched into the network.

In the 1940s, scientists began modeling neurons mathematically, and in the 1950s they began modeling networks of neurons with computers. Artificial neurons and synapses are much simpler than those in the brain, but they work on the same principles: many simple units (“neurons”) connect to many others (via “synapses”), with each unit’s numerical value depending on the values of the units signaling to it, weighted by the numerical strengths of the connections.
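That principle fits in a few lines of Python. Here is a minimal, illustrative sketch of a single artificial neuron with a hard firing threshold; the weights and threshold are made-up example values:

```python
# Minimal artificial neuron: a weighted sum of inputs, then a threshold "firing" rule.
def neuron(inputs, weights, threshold=1.0):
    activation = sum(x * w for x, w in zip(inputs, weights))
    return 1 if activation >= threshold else 0  # "fires" (1) or stays silent (0)

# Two active, strongly weighted inputs push the neuron past its threshold...
print(neuron([1, 1, 0], [0.6, 0.6, 0.9]))  # 1
# ...but one active input alone does not.
print(neuron([0, 0, 1], [0.6, 0.6, 0.9]))  # 0
```

Real neural-net units typically use smooth activation functions rather than this all-or-nothing step, but the weighted-sum idea is the same.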

Artificial neural networks (sometimes just called neural networks) usually consist of layers. Visually represented, information or activation travels from one column of circles to the next via the lines in between. The operation of networks with many such layers is called deep learning, both because the networks themselves are literally deeper and because the extra layers let them learn more abstract features. Neural networks are a form of machine learning, the process by which computers modify their behavior based on experience. Today’s nets can drive cars, recognize faces, and translate languages. Such advances owe their success to improvements in computer speed, to the vast amount of training data now available online, and to tweaks in the basic neural-net algorithms.
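As an illustrative sketch, each layer can be written as a set of weighted sums passed through a squashing function, with one layer’s output feeding the next layer’s input. All the weights below are arbitrary example values:

```python
import math

# One layer of sigmoid neurons: each row of the weight matrix is one neuron,
# and each neuron squashes its weighted sum into the range (0, 1).
def layer(inputs, weight_matrix):
    return [1 / (1 + math.exp(-sum(x * w for x, w in zip(inputs, row))))
            for row in weight_matrix]

hidden = layer([0.5, -1.0], [[0.8, 0.2], [-0.4, 0.9]])  # first layer: 2 inputs -> 2 neurons
output = layer(hidden, [[1.5, -1.5]])                    # second layer: 2 inputs -> 1 neuron
print(output)
```

Stacking more calls to `layer` is all that “deeper” means structurally; the hard part, as the article goes on to explain, is setting the weights.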

Of course Wikimedia has a CNN chart with a cute robot...


Convolutional neural networks are a particular type of network that has gained prominence in recent years. CNNs extract important features from stimuli, usually images. An input might be a photo of a dog. This can be represented as a sheet-like layer of neurons, with the activation of each neuron representing a pixel in the image. In the next layer, each neuron takes input from a patch of the first layer and becomes active when it detects a certain pattern in that patch, acting as a sort of filter. In successive layers, neurons look for patterns in the patterns, and so on. Up the hierarchy, the filters can be sensitive first to things like the edges of shapes, then to certain shapes, then to paws, then to dogs, until the final layer can tell you whether it sees a dog or a toaster.
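A single such filter can be sketched in plain Python: the same small kernel is slid over every patch of the image, producing a map of where its pattern occurs. The image and kernel below are made-up toy values:

```python
# A toy image: dark on the left, bright on the right (a vertical edge).
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
kernel = [[-1, 1], [-1, 1]]  # responds strongly to a dark-to-bright vertical edge

# Slide the kernel over every patch; each output value is one "filter neuron".
def convolve(img, ker):
    k = len(ker)
    out_size = len(img) - k + 1
    return [[sum(img[i + a][j + b] * ker[a][b] for a in range(k) for b in range(k))
             for j in range(out_size)] for i in range(out_size)]

feature_map = convolve(image, kernel)
print(feature_map)  # each row is [0, 18, 0]: the filter peaks exactly at the edge
```

The key economy is that one small kernel is reused at every position, rather than each output neuron having its own independent weights.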

Crucially, the internal filters do not need to be programmed by hand to look for shapes or legs. You only need to present the network with inputs (pictures) and correct outputs (picture labels). If the network is wrong, it tweaks its connections a bit until, after many, many examples, the connections automatically become sensitive to useful features. This process resembles how the brain processes vision, from low-level details up to object recognition. Any information that can be represented spatially — two dimensions for photos, three for video, one for strings of words in a sentence, two for audio (time and frequency) — can be parsed and understood by CNNs, making them widely useful.
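The “tweak the connections when wrong” idea can be illustrated with the classic perceptron learning rule, a much simpler ancestor of the backpropagation actually used to train CNNs. The toy task below (the label just copies the first input) is made up for illustration:

```python
# Each sample is (inputs, correct label); here the label equals the first input.
samples = [([1, 0], 1), ([0, 1], 0), ([1, 1], 1), ([0, 0], 0)]
weights, bias, lr = [0.0, 0.0], 0.0, 0.1

for _ in range(20):  # many passes over the training data
    for inputs, label in samples:
        prediction = 1 if sum(x * w for x, w in zip(inputs, weights)) + bias > 0 else 0
        error = label - prediction
        # Nudge each weight toward the correct answer after every mistake.
        weights = [w + lr * error * x for x, w in zip(inputs, weights)]
        bias += lr * error

predictions = [1 if sum(x * w for x, w in zip(inputs, weights)) + bias > 0 else 0
               for inputs, _ in samples]
print(predictions)  # [1, 0, 1, 0] — the connections now encode the rule
```

No one programs the rule in; repeated small corrections settle the weights into a configuration that gets every example right.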

Although Yann LeCun — now Facebook’s director of AI research — first proposed CNNs in the late 1980s, they only revealed their full potential after researchers made a few tweaks to the way they worked. In 2012, Geoffrey Hinton, now a top AI expert at Google, and two of his graduate students used a CNN to win something called the ImageNet Challenge, a competition in which computers have to recognize scenes and objects. They won by such a large margin that CNNs took over; every winner since has been a CNN.

Mimicking the brain, however, is computationally expensive. The human brain has billions of neurons and trillions of synapses, so simulating every neuron and synapse is currently impossible; simulating even a small piece of brain can require millions of calculations for each piece of input.

So, as mentioned above, convolutional neural networks unfortunately require enormous computational power. With many layers, and with each layer repeatedly applying the same feature filter to many patches of the previous layer, today’s largest CNNs can have millions of neurons and billions of synapses. Performing all these small calculations is a poor fit for classical computer architecture, which has to process one instruction at a time. Instead, scientists have moved to parallel computing, which can process many instructions at once.

Today’s advanced neural networks use graphics processing units (GPUs) – the kind used in video game consoles – because they specialize in the kinds of mathematical operations that are useful for deep learning. (Updating all the geometric facets of a moving object at once is a problem similar to computing all the outputs of a given neural-net layer at once.) Still, this hardware wasn’t designed to perform deep learning as efficiently as a brain, which can drive a car while holding a conversation about the future of autonomous vehicles, all on fewer watts than a light bulb.

By akfire1
