Researchers have developed artificial neural networks that mimic these aspects of the brain in order to recognize patterns. Instead of writing computer code to program these networks, researchers train them. A common method is called supervised learning, in which researchers collect a large set of data, such as photos of objects to be recognized, and label them appropriately. For example, to train a network to recognize pets, researchers would collect and label photos of animals such as dogs, cats, and rabbits.
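To make the idea concrete, here is a minimal sketch of what such a labeled training set might look like in code; the file names are hypothetical placeholders, and Python is used purely for illustration.

```python
# A minimal sketch of a labeled training set for the pet example:
# each entry pairs a photo with the name of the animal it shows.
# The file names below are hypothetical placeholders.

labeled_photos = [
    ("photos/rex.jpg", "dog"),
    ("photos/whiskers.jpg", "cat"),
    ("photos/thumper.jpg", "rabbit"),
    # ... many thousands more labeled examples
]

classes = ["dog", "cat", "rabbit"]  # one output node per kind of pet
```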
The network consists of layers of nodes arranged in columns and connected left to right by links. On the left side, each input node (think of each node as a neuron) is assigned to a particular part of the photo to be recognized. Each link in the network has a weight, which determines how strongly it propagates a signal. Initially these weights are set to small random values. The training algorithm shows the network a labeled photo. The brightness of each input node’s assigned piece of the photo determines how intensely that node fires. The signals from the input nodes cause some of the nodes in the layer to their right to fire, and eventually these signals propagate through the network and cause some nodes in the output layer to fire. In the pet example, each output node is assigned a particular type of pet, so there will be one output node for dogs, one for cats, and one for rabbits. The output node that fires most strongly produces the network’s answer. The training algorithm compares that answer to the label of the photo. If the answer is wrong, the algorithm works its way backward through the network, making small tweaks to the weight of each link so that the network’s answer moves closer to the correct one. This process is called backpropagation.
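The sketch below shows that forward pass and a single backpropagation step for a toy network, assuming a four-pixel “photo,” one hidden layer, three output nodes (dog, cat, rabbit), and a sigmoid activation; all of these sizes and choices are illustrative, not a description of any particular system.

```python
import numpy as np

# A toy version of the network described above: 4 input nodes (pixels),
# 5 hidden nodes, and 3 output nodes (dog, cat, rabbit).
rng = np.random.default_rng(0)

n_inputs, n_hidden, n_outputs = 4, 5, 3
W1 = rng.normal(0.0, 0.5, size=(n_hidden, n_inputs))   # weights start as small random values
W2 = rng.normal(0.0, 0.5, size=(n_outputs, n_hidden))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(pixels):
    """Propagate pixel brightnesses left to right through the network."""
    hidden = sigmoid(W1 @ pixels)        # hidden nodes fire
    output = sigmoid(W2 @ hidden)        # output nodes fire
    return hidden, output

def backprop_step(pixels, target, learning_rate=0.5):
    """Nudge every weight so the answer moves closer to the label."""
    global W1, W2
    hidden, output = forward(pixels)
    # error at the output layer, then pushed backward to the hidden layer
    output_delta = (output - target) * output * (1 - output)
    hidden_delta = (W2.T @ output_delta) * hidden * (1 - hidden)
    W2 -= learning_rate * np.outer(output_delta, hidden)
    W1 -= learning_rate * np.outer(hidden_delta, pixels)

# one labeled example: pixel brightnesses plus the label "dog" = [1, 0, 0]
photo = np.array([0.9, 0.1, 0.4, 0.7])
label = np.array([1.0, 0.0, 0.0])
backprop_step(photo, label)
```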
The training algorithm repeats this process with each labeled photo in the training set, making small adjustments to the weights of the links until the network correctly recognizes the pet in each photo. If that were all the network could do, it wouldn’t be very useful. However, it turns out that the trained network will correctly recognize the countless photos of dogs, cats, and rabbits that it hasn’t seen before.
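Continuing the toy sketch above, the whole training procedure is simply that backpropagation step repeated over every labeled photo for many passes, after which the network can be asked about photos it has never seen; training_set here is assumed to be a list of (pixels, label) pairs like the single example above.

```python
# Continuing the sketch above: repeat the backprop step over every labeled
# photo in the training set for several passes (epochs), then ask the
# network about photos it has never seen.

def train(training_set, epochs=100):
    for _ in range(epochs):
        for pixels, target in training_set:
            backprop_step(pixels, target)

def predict(pixels):
    """The output node that fires most strongly is the network's answer."""
    _, output = forward(pixels)
    return ["dog", "cat", "rabbit"][int(np.argmax(output))]
```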
By extracting knowledge directly from data, neural networks avoid the need to write down rules that describe the world. This approach makes them better for capturing knowledge that’s hard to describe in words. For example, suppose that you need to collect a large set of pictures of refugees. Writing out a set of rules for an expert system to do this would be very difficult. What are the relevant features of such pictures? Many, but not all, pictures of refugees contain crowds of people, but so do pictures of sporting events, urban areas, and nightclubs. And yet we humans have no problem distinguishing refugees from football fans. A lot of our knowledge is hard to express in words.
Computers became powerful enough to run neural networks in the 1980s, but the networks couldn’t be very large, and training was almost as much work as writing rules, since humans had to label each element of the training set. In 1985, DARPA funded two teams under the Autonomous Land Vehicle program to develop self-driving cars. Both teams used neural networks to enable their vehicles to recognize the edges of the road. However, the systems were easily confused by leaves or muddy tire tracks on the road, because the hardware available at the time was not powerful enough. Nonetheless, the program established the scientific and engineering foundations of autonomous vehicles, and some of the researchers went on to NASA to develop the Mars rovers Sojourner, Spirit, and Opportunity. All of these autonomous vehicles operated far longer than specified in their mission plans.
In 2004, DARPA issued a Grand Challenge, with a $1 million prize awarded to the first autonomous vehicle to cross the finish line of a 142-mile off-road course in the California desert near Barstow. The most capable vehicle traveled less than 8 miles before getting stuck. In 2005, DARPA repeated the challenge on a slightly shorter but more difficult course, and this time five vehicles crossed the finish line. The teams that developed those vehicles used neural networks to enable better detection of the track and to distinguish obstacles such as boulders from shadows. Many of those researchers went on to develop self-driving car technologies for Google, Uber, and major car manufacturers.
Now AI seems to be everywhere. Over the past few years, AI has constantly been in the news, due to rapid advances in face recognition, speech understanding, and self-driving cars. Oddly enough, this wave of rapid progress came about largely because teenagers were eager to play highly realistic video games. Video consists of a series of still pictures flashed on a screen, one after the other, to create the illusion of motion. Realistic video requires the creation and display of a huge number of high-definition pictures at a ferocious rate of at least 60 per second. Video screens consist of a dense rectangular array of tiny dots, called pixels. Each pixel can light up in any one of more than 16 million colors. The processors underlying fast-action video games need to create a constant stream of pictures based on the actions of the player and shunt them onto the screen in quick succession.
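A rough back-of-the-envelope calculation gives a feel for how much data that entails; the 1920-by-1080 resolution assumed below is an illustrative choice.

```python
import numpy as np

# A rough, illustrative calculation of the data behind realistic video,
# assuming a 1920 x 1080 screen, 3 color values per pixel (which is where
# the "more than 16 million colors" comes from: 256**3), and 60 pictures
# per second.

height, width = 1080, 1920
frame = np.zeros((height, width, 3), dtype=np.uint8)   # one picture: a grid of pixels

colors_per_pixel = 256 ** 3                 # 16,777,216 possible colors
bytes_per_frame = frame.nbytes              # about 6.2 million bytes
bytes_per_second = bytes_per_frame * 60     # about 373 million bytes every second

print(colors_per_pixel, bytes_per_frame, bytes_per_second)
```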
Enter the graphics processing unit, or GPU, a computer chip specifically designed for this task. GPUs rapidly process large arrays of numbers that represent the colors of the pixels that make up each picture. In 2009, NVIDIA released powerful new GPUs, and researchers soon discovered that these chips are ideal for training neural networks. This enabled the training of deep neural networks that consist of dozens of layers of neurons. Researchers applied algorithms invented in the 1980s and 1990s to create powerful pattern recognizers. They discovered that the initial layers of a deep network could recognize small features, such as edges, enabling subsequent layers to recognize larger features such as eyes, noses, hubcaps, or fenders. Providing more training data makes all neural networks better, up to a point. Deep neural networks can make use of more data to improve their recognition accuracy well past the point at which other approaches cease improving. This superior performance has made deep networks the mainstay of the current wave of AI applications.
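The connection between GPUs and neural networks comes down to arithmetic: a deep network’s forward pass is essentially a chain of large matrix multiplications applied to whole batches of images at once, which is exactly the kind of work a GPU is built to do. The sketch below shows that structure in plain code; the layer sizes, the depth, and the batch size are arbitrary illustrative choices.

```python
import numpy as np

# A sketch of why GPUs suit deep networks: the forward pass is a chain of
# large matrix multiplications over whole batches of images at once.
# Layer sizes, the number of layers, and the batch size are illustrative.

rng = np.random.default_rng(0)

layer_sizes = [784, 512, 512, 512, 256, 128, 10]   # a "deep" stack of layers
weights = [rng.normal(0, 0.1, size=(m, n))
           for n, m in zip(layer_sizes[:-1], layer_sizes[1:])]

def deep_forward(batch):
    """Push a whole batch of flattened images through every layer."""
    activations = batch                                 # shape: (batch_size, 784)
    for W in weights:
        activations = np.maximum(0.0, activations @ W.T)   # one layer of neurons
    return activations

batch = rng.random((64, 784))                # 64 images processed at once
scores = deep_forward(batch)                 # shape: (64, 10)
```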
Along with GPUs and clever algorithms, the internet has enabled the collection and labeling of the vast amounts of data required to train deep neural networks. Before automated face recognition was possible, Facebook provided tools for users to label their pictures. Crowdsourcing websites recruit inexpensive labor that AI companies can tap to label pictures. The resulting abundance of training data makes it seem that AI systems with superhuman abilities are about to take over the world. However, deep neural networks are woefully inefficient learners, requiring millions of images to learn how to detect objects. They are better thought of as statistical pattern recognizers produced by an algorithm that maps the contours of the training data. Give these algorithms enough pictures of dogs and cats, and they will find the differences that distinguish the one from the other, which might be the texture of fur, the shape of the ear, or some feature that conveys a general sense of “dogness” or “catness.”
For some applications, this inefficiency is not an issue. Internet search engines can now find pictures of just about anything, from cats sitting on suitcases to people playing Frisbee on a beach. For applications where training data is scarce, neural networks can generate it. An approach called generative adversarial networks takes a training set of pictures and pits two networks against each other. One tries to generate new pictures that are similar to the training set, and the other tries to detect the generated pictures. Over multiple rounds, the two networks get better at generation and detection, until the pictures produced are novel, yet usefully close to real ones, so that they can be used to augment a training set. Note that no labels are required for this generation phase, as the objective is to generate new pictures, not classify existing ones.
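A schematic of that adversarial loop looks something like the sketch below; the generator, the discriminator, and their update functions are hypothetical placeholders passed in as parameters, since the point here is the back-and-forth structure rather than any particular architecture.

```python
import random

# A schematic sketch of the adversarial training loop described above.
# The generator, discriminator, update functions, and noise source are
# hypothetical placeholders supplied by the caller.

def train_gan(real_photos, generator, discriminator,
              update_generator, update_discriminator,
              make_noise, rounds=10_000, batch_size=64):
    for _ in range(rounds):
        real_batch = random.sample(real_photos, batch_size)
        fake_batch = [generator(make_noise()) for _ in range(batch_size)]

        # One network learns to tell real photos from generated ones...
        update_discriminator(discriminator, real_batch, fake_batch)

        # ...while the other learns to produce photos the first network
        # can no longer tell apart from real ones.
        update_generator(generator, discriminator)

    # After enough rounds, generated photos can augment a scarce training set.
    return [generator(make_noise()) for _ in range(batch_size)]
```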
However, we still lack a solid theoretical foundation to explain how neural networks function. Recently, researchers have discovered that making a minuscule, carefully chosen change to an image can cause wild misclassification. A picture of a panda is suddenly identified as a monkey, even though it still looks like a panda. Understanding these anomalies is essential, as AI systems are increasingly being used to make critical decisions in areas as varied as medical diagnosis and self-driving vehicles.
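One well-known way such tiny changes are constructed, in the spirit of the widely cited “fast gradient sign” technique, is sketched below; the loss_gradient function standing in for a real network’s gradient is a hypothetical placeholder.

```python
import numpy as np

# A sketch, in the spirit of the "fast gradient sign" technique, of how a
# tiny, nearly invisible change to an image can be constructed.
# loss_gradient is a hypothetical stand-in for the gradient of a real
# network's classification loss with respect to the input image.

def adversarial_image(image, label, loss_gradient, epsilon=0.007):
    """Return a copy of `image` nudged by at most `epsilon` per pixel in the
    direction that most increases the network's classification error."""
    gradient = loss_gradient(image, label)           # same shape as the image
    perturbation = epsilon * np.sign(gradient)       # imperceptibly small change
    return np.clip(image + perturbation, 0.0, 1.0)   # keep valid pixel values
```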