Neural networks, a type of machine-learning model, are being used to help humans complete a wide variety of tasks, from predicting whether someone's credit score is high enough to qualify for a loan to diagnosing whether a patient has a certain disease. But researchers still have only a limited understanding of how these models work. Whether a given model is optimal for a certain task remains an open question.
MIT researchers have found some answers. They conducted an analysis of neural networks and proved that they can be designed so they are "optimal," meaning they minimize the probability of misclassifying borrowers or patients into the wrong category when the networks are given a lot of labeled training data. To achieve optimality, these networks must be built with a specific architecture.
The researchers discovered that, in certain situations, the building blocks that enable a neural network to be optimal are not the ones developers use in practice. These optimal building blocks, derived through the new analysis, are unconventional and haven't been considered before, the researchers say.
In a paper published this week, they describe these optimal building blocks, called activation functions, and show how they can be used to design neural networks that achieve better performance on any dataset. The results hold even as the neural networks grow very large. This work could help developers select the correct activation function, enabling them to build neural networks that classify data more accurately in a wide range of application areas, explains senior author Caroline Uhler, a professor in the Department of Electrical Engineering and Computer Science (EECS).
“While these are new activation functions that have never been used before, they are simple functions that someone could actually implement for a particular problem. This work really shows the importance of having theoretical proofs. If you go after a principled understanding of these models, that can actually lead you to new activation functions that you would otherwise never have thought of,” says Uhler, who is also co-director of the Eric and Wendy Schmidt Center at the Broad Institute of MIT and Harvard, and a researcher at MIT’s Laboratory for Information and Decision Systems (LIDS) and Institute for Data, Systems and Society (IDSS).
Joining Uhler on the paper are lead author Adityanarayanan Radhakrishnan, an EECS graduate student and an Eric and Wendy Schmidt Center Fellow, and Mikhail Belkin, a professor in the Halicioğlu Data Science Institute at the University of California at San Diego.
Activation investigation
A neural network is a type of machine-learning model that is loosely based on the human brain. Many layers of interconnected nodes, or neurons, process data. Researchers train a network to complete a task by showing it millions of examples from a dataset.
For instance, a network that has been trained to classify images into categories, say dogs and cats, is given an image that has been encoded as numbers. The network performs a series of complex multiplication operations, layer by layer, until the result is just one number. If that number is positive, the network classifies the image as a dog, and if it is negative, as a cat.
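As a rough, hypothetical illustration (not the researchers' code), here is a minimal NumPy sketch of that layer-by-layer computation, with made-up weights and sizes:

```python
import numpy as np

# Minimal sketch: a tiny network reduces an encoded image (a vector
# of 64 numbers) to a single score via layer-by-layer multiplications.
# (Activation functions, described next, would be applied between layers.)
rng = np.random.default_rng(0)
W1 = rng.normal(size=(16, 64))    # layer 1: 64 inputs -> 16 neurons
W2 = rng.normal(size=(1, 16))     # layer 2: 16 neurons -> 1 output

def classify(image_vector):
    h = W1 @ image_vector          # first layer of multiplications
    score = (W2 @ h)[0]            # second layer reduces to one number
    return "dog" if score > 0 else "cat"

print(classify(rng.normal(size=64)))
```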
Activation functions help the network learn complex patterns in the input data. They do this by applying a transformation to the output of one layer before data are sent to the next layer. When researchers build a neural network, they select one activation function to use. They also choose the width of the network (how many neurons are in each layer) and the depth (how many layers are in the network).
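To make those three design choices concrete, here is a hedged sketch (all parameter values are invented for illustration) in which the activation function, width, and depth are explicit arguments:

```python
import numpy as np

def forward(x, width=32, depth=4, activation=np.tanh):
    """Run a made-up network with a chosen activation, width, and depth."""
    rng = np.random.default_rng(1)
    h = x
    for _ in range(depth):
        W = rng.normal(size=(width, h.shape[0])) / np.sqrt(h.shape[0])
        h = activation(W @ h)      # transform each layer's output
    w_out = rng.normal(size=width)
    return w_out @ h               # one final score

x = np.random.default_rng(2).normal(size=64)
print(forward(x, activation=np.tanh))                     # tanh activation
print(forward(x, activation=lambda z: np.maximum(z, 0)))  # ReLU activation
```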
“It turns out that, if you take the standard activation functions that people use in practice, and keep increasing the depth of the network, it gives you really terrible performance. We show that if you design with different activation functions, as you get more data, your network will get better and better,” says Radhakrishnan.
He and his collaborators studied a setting in which a neural network is infinitely deep and wide — meaning the network is built by continually adding more layers and more nodes — and is trained to perform classification tasks. In classification, the network learns to place data inputs into separate categories.
“A clean picture”
After conducting a detailed analysis, the researchers determined that there are only three ways this kind of network can learn to classify inputs. One method classifies an input based on the majority of inputs in the training data; if there are more dogs than cats, it will decide every new input is a dog. Another method classifies by choosing the label (dog or cat) of the training data point that most resembles the new input.
The third method classifies a new input based on a weighted average of all the training data points that are similar to it. Their analysis shows that this is the only method of the three that leads to optimal performance. They identified a set of activation functions that always use this optimal classification method.
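As a hypothetical illustration of these three behaviors (not the paper's formal definitions), with labels encoded as -1 for cat and +1 for dog:

```python
import numpy as np

def majority_vote(train_y):
    # Method 1: predict whichever label dominates the training set.
    return 1 if train_y.mean() > 0 else -1

def nearest_neighbor(x, train_X, train_y):
    # Method 2: copy the label of the single most similar training point.
    dists = np.linalg.norm(train_X - x, axis=1)
    return train_y[np.argmin(dists)]

def weighted_average(x, train_X, train_y, bandwidth=1.0):
    # Method 3 (the optimal one): weight every training label by how
    # similar its data point is to the new input, then take the sign.
    dists = np.linalg.norm(train_X - x, axis=1)
    weights = np.exp(-(dists / bandwidth) ** 2)
    return 1 if weights @ train_y > 0 else -1

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.where(X[:, 0] > 0, 1.0, -1.0)   # toy labels
x_new = np.array([0.5, -0.3])
print(majority_vote(y), nearest_neighbor(x_new, X, y), weighted_average(x_new, X, y))
```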
“That was one of the most surprising things — no matter what you choose for an activation function, it’s just going to be one of these three classifiers. We have formulas that will tell you explicitly which of these three it will be. It is a very clean picture,” he says.
They tested this theory on several classification benchmark tasks and found that it led to improved performance in many cases. Neural network builders could use their formulas to select an activation function that yields improved classification performance, Radhakrishnan says.
In the future, the researchers want to use what they have learned to analyze situations where they have a limited amount of data and for networks that are not infinitely wide or deep. They also want to apply this analysis to situations where data do not have labels.
“In deep learning, we want to build theoretically grounded models so we can reliably deploy them in some mission-critical setting. This is a promising approach to getting toward something like that — building architectures in a theoretically grounded way that translates into better results in practice,” he says.
This work was supported, in part, by the National Science Foundation, the Office of Naval Research, the MIT-IBM Watson AI Lab, the Eric and Wendy Schmidt Center at the Broad Institute, and a Simons Investigator Award.