Deep Learning Is Only As Good As Its Data

This article was originally published in VentureBeat on March 2, 2018.

“Deep learning” has become a hot topic in the general rush to launch AI products. But many of these products will fail because companies are putting branding ahead of functionality. Success depends on understanding what deep learning is, how it works, and what its most effective applications are.

Deep learning 101

Many traditional machine learning algorithms are linear, in that they can be represented by a single node that linearly transforms input to output. Deep learning, previously known as artificial neural networks or simply neural networks, uses many such nodes organized in layers, like the neural networks first proposed in 1943 to model how human brains work. The more nodes and layers a neural network has, the more sophisticated its learning capabilities can become. Although people still use the term “neural networks,” today’s deep learning networks describe how information flows across nodes rather than how it flows across neurons in the human brain.
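To make the contrast between one node and several concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions rather than anything from the original article: the XOR task, the ReLU activation between layers, and the hand-picked weights are all chosen only for demonstration. It searches a coarse grid of weights for a single thresholded linear node that reproduces XOR, then evaluates a tiny network with two hidden nodes on the same inputs.

```python
from itertools import product

XOR_INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
XOR_TARGETS = [0, 1, 1, 0]


def linear_node(x, weights, bias):
    """One node: a weighted sum of the inputs plus a bias."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def relu(z):
    """Nonlinearity applied between layers (an assumption of this sketch)."""
    return max(0.0, z)


def two_layer_network(x):
    """Two hidden nodes followed by one output node, with hand-picked weights."""
    h1 = relu(linear_node(x, (1.0, 1.0), 0.0))
    h2 = relu(linear_node(x, (1.0, 1.0), -1.0))
    return linear_node((h1, h2), (1.0, -2.0), 0.0)


def single_node_solves_xor(grid):
    """Try every (w1, w2, bias) on the grid; report whether any reproduces XOR."""
    for w1, w2, b in product(grid, repeat=3):
        preds = [int(linear_node(x, (w1, w2), b) > 0) for x in XOR_INPUTS]
        if preds == XOR_TARGETS:
            return True
    return False


grid = [i / 2 for i in range(-8, 9)]  # weights from -4.0 to 4.0 in steps of 0.5
print(single_node_solves_xor(grid))                  # False
print([two_layer_network(x) for x in XOR_INPUTS])    # [0.0, 1.0, 1.0, 0.0]
```

No single node on the grid matches XOR, while the two-layer network does. Real networks learn their weights from data instead of having them set by hand, which is where the need for ample data and training time comes from.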

Deep learning requires ample data and training time. But while application development has been slow, recent successes in search, advertising, and speech recognition have many companies clamoring to get in on the action.

Mislabeling and overuse

Vendors’ tendency to label almost anything “deep learning” is a recipe for disappointment because the technology is less effective without sufficient data and domain expertise.

A key issue for machine learning algorithms is selection bias. In sound research, you define the population, have access to all available population data, and sample a portion of that data. With deep learning, you start with sample data, train and deploy the model, and only then expose it to the real world. But models that work well on training data often perform poorly on real-world data. Deep learning can learn an accurate classification function that maps inputs to an output. However, if the training data is not representative, there is no guarantee that the model will perform accurately on input data from the wider population.
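The effect of an unrepresentative sample can be sketched with a toy example. Everything here is a made-up assumption for illustration: two hypothetical groups whose labels flip at different values of a single feature, and a "model" that is just one learned cut-off rather than a deep network. The training sample contains only the first group.

```python
def make_group(true_threshold):
    """Return (feature, label) pairs for x = 0..99, labeled 1 when x >= true_threshold."""
    return [(x, int(x >= true_threshold)) for x in range(100)]


def accuracy(threshold, data):
    """Fraction of examples where the cut-off rule agrees with the label."""
    correct = sum(int(x >= threshold) == label for x, label in data)
    return correct / len(data)


def fit_threshold(data):
    """Pick the cut-off with the highest accuracy on the given data."""
    return max(range(101), key=lambda t: accuracy(t, data))


group_a = make_group(50)          # hypothetical subgroup A
group_b = make_group(20)          # hypothetical subgroup B, labeled differently
population = group_a + group_b
training_sample = group_a         # selection bias: group B was never sampled

learned = fit_threshold(training_sample)
print(learned)                                # 50
print(accuracy(learned, training_sample))     # 1.0
print(accuracy(learned, population))          # 0.85
```

The rule is perfect on the data it was trained on and still misclassifies part of the population, because nothing in the training sample revealed that group B behaves differently.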

This kind of data failure is more common when training data isn’t developed by domain experts. While deep learning might eliminate the need for domain experts in the feature extraction part of the classification process, it still requires expertise in the data extraction process. In fact, deep learning might be overkill when a domain expert can explicitly describe the linear or nonlinear function using logic and rules.

For example, suppose a bakery applied deep learning to making bread. A robot’s action, such as telling the automated bread maker to stop kneading, could be defined more explicitly by a domain expert (in this case, a baker) based on input values: attributes of the bread dough such as consistency and temperature.
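A rule of the kind a baker might dictate could be written down directly, with no training data at all. The sketch below is purely hypothetical: the function name, the consistency score, and both thresholds are invented placeholders, not real baking guidance.

```python
def should_stop_kneading(consistency: float, temperature_c: float) -> bool:
    """Expert rule: stop once the dough is elastic enough or is getting too warm.

    consistency is a hypothetical score in [0, 1]; temperature_c is the dough
    temperature in degrees Celsius. Both thresholds are illustrative only.
    """
    elastic_enough = 0.8   # hypothetical consistency threshold
    too_warm_c = 27.0      # hypothetical temperature limit
    return consistency >= elastic_enough or temperature_c >= too_warm_c


print(should_stop_kneading(0.5, 24.0))   # False: keep kneading
print(should_stop_kneading(0.85, 24.0))  # True: dough is elastic enough
print(should_stop_kneading(0.5, 28.0))   # True: dough is too warm
```

When an expert can state the decision this plainly, the rule is easier to inspect and adjust than a trained model would be.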

In scenarios such as this, companies that focus on collecting data points might be better served by speaking to an expert. The bottom line is that much of what is marketed as “deep learning” is likely to be ineffective or difficult to manage properly. And “deep reinforcement learning,” as implemented in autonomous robots, self-driving cars, and creation of images, voices, and videos, is far from being widely available. Buying into deep learning hype without doing due diligence could lead to general disillusionment and another AI winter.

Achieving greater accuracy

We may someday reach the point where AI and deep learning will help us achieve superintelligence or even bring on the singularity. But our challenge, and duty, as artificial intelligence professionals today is to ensure that deep learning applications live up to their billing and deliver benefits to users and society.