Deep learning is a branch of machine learning that teaches computers to do what comes naturally to humans: learn from experience. Machine learning algorithms use computational methods to “learn” information directly from data without relying on a predetermined equation as a model. Deep learning is especially suited to image recognition, which is important in applications such as facial recognition, motion detection, and advanced driver assistance technologies, including autonomous driving, lane detection, pedestrian detection, and autonomous parking.
Deep Learning Toolbox™ provides simple MATLAB® commands for creating and interconnecting the layers of a deep neural network. Examples and pretrained networks make it easy to use MATLAB for deep learning, even without knowledge of advanced computer vision algorithms or neural networks.
For a free hands-on introduction to practical deep learning methods, see Deep Learning Onramp.
What Do You Want to Do? | Learn More
---|---
Perform transfer learning to fine-tune a network with your data | Start Deep Learning Faster Using Transfer Learning. Tip: Fine-tuning a pretrained network to learn a new task is typically much faster and easier than training a new network.
Classify images with pretrained networks | Pretrained Deep Neural Networks
Create a new deep neural network for classification or regression |
Resize, rotate, or preprocess images for training or prediction | Preprocess Images for Deep Learning
Label your image data automatically based on folder names, or interactively using an app | Train Network for Image Classification; Image Labeler (Computer Vision Toolbox)
Create deep learning networks for sequence and time series data |
Classify each pixel of an image (for example, road, car, pedestrian) | Getting Started with Semantic Segmentation Using Deep Learning (Computer Vision Toolbox)
Detect and recognize objects in images | Deep Learning, Semantic Segmentation, and Detection (Computer Vision Toolbox)
Classify text data | Classify Text Data Using Deep Learning
Classify audio data for speech recognition | Speech Command Recognition Using Deep Learning
Visualize what features networks have learned |
Train on CPU, GPU, multiple GPUs, in parallel on your desktop or on clusters in the cloud, and work with data sets too large to fit in memory | Deep Learning with Big Data on GPUs and in Parallel
To learn more about deep learning application areas, including automated driving, see Deep Learning Applications.
To choose whether to use a pretrained network or create a new deep network, consider the scenarios in this table.
 | Use a Pretrained Network for Transfer Learning | Create a New Deep Network
---|---|---
Training Data | Hundreds to thousands of labeled images (small) | Thousands to millions of labeled images |
Computation | Moderate computation (GPU optional) | Compute intensive (requires GPU for speed) |
Training Time | Seconds to minutes | Days to weeks for real problems |
Model Accuracy | Good, depends on the pretrained model | High, but can overfit to small data sets |
For more information, see Choose Network Architecture.
Deep learning uses neural networks to learn useful representations of features directly from data. Neural networks combine multiple nonlinear processing layers, using simple elements operating in parallel and inspired by biological nervous systems. Deep learning models can achieve state-of-the-art accuracy in object classification, sometimes exceeding human-level performance.
You train models using a large set of labeled data and neural network architectures that contain many layers, usually including some convolutional layers. Training these models is computationally intensive, and you can usually accelerate training by using a high-performance GPU. This diagram shows how convolutional neural networks combine layers that automatically learn features from many images to classify new images.
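As a rough illustration of what such an architecture looks like in code, the sketch below defines a small convolutional network layer by layer. The input size and the number of classes are placeholder values chosen for this sketch, not settings taken from a specific example.

% Minimal sketch of a small CNN layer array; the input size and class
% count below are placeholders.
layers = [
    imageInputLayer([28 28 1])                  % 28-by-28 grayscale images
    convolution2dLayer(3,16,'Padding','same')   % 16 3-by-3 filters
    reluLayer
    maxPooling2dLayer(2,'Stride',2)
    convolution2dLayer(3,32,'Padding','same')   % 32 3-by-3 filters
    reluLayer
    fullyConnectedLayer(10)                     % one output per class (10 here)
    softmaxLayer
    classificationLayer];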
Many deep learning applications use image files, and sometimes millions of image files. To access many image files for deep learning efficiently, MATLAB provides the imageDatastore function. Use this function to:
Automatically read batches of images for faster processing in machine learning and computer vision applications
Import data from image collections that are too large to fit in memory
Label your image data automatically based on folder names
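For instance, a call along the following lines creates a datastore from an image folder and labels each image from its subfolder name; the folder path here is only a placeholder.

% Minimal sketch; 'pathToImages' is a placeholder for your image folder.
imds = imageDatastore('pathToImages', ...
    'IncludeSubfolders',true, ...        % read images from all subfolders
    'LabelSource','foldernames');        % label images by folder name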
This example shows how to use deep learning to identify objects on a live webcam using only 10 lines of MATLAB code. Try the example to see how simple it is to get started with deep learning in MATLAB.
Run these commands to get the downloads if needed, connect to the webcam, and get a pretrained neural network.
camera = webcam; % Connect to the camera
net = alexnet;   % Load the neural network
If you need to install the webcam and alexnet add-ons, a message from each function appears with a link to help you download the free add-ons using Add-On Explorer. Alternatively, see Deep Learning Toolbox Model for AlexNet Network and MATLAB Support Package for USB Webcams.
After you install Deep Learning Toolbox Model for AlexNet Network, you can use it to classify images. AlexNet is a pretrained convolutional neural network (CNN) that has been trained on more than a million images and can classify images into 1000 object categories (for example, keyboard, mouse, coffee mug, pencil, and many animals).
Run the following code to show and classify live images. Point the webcam at an object, and the neural network reports what class of object it thinks the webcam is showing. It keeps classifying images until you press Ctrl+C. The code resizes the image for the network using imresize (Image Processing Toolbox).
while true
    im = snapshot(camera);       % Take a picture
    image(im);                   % Show the picture
    im = imresize(im,[227 227]); % Resize the picture for alexnet
    label = classify(net,im);    % Classify the picture
    title(char(label));          % Show the class label
    drawnow
end
In this example, the network correctly classifies a coffee mug. Experiment with objects in your surroundings to see how accurate the network is.
To watch a video of this example, see Deep Learning in 11 Lines of MATLAB Code.
To learn how to extend this example and show the probability scores of classes, see Classify Webcam Images Using Deep Learning.
For next steps in deep learning, you can use the pretrained network for other tasks. Solve new classification problems on your image data with transfer learning or feature extraction. For examples, see Start Deep Learning Faster Using Transfer Learning and Train Classifiers Using Features Extracted from Pretrained Networks. To try other pretrained networks, see Pretrained Deep Neural Networks.
Transfer learning is commonly used in deep learning applications. You can take a pretrained network and use it as a starting point to learn a new task. Fine-tuning a network with transfer learning is much faster and easier than training from scratch. You can quickly make the network learn a new task using a smaller number of training images. The advantage of transfer learning is that the pretrained network has already learned a rich set of features that can be applied to a wide range of other similar tasks.
For example, if you take a network trained on thousands or millions of images, you can retrain it to classify new objects using only hundreds of images. You can effectively fine-tune a pretrained network with much smaller data sets than the original training data. If you have a very large data set, then transfer learning might not be faster than training a new network.
Transfer learning enables you to:
Transfer the learned features of a pretrained network to a new problem
Fine-tune a pretrained network faster and more easily than training a new network
Reduce training time and dataset size
Perform deep learning without needing to learn how to create a whole new network
For an interactive example, see Transfer Learning with Deep Network Designer.
For a programmatic example, see Train Deep Learning Network to Classify New Images.
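As a rough sketch of that workflow, the code below keeps the early layers of a pretrained AlexNet, replaces the final layers to match the new classes, and retrains on a labeled image datastore. The imdsTrain datastore and the training option values are assumptions chosen for illustration, not settings from a specific example.

% Minimal transfer learning sketch; imdsTrain is an assumed imageDatastore
% of labeled training images.
net = alexnet;                                   % pretrained network
inputSize = net.Layers(1).InputSize;             % [227 227 3] for AlexNet
augTrain = augmentedImageDatastore(inputSize(1:2),imdsTrain);  % resize images

layersTransfer = net.Layers(1:end-3);            % keep all but the last 3 layers
numClasses = numel(categories(imdsTrain.Labels));
layers = [
    layersTransfer
    fullyConnectedLayer(numClasses, ...
        'WeightLearnRateFactor',20, ...          % learn faster in the new layers
        'BiasLearnRateFactor',20)
    softmaxLayer
    classificationLayer];

options = trainingOptions('sgdm', ...
    'MiniBatchSize',10, ...                      % placeholder values
    'MaxEpochs',6, ...
    'InitialLearnRate',1e-4);                    % small rate to preserve learned features
netTransfer = trainNetwork(augTrain,layers,options);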
Feature extraction allows you to use the power of pretrained networks without investing time and effort into training. Feature extraction can be the fastest way to use deep learning. You extract learned features from a pretrained network and use those features to train a classifier, such as a support vector machine (SVM; requires Statistics and Machine Learning Toolbox™). For example, if an SVM trained using alexnet features can achieve >90% accuracy on your training and validation set, then fine-tuning with transfer learning might not be worth the effort to gain some extra accuracy. If you perform fine-tuning on a small data set, then you also risk overfitting. If the SVM cannot achieve good enough accuracy for your application, then fine-tuning is worth the effort to seek higher accuracy.
For an example, see Extract Image Features Using Pretrained Network.
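As a rough sketch of that workflow, the code below extracts activations from a deeper layer of AlexNet and trains a multiclass SVM on them with fitcecoc (Statistics and Machine Learning Toolbox). The imdsTrain and imdsTest datastores and the choice of the 'fc7' layer are assumptions for illustration.

% Minimal feature extraction sketch; imdsTrain and imdsTest are assumed
% labeled imageDatastore objects.
net = alexnet;
inputSize = net.Layers(1).InputSize;
augTrain = augmentedImageDatastore(inputSize(1:2),imdsTrain);  % resize images
augTest  = augmentedImageDatastore(inputSize(1:2),imdsTest);

featuresTrain = activations(net,augTrain,'fc7','OutputAs','rows');  % learned features
featuresTest  = activations(net,augTest,'fc7','OutputAs','rows');

classifier = fitcecoc(featuresTrain,imdsTrain.Labels);  % multiclass SVM
YPred = predict(classifier,featuresTest);
accuracy = mean(YPred == imdsTest.Labels)               % fraction correctly classified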
Neural networks are inherently parallel algorithms. You can take advantage of this parallelism by using Parallel Computing Toolbox™ to distribute training across multicore CPUs, graphical processing units (GPUs), and clusters of computers with multiple CPUs and GPUs.
Training deep networks is extremely computationally intensive, and you can usually accelerate training by using a high-performance GPU. If you do not have a suitable GPU, you can train on one or more CPU cores instead. You can train a convolutional neural network on a single GPU or CPU, on multiple GPUs or CPU cores, or in parallel on a cluster. Using GPU or parallel options requires Parallel Computing Toolbox.
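In practice, you typically select the training hardware through the 'ExecutionEnvironment' training option, roughly as sketched below; the solver and the other option values are placeholders for illustration.

% Minimal sketch: choose where training runs. 'auto' uses a GPU if one is
% available; 'multi-gpu' and 'parallel' require Parallel Computing Toolbox.
options = trainingOptions('sgdm', ...
    'ExecutionEnvironment','multi-gpu', ...  % or 'cpu', 'gpu', 'auto', 'parallel'
    'MiniBatchSize',64, ...                  % placeholder value
    'MaxEpochs',4);                          % placeholder value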
You do not need multiple computers to solve problems using data sets too large to fit in memory. You can use the imageDatastore function to work with batches of data without needing a cluster of machines. However, if you have a cluster available, it can be helpful to take your code to the data repository rather than moving large amounts of data around.
To learn more about deep learning hardware and memory settings, see Deep Learning with Big Data on GPUs and in Parallel.