You can take a pretrained image classification network that has already learned to extract powerful and informative features from natural images and use it as a starting point to learn a new task. The majority of the pretrained networks are trained on a subset of the ImageNet database [1], which is used in the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) [2]. These networks have been trained on more than a million images and can classify images into 1000 object categories, such as keyboard, coffee mug, pencil, and many animals. Using a pretrained network with transfer learning is typically much faster and easier than training a network from scratch.
You can use previously trained networks for the following tasks:
Purpose | Description |
---|---|
Classification | Apply pretrained networks directly to classification problems. To classify a new image, use the classify function, as shown in the sketch after this table. |
Feature Extraction | Use a pretrained network as a feature extractor by using the layer activations as features. You can use these activations as features to train another machine learning model, such as a support vector machine (SVM). For more information, see Feature Extraction. For an example, see Extract Image Features Using Pretrained Network. |
Transfer Learning | Take layers from a network trained on a large data set and fine-tune them on a new data set. For more information, see Transfer Learning. For a simple example, see Get Started with Transfer Learning. To try more pretrained networks, see Train Deep Learning Network to Classify New Images. |
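For instance, classifying a single image takes only a few lines. The following is a minimal sketch; it assumes the SqueezeNet support package is installed and uses the peppers.png image that ships with MATLAB:

```matlab
% Classify one image with a pretrained network (minimal sketch).
net = squeezenet;                                % load a pretrained network
inputSize = net.Layers(1).InputSize;             % [227 227 3] for SqueezeNet
I = imresize(imread("peppers.png"), inputSize(1:2));
label = classify(net, I)                         % predicted ImageNet class
```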
Pretrained networks have different characteristics that matter when choosing a network to apply to your problem. The most important characteristics are network accuracy, speed, and size. Choosing a network is generally a tradeoff between these characteristics. Use the plot below to compare the ImageNet validation accuracy with the time required to make a prediction using the network.
Tip
To get started with transfer learning, try choosing one of the faster networks, such as SqueezeNet or GoogLeNet. You can then iterate quickly and try out different settings such as data preprocessing steps and training options. Once you have a feeling for which settings work well, try a more accurate network, such as Inception-v3 or a ResNet, and see if that improves your results.
Note
The plot above only shows an indication of the relative speeds of the different networks. The exact prediction and training iteration times depend on the hardware and mini-batch size that you use.
A good network has a high accuracy and is fast. The plot displays the classification accuracy versus the prediction time when using a modern GPU (an NVIDIA® Tesla® P100) and a mini-batch size of 128. The prediction time is measured relative to the fastest network. The area of each marker is proportional to the size of the network on disk.
The classification accuracy on the ImageNet validation set is the most common way to measure the accuracy of networks trained on ImageNet. Networks that are accurate on ImageNet are also often accurate when you apply them to other natural image data sets using transfer learning or feature extraction. This generalization is possible because the networks have learned to extract powerful and informative features from natural images that generalize to other similar data sets. However, high accuracy on ImageNet does not always transfer directly to other tasks, so it is a good idea to try multiple networks.
If you want to perform prediction using constrained hardware or distribute networks over the Internet, then also consider the size of the network on disk and in memory.
There are multiple ways to calculate the classification accuracy on the ImageNet validation set and different sources use different methods. Sometimes an ensemble of multiple models is used and sometimes each image is evaluated multiple times using multiple crops. Sometimes the top-5 accuracy instead of the standard (top-1) accuracy is quoted. Because of these differences, it is often not possible to directly compare the accuracies from different sources. The accuracies of pretrained networks in Deep Learning Toolbox™ are standard (top-1) accuracies using a single model and single central image crop.
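To make the top-1 versus top-5 distinction concrete, here is one way you might compute both yourself. This is a sketch under stated assumptions: net is a pretrained network loaded as described below, and augVal and trueLabels (a validation augmentedImageDatastore and its categorical labels) are illustrative names, not toolbox objects.

```matlab
% Sketch: top-1 and top-5 accuracy. augVal and trueLabels are assumed inputs.
scores = predict(net, augVal);             % N-by-1000 class probabilities
[~, idx] = maxk(scores, 5, 2);             % indices of the 5 highest scores
classNames = net.Layers(end).Classes;      % class names from the output layer
top1 = mean(classNames(idx(:,1)) == trueLabels);
hits = false(numel(trueLabels), 1);
for k = 1:5                                % true label anywhere in the top 5?
    hits = hits | (classNames(idx(:,k)) == trueLabels);
end
top5 = mean(hits);
```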
To load the SqueezeNet network, type squeezenet at the command line.

net = squeezenet;
For other networks, use functions such as googlenet to get links to download pretrained networks from the Add-On Explorer.
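For example, the following loads GoogLeNet if the corresponding support package is installed; if it is not, the function provides a download link instead:

```matlab
% Loads GoogLeNet when the support package is installed; otherwise the
% function opens a link to the Add-On Explorer.
net = googlenet;
```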
The following table lists the available pretrained networks trained on ImageNet and some of their properties. The network depth is defined as the largest number of sequential convolutional or fully connected layers on a path from the input layer to the output layer. The inputs to all networks are RGB images.
Network | Depth | Size | Parameters (Millions) | Image Input Size |
---|---|---|---|---|
squeezenet | 18 | 5.2 MB | 1.24 | 227-by-227 |
googlenet | 22 | 27 MB | 7.0 | 224-by-224 |
inceptionv3 | 48 | 89 MB | 23.9 | 299-by-299 |
densenet201 | 201 | 77 MB | 20.0 | 224-by-224 |
mobilenetv2 | 53 | 13 MB | 3.5 | 224-by-224 |
resnet18 | 18 | 44 MB | 11.7 | 224-by-224 |
resnet50 | 50 | 96 MB | 25.6 | 224-by-224 |
resnet101 | 101 | 167 MB | 44.6 | 224-by-224 |
xception | 71 | 85 MB | 22.9 | 299-by-299 |
inceptionresnetv2 | 164 | 209 MB | 55.9 | 299-by-299 |
shufflenet | 50 | 5.4 MB | 1.4 | 224-by-224 |
nasnetmobile | * | 20 MB | 5.3 | 224-by-224 |
nasnetlarge | * | 332 MB | 88.9 | 331-by-331 |
darknet19 | 19 | 78 MB | 20.8 | 256-by-256 |
darknet53 | 53 | 155 MB | 41.6 | 256-by-256 |
efficientnetb0 | 82 | 20 MB | 5.3 | 224-by-224 |
alexnet | 8 | 227 MB | 61.0 | 227-by-227 |
vgg16 | 16 | 515 MB | 138 | 224-by-224 |
vgg19 | 19 | 535 MB | 144 | 224-by-224 |
*The NASNet-Mobile and NASNet-Large networks do not consist of a linear sequence of modules.
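You can check the input size and architecture of any of these networks after loading them; a short sketch, using mobilenetv2 as an example:

```matlab
% Inspect a pretrained network's expected input size and layers.
net = mobilenetv2;              % requires the MobileNet-v2 support package
net.Layers(1).InputSize         % image input size, [224 224 3]
analyzeNetwork(net)             % interactive view of the architecture
```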
The standard GoogLeNet network is trained on the ImageNet data set, but you can also load a network trained on the Places365 data set [3][4]. The network trained on Places365 classifies images into 365 different place categories, such as field, park, runway, and lobby. To load a pretrained GoogLeNet network trained on the Places365 data set, use googlenet('Weights','places365'). When using transfer learning for a new task, the most common approach is to use networks pretrained on ImageNet. If the new task is similar to classifying scenes, then using the network trained on Places365 could give higher accuracies.
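A minimal sketch of loading the Places365 version and inspecting its scene categories (the Classes property of the final classification layer holds the 365 category names):

```matlab
% Load GoogLeNet trained on Places365 (requires the GoogLeNet support package).
net = googlenet('Weights','places365');
classNames = net.Layers(end).Classes;    % 365 scene categories
classNames(1:5)                          % first few category names
```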
Feature extraction is an easy and fast way to use the power of deep learning without investing time and effort into training a full network. Because it requires only a single pass over the training images, it is especially useful if you do not have a GPU. You extract learned image features using a pretrained network, and then use those features to train a classifier, such as a support vector machine using fitcsvm (Statistics and Machine Learning Toolbox).
Try feature extraction when your new data set is very small. Since you only train a simple classifier on the extracted features, training is fast. It is also unlikely that fine-tuning deeper layers of the network improves the accuracy since there is little data to learn from.
If your data is very similar to the original data, then the more specific features extracted deeper in the network are likely to be useful for the new task.
If your data is very different from the original data, then the features extracted deeper in the network might be less useful for your task. Try training the final classifier on more general features extracted from an earlier network layer. If the new data set is large, then you can also try training a network from scratch.
ResNets are often good feature extractors. For an example showing how to use a pretrained network for feature extraction, see Extract Image Features Using Pretrained Network.
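The following sketch shows the overall feature extraction pattern with ResNet-18. The folder path is a placeholder for your own image data (one subfolder per class), and fitcecoc, which fits multiclass SVMs, stands in for fitcsvm when there are more than two classes:

```matlab
% Feature extraction sketch. 'pathToData' is a placeholder folder of images
% organized into one subfolder per class.
imds = imageDatastore('pathToData', ...
    'IncludeSubfolders', true, 'LabelSource', 'foldernames');
[imdsTrain, imdsTest] = splitEachLabel(imds, 0.7, 'randomized');

net = resnet18;
inputSize = net.Layers(1).InputSize;
augTrain = augmentedImageDatastore(inputSize(1:2), imdsTrain);
augTest  = augmentedImageDatastore(inputSize(1:2), imdsTest);

% Use activations of the global pooling layer ('pool5') as feature vectors.
featuresTrain = activations(net, augTrain, 'pool5', 'OutputAs', 'rows');
featuresTest  = activations(net, augTest,  'pool5', 'OutputAs', 'rows');

% Train a multiclass SVM on the extracted features and measure accuracy.
classifier = fitcecoc(featuresTrain, imdsTrain.Labels);
accuracy = mean(predict(classifier, featuresTest) == imdsTest.Labels)
```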
You can fine-tune deeper layers in the network by training the network on your new data set with the pretrained network as a starting point. Fine-tuning a network with transfer learning is often faster and easier than constructing and training a new network. The network has already learned a rich set of image features, but when you fine-tune the network it can learn features specific to your new data set. If you have a very large data set, then transfer learning might not be faster than training from scratch.
Tip
Fine-tuning a network often gives the highest accuracy. For very small data sets (fewer than about 20 images per class), try feature extraction instead.
Fine-tuning a network is slower and requires more effort than simple feature extraction, but since the network can learn to extract a different set of features, the final network is often more accurate. Fine-tuning usually works better than feature extraction as long as the new data set is not very small, because then the network has data to learn new features from. For examples showing how to perform transfer learning, see Transfer Learning with Deep Network Designer and Train Deep Learning Network to Classify New Images.
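As a concrete sketch, the following fine-tunes GoogLeNet on a new image folder. The path, learning rate, and epoch count are illustrative choices; 'loss3-classifier' and 'output' are GoogLeNet's final learnable and classification layers:

```matlab
% Transfer learning sketch with GoogLeNet. 'pathToData' is a placeholder.
imds = imageDatastore('pathToData', ...
    'IncludeSubfolders', true, 'LabelSource', 'foldernames');
numClasses = numel(categories(imds.Labels));

net = googlenet;
lgraph = layerGraph(net);

% Replace the final layers so the network outputs the new classes. Higher
% learn rate factors make the new layers train faster than the rest.
lgraph = replaceLayer(lgraph, 'loss3-classifier', ...
    fullyConnectedLayer(numClasses, 'Name', 'new_fc', ...
        'WeightLearnRateFactor', 10, 'BiasLearnRateFactor', 10));
lgraph = replaceLayer(lgraph, 'output', classificationLayer('Name', 'new_output'));

% Resize the images to the network input size and fine-tune with a low
% learning rate so the pretrained weights change slowly.
inputSize = net.Layers(1).InputSize;
augTrain = augmentedImageDatastore(inputSize(1:2), imds);
options = trainingOptions('sgdm', 'InitialLearnRate', 1e-4, ...
    'MaxEpochs', 6, 'MiniBatchSize', 32);
netTransfer = trainNetwork(augTrain, lgraph, options);
```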
You can import networks and network architectures from TensorFlow®-Keras, Caffe, and the ONNX™ (Open Neural Network Exchange) model format. You can also export trained networks to the ONNX model format.
Import pretrained networks from TensorFlow-Keras by using importKerasNetwork. You can import the network and weights either from the same HDF5 (.h5) file or from separate HDF5 and JSON (.json) files. For more information, see importKerasNetwork.
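For example (the file names are illustrative):

```matlab
% Import a Keras model from a single HDF5 file, or from separate
% architecture (.json) and weight (.h5) files.
net = importKerasNetwork('model.h5');
net = importKerasNetwork('model.json', 'WeightFile', 'weights.h5');
```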
Import network architectures from TensorFlow-Keras by using importKerasLayers. You can import the network architecture either with or without weights, from the same HDF5 (.h5) file or from separate HDF5 and JSON (.json) files. For more information, see importKerasLayers.
Import pretrained networks from Caffe by using the importCaffeNetwork function. There are many pretrained networks available in Caffe Model Zoo [5]. Download the desired .prototxt and .caffemodel files and use importCaffeNetwork to import the pretrained network into MATLAB®. For more information, see importCaffeNetwork.
You can also import the network architectures of Caffe networks. Download the desired .prototxt file and use importCaffeLayers to import the network layers into MATLAB. For more information, see importCaffeLayers.
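For example (the file names are illustrative):

```matlab
% Import a full Caffe network, or just its architecture.
net    = importCaffeNetwork('deploy.prototxt', 'weights.caffemodel');
layers = importCaffeLayers('deploy.prototxt');
```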
By using ONNX as an intermediate format, you can interoperate with other deep learning frameworks that support ONNX model export or import, such as TensorFlow, PyTorch, Caffe2, Microsoft® Cognitive Toolkit (CNTK), Core ML, and Apache MXNet™.
Export a trained Deep Learning Toolbox network to the ONNX model format by using the exportONNXNetwork function. You can then import the ONNX model into other deep learning frameworks that support ONNX model import.
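For example (the file name is illustrative, and netTransfer is any trained network, such as the fine-tuned network from the transfer learning sketch above):

```matlab
% Export a trained network to the ONNX model format.
exportONNXNetwork(netTransfer, 'myNetwork.onnx');
```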
Import pretrained networks from ONNX by using importONNXNetwork, and import network architectures with or without weights by using importONNXLayers.
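For example (the file name is illustrative; specify the output layer type when the ONNX model does not define one):

```matlab
% Import an ONNX model as a network, or as a layer graph with weights.
net = importONNXNetwork('model.onnx', 'OutputLayerType', 'classification');
lgraph = importONNXLayers('model.onnx', 'ImportWeights', true);
```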
Use pretrained networks for audio and speech processing applications by using Deep Learning Toolbox together with Audio Toolbox™.
Audio Toolbox provides the pretrained VGGish and YAMNet networks. Use the vggish (Audio Toolbox) and yamnet (Audio Toolbox) functions to interact directly with the pretrained networks. The classifySound (Audio Toolbox) function performs the required preprocessing and postprocessing for YAMNet so that you can locate and classify sounds into one of 521 categories. You can explore the YAMNet ontology using the yamnetGraph (Audio Toolbox) function. The vggishFeatures (Audio Toolbox) function performs the necessary preprocessing and postprocessing for VGGish so that you can extract feature embeddings to input to machine learning and deep learning systems. For more information on using deep learning for audio applications, see Introduction to Deep Learning for Audio Applications (Audio Toolbox).
Use VGGish and YAMNet to perform transfer learning and feature extraction. For example, see Transfer Learning with Pretrained Audio Networks (Audio Toolbox).
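A minimal sketch using both networks (requires Audio Toolbox and the VGGish and YAMNet support packages; the audio file name is illustrative):

```matlab
% Locate and classify sounds with YAMNet, and extract VGGish embeddings.
[audioIn, fs] = audioread('mySound.wav');
sounds = classifySound(audioIn, fs)       % detected sound classes
embeddings = vggishFeatures(audioIn, fs); % features for downstream models
```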
[1] ImageNet. http://www.image-net.org
[2] Russakovsky, O., Deng, J., Su, H., et al. "ImageNet Large Scale Visual Recognition Challenge." International Journal of Computer Vision (IJCV). Vol. 115, Issue 3, 2015, pp. 211–252.
[3] Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Antonio Torralba, and Aude Oliva. "Places: An image database for deep scene understanding." arXiv preprint arXiv:1610.02055 (2016).
[4] Places. http://places2.csail.mit.edu/
[5] Caffe Model Zoo. http://caffe.berkeleyvision.org/model_zoo.html
alexnet | darknet19 | darknet53 | Deep Network Designer | densenet201 | exportONNXNetwork | googlenet | importCaffeLayers | importCaffeNetwork | importKerasLayers | importKerasNetwork | importONNXLayers | importONNXNetwork | inceptionresnetv2 | inceptionv3 | mobilenetv2 | nasnetlarge | nasnetmobile | resnet101 | resnet18 | resnet50 | shufflenet | squeezenet | vgg16 | vgg19 | xception