The software provides a generic deep learning processor IP core that is target-independent and can be deployed to any custom platform that you specify. The processor can be reused and shared to accommodate deep neural networks that have various layer sizes and parameters. Use this processor to rapidly prototype deep neural networks from MATLAB®, and then deploy the network to FPGAs.
This figure shows the deep learning processor architecture.
To illustrate how the architecture operates, consider an image classification example.
You can store the input images, the weights, and the output images in the external DDR memory. The processor uses four AXI4 Master interfaces to communicate with the external memory. Using one of the AXI4 Master interfaces, you can load the input images into the Block RAM (BRAM). The Block RAM then provides the activations to the Generic Convolution Processor.
The Generic Convolution Processor performs the equivalent operation of one convolution layer. Using another AXI4 Master interface, the weights for the convolution operation are provided to the Generic Convolution Processor, which then performs the convolution operation on the input image and provides the activations to the Activation Normalization module. The processor is generic because it supports tensors and shapes of various sizes.
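As a reference point, this is a minimal MATLAB sketch of the computation that one convolution layer performs. The variable names and sizes are illustrative only and are not part of the IP core.

    X  = rand(8,8,3);      % input activations, height-by-width-by-channels
    Wk = rand(3,3,3,4);    % filter weights, Kh-by-Kw-by-C-by-numFilters
    b  = rand(1,4);        % one bias per output channel
    Y  = zeros(6,6,4);     % 'valid' output: (8-3+1)-by-(8-3+1)-by-numFilters
    for n = 1:4
        for c = 1:3
            % conv2 with a 180-degree rotated kernel computes the
            % cross-correlation that deep learning frameworks call convolution
            Y(:,:,n) = Y(:,:,n) + conv2(X(:,:,c), rot90(Wk(:,:,c,n),2), 'valid');
        end
        Y(:,:,n) = Y(:,:,n) + b(n);
    end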
Based on the neural network that you provide, the Activation Normalization module applies the ReLU nonlinearity, performs max pooling (maxpool), or performs Local Response Normalization (LRN). The processor has two Activation Normalization units: one follows the Generic Convolution Processor, and the other follows the Generic FC Processor.
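These MATLAB snippets sketch the three operations that the Activation Normalization module can perform. The constants k, alpha, beta, and n are assumed AlexNet-style LRN values, not parameters of the IP core.

    Y = randn(6,6,4);                  % activations from the convolution stage
    Yrelu = max(Y, 0);                 % ReLU: clamp negative values to zero
    % 2-by-2 max pooling with stride 2:
    Ypool = zeros(3,3,4);
    for i = 1:3
        for j = 1:3
            Ypool(i,j,:) = max(Y(2*i-1:2*i, 2*j-1:2*j, :), [], [1 2]);
        end
    end
    % Local Response Normalization across channels:
    k = 2; alpha = 1e-4; beta = 0.75; n = 5;   % assumed AlexNet-style constants
    Ylrn = zeros(size(Y));
    for c = 1:4
        lo = max(1, c - floor(n/2));
        hi = min(4, c + floor(n/2));
        Ylrn(:,:,c) = Y(:,:,c) ./ (k + alpha*sum(Y(:,:,lo:hi).^2, 3)).^beta;
    end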
The Generic Convolution Processor and Activation Normalization modules can process only one layer at a time. Depending on the number of convolution layers in your pretrained network, the Conv Controller (Scheduling) operates the BRAM as ping-pong buffers. To process the next layer, the Conv Controller (Scheduling) moves the results back to the BRAM and then repeats the convolution and activation normalization operations for all convolution layers in the network.
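This toy MATLAB sketch shows the ping-pong pattern: each layer reads activations from one buffer and writes results to the other, then the buffers swap roles. The function handles stand in for the convolution and activation normalization hardware; they are illustrative, not the IP core's API.

    layers = {@(a) max(conv2(a, rand(3), 'same'), 0), ...
              @(a) max(conv2(a, rand(3), 'same'), 0)};
    buf = {rand(8,8), zeros(8,8)};     % ping and pong buffers (BRAM stand-ins)
    p = 1;                             % buffer holding the current activations
    for k = 1:numel(layers)
        buf{3-p} = layers{k}(buf{p});  % read one buffer, write the other
        p = 3 - p;                     % swap roles for the next layer
    end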
The Generic FC Processor performs the equivalent operation of one fully-connected (FC) layer. Using another AXI4 Master interface, the weights for the fully-connected layer are provided to the Generic FC Processor, which then performs the fully-connected layer operation on the input image and provides the activations to the Activation Normalization module. This processor is also generic because it supports tensors and shapes of various sizes.
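A fully-connected layer reduces to a matrix-vector product plus a bias, as this MATLAB sketch with illustrative sizes shows:

    x = rand(120,1);     % flattened input activations
    W = rand(84,120);    % layer weights, outputSize-by-inputSize
    b = rand(84,1);      % biases
    y = W*x + b;         % equivalent operation of one fully-connected layer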
The FC Controller (Scheduling) works similarly to the Conv Controller (Scheduling). The FC Controller (Scheduling) coordinates with the FIFOs, which act as ping-pong buffers, to perform the fully-connected layer operation and activation normalization for the number of FC layers and the ReLU, maxpool, or LRN features in your neural network. After the Generic FC Processor and Activation Normalization modules process all the frames in the image, the predictions (scores) are transmitted through the AXI4 Master interface and stored in the external DDR memory.
One application of the custom deep learning processor IP core is the MATLAB controlled deep learning processor. To create this processor, integrate the deep learning processor IP core with the HDL Verifier™ MATLAB as AXI Master IP by using the AXI4 slave interface. Through a JTAG or PCI Express interface, you can import various pretrained neural networks from MATLAB, execute the operations specified by the network on the deep learning processor IP, and return the classification results to MATLAB.
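For example, a JTAG-based workflow driven from MATLAB might look like this sketch, assuming the dlhdl interface from Deep Learning HDL Toolbox™; the choice of network, board, and bitstream here is an assumption, not a requirement.

    net = resnet18;                                  % any supported pretrained network
    hTarget = dlhdl.Target('Xilinx', 'Interface', 'JTAG');
    hW = dlhdl.Workflow('Network', net, ...
        'Bitstream', 'zcu102_single', ...            % assumed board/bitstream pairing
        'Target', hTarget);
    hW.compile;                                      % map the network onto the processor IP
    hW.deploy;                                       % program the FPGA and load the weights
    img = single(imresize(imread('peppers.png'), [224 224]));
    [prediction, speed] = hW.predict(img, 'Profile', 'on');  % run inference, read back scores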
For more information, see MATLAB Controlled Deep Learning Processor.