Acquiring Image and Skeletal Data Using Kinect

In Detecting the Kinect Devices, you saw that the two sensors on the Kinect® for Windows® are represented by two device IDs, one for the color sensor and one for the depth sensor. In that example, Device 1 is the color sensor and Device 2 is the depth sensor. This example shows how to create a videoinput object for the color sensor to acquire RGB images, and then a second videoinput object for the depth sensor to acquire skeletal data.

Create the videoinput object for the color sensor. DeviceID 1 is used for the color sensor.
```
vid = videoinput('kinect',1,'RGB_640x480');
```

Look at the device-specific properties on the source device, which is the color sensor on the Kinect camera.

```
src = getselectedsource(vid);

src

Display Summary for Video Source Object:
 
      General Settings:
        Parent = [1x1 videoinput]
        Selected = on
        SourceName = ColorSource
        Tag = 
        Type = videosource
 
      Device Specific Properties:
        Accelerometer = [0.0 -1.0 0.0]
        AutoExposure = on
        AutoWhiteBalance = on
        BacklightCompensation = AverageBrightness
        Brightness = 0.2156
        CameraElevationAngle = 3
        Contrast = 1
        ExposureTime = 1.0
        FrameInterval = 0
        FrameRate = 30
        Gain = 0
        Gamma = 2.2
        Hue = 0
        PowerLineFrequency = Disabled
        Saturation = 1
        Sharpness = 0.5
        WhiteBalance = 2700
```

As you can see in the output, the color sensor has a set of device-specific properties.

Device-Specific Property – Color Sensor	Description
`Accelerometer`	Returns a 3D vector of acceleration data for both the color and depth sensors. The data is updated while the device is running or previewing. This 1 x 3 double represents the `x`, `y`, and `z` values of acceleration in gravity units `g` (`9.81 m/s^2`). For example, `[0.06 -1.00 -0.09]` represents values of `x` as `0.06` g, `y` as `-1.00` g, and `z` as `-0.09` g.
`AutoExposure`	Sets the exposure automatically and controls which related properties can be set. Values are `on` (default) and `off`. When `on`, exposure is set automatically, and attempting to set `FrameInterval`, `ExposureTime`, or `Gain` throws a warning. When `off`, attempting to set `PowerLineFrequency`, `BacklightCompensation`, or `Brightness` throws a warning.
`AutoWhiteBalance`	Enables or disables automatic white balance. `on` (default) configures white balance automatically, and the `WhiteBalance` property cannot be set. `off` makes the `WhiteBalance` property settable.
`BacklightCompensation`	Configures the backlight compensation mode, which adjusts image capture to the lighting conditions of the environment. This property is valid only if `AutoExposure` is `on`. Values are: `AverageBrightness` (default) favors an average brightness level; `CenterPriority` favors the center of the scene; `LowLightsPriority` favors a low light level; `CenterOnly` favors the center only.
`Brightness`	Indicates the brightness level. The value range is `0.0` to `1.0`, and the default value is `0.2156`. This property is valid only if `AutoExposure` is `on`.
`CameraElevationAngle`	Controls the angle of the sensor lens, which is the camera angle relative to the ground. The value must be an integer in the range -27 to 27 degrees. This is a sticky setting, so the default value is the last value set. Set it only if you want to change the angle of the camera. This property is shared with the depth sensor.
`Contrast`	Indicates contrast level. Values must be in the range `0.5` to `2`, with a default value of `1`.
`ExposureTime`	Indicates the exposure time in increments of 1/10,000 of a second. The value range is `0` to `4000`, and the default is `0`. This property is valid only if `AutoExposure` is `off`.
`FrameInterval`	Indicates the frame interval in units of 1/10,000 of a second. The value range is `0` to `4000`, and the default is `0`. This property is valid only if `AutoExposure` is `off`.
`FrameRate`	Frames per second for the acquisition. This property is read only and the possible values for the color sensor are `12`, `15`, and `30` (default). It reflects the actual frame rate when running.
`Gain`	Indicates a multiplier for the RGB color values. The value range is `1.0` to `16.0`, and the default is `1.0`. This property is valid only if `AutoExposure` is `off`.
`Gamma`	Indicates gamma measurement. Values must be in the range `1` to `2.8`, with a default value of `2.2`.
`Hue`	Indicates hue setting. Values must be in the range `-22` to `22`, with a default value of `0`.
`PowerLineFrequency`	Reduces flicker caused by the frequency of a power line. Values are `Disabled`, `FiftyHertz`, and `SixtyHertz`. The default is `Disabled`. This property is valid only if `AutoExposure` is `on`.
`Saturation`	Indicates saturation level. Values must be in the range `0` to `2`, with a default value of `1`.
`Sharpness`	Indicates sharpness level. Values must be in the range `0` to `1`, with a default value of `0.5`.
`WhiteBalance`	Indicates color temperature in kelvin. The value range is `2700` to `6500` and the default is `2700`. This property is valid only if `AutoWhiteBalance` is `off`.
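
The interaction between AutoExposure and the exposure-related properties can be written down as a small lookup, which is handy for checking a setting before assigning it. This is only a sketch of the rule described in the table. The names exposureRules, settable, requestedAngle, and elevationAngle are illustrative and not part of the toolbox, and the code works on plain values so it runs without a Kinect attached.

```matlab
% Properties that the table describes as settable for each AutoExposure
% value. The struct name exposureRules is illustrative.
exposureRules.on  = {'PowerLineFrequency', 'BacklightCompensation', 'Brightness'};
exposureRules.off = {'FrameInterval', 'ExposureTime', 'Gain'};

autoExposure = 'off';                       % value you intend to use
settable = exposureRules.(autoExposure);    % dynamic field lookup
fprintf('With AutoExposure = %s you can set: %s\n', ...
    autoExposure, strjoin(settable, ', '));

% CameraElevationAngle must be an integer from -27 to 27 degrees, so
% round and clamp a requested angle before assigning it.
requestedAngle = 31.4;
elevationAngle = max(-27, min(27, round(requestedAngle)));
```

On a live source object, you would then assign only the properties in settable, which avoids the warnings mentioned in the table.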

You can optionally set some of the properties described above. For example, if you are acquiring images in low light, you can adjust the acquisition by setting the BacklightCompensation property to LowLightsPriority, which favors a low light level.
```
src.BacklightCompensation = 'LowLightsPriority';
```
Preview the color stream by calling preview on the color sensor object (vid) that you created earlier.
```
preview(vid);
```
When you are done previewing, close the preview window.
```
closepreview(vid);
```
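
The steps above preview the color stream but do not store a frame. As a minimal sketch, assuming a Kinect for Windows sensor is attached and the color sensor is device 1 as in this example, the toolbox function getsnapshot returns a single frame from the videoinput object. The variable name colorFrame is illustrative.

```matlab
% Sketch only: requires a Kinect for Windows sensor and the Kinect adaptor.
% Reuse the color object from this example if it already exists.
if ~exist('vid', 'var')
    vid = videoinput('kinect', 1, 'RGB_640x480');
end

% Grab one frame immediately, without calling start.
colorFrame = getsnapshot(vid);

% Display the frame using base MATLAB graphics functions.
figure;
image(colorFrame);
axis image;
axis off;
```
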
Create the videoinput object for the depth sensor. Note that a second object is created (vid2), and DeviceID 2 is used for the depth sensor.
```
vid2 = videoinput('kinect',2,'Depth_640x480');
```

Look at the device-specific properties on the source device, which is the depth sensor on the Kinect.

```
src = getselectedsource(vid2);

src

Display Summary for Video Source Object:
 
      General Settings:
        Parent = [1x1 videoinput]
        Selected = on
        SourceName = DepthSource
        Tag = 
        Type = videosource
 
      Device Specific Properties:
        Accelerometer = [0.0 -1.0 0.0]
        BodyPosture = Standing
        CameraElevationAngle = 4
        DepthMode = Default
        FrameRate = 30
        IREmitter = on
        SkeletonsToTrack = [1x0 double]
        TrackingMode = off
```

As you can see in the output, the depth sensor has its own set of device-specific properties. Several of them, such as BodyPosture, SkeletonsToTrack, and TrackingMode, control skeletal tracking and exist only on the depth sensor.

Device-Specific Property – Depth Sensor	Description
`Accelerometer`	Returns a 3D vector of acceleration data for both the color and depth sensors. The data is updated while the device is running or previewing. This 1 x 3 double represents the `x`, `y`, and `z` values of acceleration in gravity units `g` (`9.81 m/s^2`). For example, `[0.06 -1.00 -0.09]` represents values of `x` as `0.06` g, `y` as `-1.00` g, and `z` as `-0.09` g.
`BodyPosture`	Indicates whether the tracked skeletons are standing or sitting. Values are `Standing` (gives 20 point skeleton data) and `Seated` (gives 10 point skeleton data, using joint indices 2 - 11). `Standing` is the default. Note that if `BodyPosture` is set to `Seated` mode, and `TrackingMode` is set to `Position`, no position is returned, since `Position` is the location of the hip joint and the hip joint is not tracked in `Seated` mode. See the subsection “BodyPosture Joint Indices” at the end of this example for the list of indices of the 20 skeletal joints.
`CameraElevationAngle`	Controls the angle of the sensor lens, which is the camera angle relative to the ground. The value must be an integer in the range -27 to 27 degrees. This is a sticky setting, so the default value is the last value set. Set it only if you want to change the angle of the camera. This property is shared with the color sensor.
`DepthMode`	Indicates the range of depth in the depth map. Values are `Default` (range of 50 to 400 cm) and `Near` (range of 40 to 300 cm).
`FrameRate`	Frames per second for the acquisition. This property is read only and is fixed at `30` for the depth sensor for all formats. It reflects the actual frame rate when running.
`IREmitter`	Controls whether the IR emitter is on or off. Values are `on` and `off`. The initial default is `on`, but this is a sticky property, so the default value is the last value set. If you set it to `off`, it remains off in future sessions until you change the setting. Turning the emitter off is useful when you use multiple Kinect devices, because it avoids interference between them.
`SkeletonsToTrack`	Specifies which skeletons to track, using the Skeleton Tracking IDs returned as part of the metadata. Values are: `[]` for default tracking; `[TrackingID1]` to track one skeleton with Tracking ID = TrackingID1; `[TrackingID1 TrackingID2]` to track two skeletons with Tracking IDs = TrackingID1 and TrackingID2.
`TrackingMode`	Indicates tracking state. Values are: `Skeleton` tracks the full skeleton with joints; `Position` tracks the hip joint position only; `Off` (default) disables skeleton position tracking. If `BodyPosture` is set to `Seated` and `TrackingMode` is set to `Position`, no position is returned, because `Position` is the location of the hip joint and the hip joint is not tracked in `Seated` mode.
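
Because TrackingMode defaults to off, the skeletal metadata is presumably not populated unless tracking is enabled before acquisition starts. The following is a minimal sketch of that configuration, assuming a Kinect for Windows sensor is attached and the depth sensor is device 2 as in this example. It uses only the properties listed in the table plus the standard videoinput property FramesPerTrigger. The variable name srcDepth is illustrative.

```matlab
% Sketch only: requires a Kinect for Windows sensor and the Kinect adaptor.
% Reuse the depth object from this example if it already exists.
if ~exist('vid2', 'var')
    vid2 = videoinput('kinect', 2, 'Depth_640x480');
end
srcDepth = getselectedsource(vid2);   % illustrative variable name

% Enable full skeleton tracking; the default TrackingMode is off.
srcDepth.TrackingMode = 'Skeleton';
srcDepth.BodyPosture = 'Standing';    % 20-joint skeletons (the default)

% Acquire a fixed number of frames per trigger so that getdata returns
% a metadata struct array of known length.
vid2.FramesPerTrigger = 10;
```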

Start the second videoinput object (the depth stream).
```
start(vid2);
```

Skeletal data is accessed as metadata on the depth stream. You can use getdata to access it.

```
% Get the data on the object.
[frame, ts, metaData] = getdata(vid2);

% Look at the metadata to see the parameters in the skeletal data.
metaData

metaData = 
 
10x1 struct array with fields:
    AbsTime: [1x1 double]
    FrameNumber: [1x1 double]
    IsPositionTracked: [1x6 logical]
    IsSkeletonTracked: [1x6 logical]
    JointDepthIndices: [20x2x6 double]
    JointImageIndices: [20x2x6 double]
    JointTrackingState: [20x6 double]
    JointWorldCoordinates: [20x3x6 double]
    PositionDepthIndices: [2x6 double]
    PositionImageIndices: [2x6 double]
    PositionWorldCoordinates: [3x6 double]
    RelativeFrame: [1x1 double]
    SegmentationData: [640x480 double]
    SkeletonTrackingID: [1x6 double]
    TriggerIndex: [1x1 double]
```

These metadata fields are related to tracking the skeletons.

MetaData	Description
`AbsTime`	This is a 1 x 1 double and represents the full timestamp, including date and time, in MATLAB clock format.
`FrameNumber`	This is a 1 x 1 double and represents the frame number.
`IsPositionTracked`	This is a 1 x 6 Boolean matrix of true/false values for the tracking of the position of each of the six skeletons. A `1` indicates the position is tracked and a `0` indicates it is not.
`IsSkeletonTracked`	This is a 1 x 6 Boolean matrix of true/false values for the tracked state of each of the six skeletons. A `1` indicates it is tracked and a `0` indicates it is not.
`JointDepthIndices`	If the `BodyPosture` property is set to `Standing`, this is a 20 x 2 x 6 double matrix of x- and y-coordinates for 20 joints, in pixels relative to the depth image, for the six possible skeletons. If `BodyPosture` is set to `Seated`, this is a 10 x 2 x 6 double for 10 joints.
`JointImageIndices`	If the `BodyPosture` property is set to `Standing`, this is a 20 x 2 x 6 double matrix of x- and y-coordinates for 20 joints, in pixels relative to the color image, for the six possible skeletons. If `BodyPosture` is set to `Seated`, this is a 10 x 2 x 6 double for 10 joints.
`JointTrackingState`	This 20 x 6 integer matrix contains enumerated values for the tracking accuracy of each joint for all six skeletons. Values are: `0` not tracked; `1` position inferred; `2` position tracked.
`JointWorldCoordinates`	If the `BodyPosture` property is set to `Standing`, this is a 20 x 3 x 6 double matrix of x-, y-, and z-coordinates for 20 joints, in meters from the sensor, for the six possible skeletons. If `BodyPosture` is set to `Seated`, this is a 10 x 3 x 6 double for 10 joints. See the JointWorldCoordinates step later in this example for the syntax to view this data.
`PositionDepthIndices`	A 2 x 6 double matrix of X and Y coordinates of each skeleton in pixels relative to the depth image.
`PositionImageIndices`	A 2 x 6 double matrix of X and Y coordinates of each skeleton in pixels relative to the color image.
`PositionWorldCoordinates`	A 3 x 6 double matrix of the X, Y and Z coordinates of each skeleton in meters relative to the sensor.
`RelativeFrame`	This 1 x 1 double represents the frame number relative to the execution of a trigger if triggering is used.
`SegmentationData`	A double array the size of the image, with each pixel mapped to a tracked or detected skeleton, represented by the numbers 1 to 6. This segmentation map is a bitmap whose pixel values correspond to the index of the person in the field of view who is closest to the camera at that pixel position. A value of 0 means there is no tracked skeleton at that pixel.
`SkeletonTrackingID`	This 1 x 6 integer matrix contains the tracking IDs of all six skeletons. Use these IDs with the `SkeletonsToTrack` property of the depth sensor to track specific skeletons. Tracking IDs are generated by the Kinect and change from acquisition to acquisition.
`TriggerIndex`	This is a 1 x 1 double and represents the trigger the event is associated with if triggering is used.
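
The JointTrackingState values can be used to filter joint data by reliability. The following is a minimal sketch that uses synthetic arrays shaped like the Standing-mode fields described above, not data from a sensor. All variable names are illustrative.

```matlab
% Synthetic data for one frame: 20 joints, 6 skeleton slots.
jointState = zeros(20, 6);       % 0 = not tracked
jointState(:, 1) = 2;            % skeleton 1: all joints tracked...
jointState([15 16], 1) = 1;      % ...except joints 15 and 16, inferred
jointWorld = zeros(20, 3, 6);    % placeholder world coordinates

skeletonIdx = 1;

% Logical mask of joints whose position is tracked (state 2).
isTracked = jointState(:, skeletonIdx) == 2;

% Keep only the coordinates of tracked joints for this skeleton.
trackedCoords = jointWorld(isTracked, :, skeletonIdx);

% Indices of joints whose position is only inferred (state 1).
inferredJoints = find(jointState(:, skeletonIdx) == 1);
```

With real data, jointState and jointWorld would come from the JointTrackingState and JointWorldCoordinates fields of a single metadata element.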

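You can look at any individual field by drilling into the metadata. Because metaData is a struct array with one element per acquired frame, index a frame first. For example, look at the IsSkeletonTracked field for the first frame.
```
metaData(1).IsSkeletonTracked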

ans = 
 
     1     0     0     0     0     0
```
In this case it means that of the six possible skeletons, there is one skeleton being tracked and it is in the first position. If you have multiple skeletons, this property is useful to confirm which ones are being tracked.
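
When several frames are acquired, the same check can be applied across the whole struct array. The following is a minimal sketch that builds a synthetic three-frame stand-in for the metadata, so it runs without a sensor. The variable names are illustrative.

```matlab
% Synthetic stand-in for the metadata returned by getdata (3 frames).
metaData = repmat(struct('IsSkeletonTracked', false(1, 6)), 3, 1);
metaData(2).IsSkeletonTracked(1) = true;
metaData(3).IsSkeletonTracked([1 4]) = true;

% Stack the per-frame 1x6 flags into a frames-by-6 logical matrix.
tracked = vertcat(metaData.IsSkeletonTracked);

% Frames in which at least one skeleton is tracked.
framesWithSkeleton = find(any(tracked, 2));

% Skeleton positions tracked in the last frame.
slotsInLastFrame = find(tracked(end, :));
```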

Get the joint locations for the first person in world coordinates using the JointWorldCoordinates field. Because metaData is a struct array with one element per frame, index a single frame first. The final index of 1 selects the person in position 1.

```
metaData(1).JointWorldCoordinates(:,:,1)

ans =
 
   -0.1408   -0.3257    2.1674
   -0.1408   -0.2257    2.1674
   -0.1368   -0.0098    2.2594
   -0.1324    0.1963    2.3447
   -0.3024   -0.0058    2.2574
   -0.3622   -0.3361    2.1641
   -0.3843   -0.6279    1.9877
   -0.4043   -0.6779    1.9877
    0.0301   -0.0125    2.2603
    0.2364    0.2775    2.2117
    0.3775    0.5872    2.2022
    0.4075    0.6372    2.2022
   -0.2532   -0.4392    2.0742
   -0.1869   -0.8425    1.8432
   -0.1869   -1.2941    1.8432
   -0.1969   -1.3541    1.8432
   -0.0360   -0.4436    2.0771
    0.0382   -0.8350    1.8286
    0.1096   -1.2114    1.5896
    0.1196   -1.2514    1.5896
```

The columns represent the X, Y, and Z coordinates in meters of the 20 points on skeleton 1.
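
Because each row is a point in meters, ordinary vector arithmetic applies. The following is a minimal sketch that copies rows 1 and 4 of the matrix above (Hip_Center and Head, according to the joint order listed under BodyPosture Joint Indices) and computes two distances. The variable names are illustrative.

```matlab
% Rows 1 and 4 of the joint matrix shown above.
hipCenter = [-0.1408 -0.3257 2.1674];
head      = [-0.1324  0.1963 2.3447];

% Straight-line distance between the two joints, in meters.
hipToHead = norm(head - hipCenter);

% Distance of the hip center from the sensor, in meters.
rangeFromSensor = norm(hipCenter);

fprintf('Hip to head: %.3f m, range from sensor: %.3f m\n', ...
    hipToHead, rangeFromSensor);
```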

You can optionally view the segmentation data as an image.

```
% View the segmentation data for the first frame as an image.
imagesc(metaData(1).SegmentationData);
% Set the colormap to jet to color-code the people detected.
colormap(jet);
```
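
The segmentation map can also be processed numerically. The following is a minimal sketch that counts the pixels assigned to each person and builds a mask for one of them. It uses a small synthetic map in place of real SegmentationData, and the variable names are illustrative.

```matlab
% Small synthetic segmentation map: 0 = no skeleton, 1..6 = person index.
seg = zeros(8, 10);
seg(2:5, 2:4) = 1;    % 4 x 3 = 12 pixels for person 1
seg(3:4, 7:9) = 3;    % 2 x 3 = 6 pixels for person 3

% Count pixels per person index (1..6), ignoring zero-valued pixels.
labels = seg(seg > 0);
pixelCounts = accumarray(labels(:), 1, [6 1]);

% Logical mask that isolates a single person.
maskPerson1 = (seg == 1);
```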

BodyPosture Joint Indices

The BodyPosture property of the depth sensor indicates whether the tracked skeletons are standing or sitting. Values are Standing (gives 20 point skeleton data) and Seated (gives 10 point skeleton data, using joint indices 2 - 11).

This is the order of the joints returned by the Kinect adaptor:

```
Hip_Center = 1;
Spine = 2;
Shoulder_Center = 3;
Head = 4;
Shoulder_Left = 5;
Elbow_Left = 6;
Wrist_Left = 7;
Hand_Left = 8;
Shoulder_Right = 9;
Elbow_Right = 10;
Wrist_Right = 11;
Hand_Right = 12;
Hip_Left = 13;
Knee_Left = 14;
Ankle_Left = 15;
Foot_Left = 16;
Hip_Right = 17;
Knee_Right = 18;
Ankle_Right = 19;
Foot_Right = 20;
```

When BodyPosture is set to Standing, all 20 indices are returned, as shown above. When BodyPosture is set to Seated, numbers 2 through 11 are returned, since this represents the upper body of the skeleton.
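
For code that refers to joints by name, the order above can be kept in a cell array. The following is a minimal sketch. The variable names are illustrative, and the Seated subset simply applies the index range 2 through 11 stated above. How those ten joints map to rows of the 10-row Seated arrays is not specified here, so treat any row-to-joint mapping as an assumption to verify on your own data.

```matlab
% Joint names in the order returned by the Kinect adaptor.
jointNames = {'Hip_Center', 'Spine', 'Shoulder_Center', 'Head', ...
    'Shoulder_Left', 'Elbow_Left', 'Wrist_Left', 'Hand_Left', ...
    'Shoulder_Right', 'Elbow_Right', 'Wrist_Right', 'Hand_Right', ...
    'Hip_Left', 'Knee_Left', 'Ankle_Left', 'Foot_Left', ...
    'Hip_Right', 'Knee_Right', 'Ankle_Right', 'Foot_Right'};

% Look up a joint index by name.
headIdx = find(strcmp(jointNames, 'Head'));

% Joints reported in Seated mode, using the index range given above.
seatedIdx = 2:11;
seatedNames = jointNames(seatedIdx);
```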

Note

To understand the differences in using the Kinect adaptor compared to previous toolbox adaptors, see Important Information About the Kinect Adaptor. For information about Kinect device discovery and the use of two device IDs, see Detecting the Kinect Devices. For an example of simultaneous acquisition, see Acquiring from Color and Depth Devices Simultaneously.
