rlACAgentOptions

Options for AC agent

Description

Use an rlACAgentOptions object to specify options for creating actor-critic (AC) agents. To create an actor-critic agent, use rlACAgent.

For more information, see Actor-Critic Agents.

For more information on the different types of reinforcement learning agents, see Reinforcement Learning Agents.

Creation

Description

opt = rlACAgentOptions creates a default option set for an AC agent. You can modify the object properties using dot notation.


opt = rlACAgentOptions(Name,Value) sets option properties using name-value pairs. For example, rlACAgentOptions('DiscountFactor',0.95) creates an option set with a discount factor of 0.95. You can specify multiple name-value pairs. Enclose each property name in quotes.
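
For example, the following call sets both the discount factor and the entropy loss weight in one step (the values here are illustrative, not recommendations):

opt = rlACAgentOptions('DiscountFactor',0.95,'EntropyLossWeight',0.01);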

Properties


NumStepsToLookAhead - Number of steps to look ahead
32 (default) | positive integer

Number of steps to look ahead in model training, specified as a positive integer. For AC agents, the number of steps to look ahead corresponds to the training episode length.
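
For example, to lengthen the lookahead horizon using dot notation (the value 64 is an arbitrary illustration; tune it for your task):

opt = rlACAgentOptions;
opt.NumStepsToLookAhead = 64;  % illustrative value, not a recommendation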

EntropyLossWeight - Entropy loss weight
0 (default) | scalar between 0 and 1

Entropy loss weight, specified as a scalar value between 0 and 1, inclusive. A higher value promotes agent exploration by applying a penalty for being too certain about which action to take, which can help the agent move out of local optima.

The entropy loss function for episode step t is:

H_t = E \sum_{k=1}^{M} \mu_k(S_t \mid \theta_\mu) \ln \mu_k(S_t \mid \theta_\mu)

Here:

  • E is the entropy loss weight.

  • M is the number of possible actions.

  • \mu_k(S_t \mid \theta_\mu) is the probability of taking action A_k when in state S_t, following the current policy.

When gradients are computed during training, an additional gradient component is computed for minimizing this loss function.
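
As a worked illustration of this loss (plain MATLAB, not part of the agent API; the probability vector is a made-up example):

% Hypothetical action probabilities mu_k(S_t) for M = 4 actions
mu = [0.7 0.1 0.1 0.1];
E = 0.01;                     % entropy loss weight (EntropyLossWeight)

% Entropy loss H_t = E * sum_k mu_k * ln(mu_k)
Ht = E * sum(mu .* log(mu))

A near-deterministic distribution yields a loss near zero, while a uniform distribution yields the most negative value, so minimizing this term drives the policy toward higher entropy and therefore more exploration.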

SampleTime - Sample time of agent
1 (default) | positive scalar

Sample time of the agent, specified as a positive scalar.

Within a Simulink environment, the agent executes every SampleTime seconds of simulation time.

Within a MATLAB environment, the agent executes every time the environment advances. In this case, SampleTime is the time interval between consecutive elements in the output experience returned by sim or train.
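
For example, to run the agent every 0.1 seconds of simulation time (0.1 is an assumed value for a hypothetical Simulink model):

opt = rlACAgentOptions('SampleTime',0.1);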

DiscountFactor - Discount factor
positive scalar less than or equal to 1

Discount factor applied to future rewards during training, specified as a positive scalar less than or equal to 1.
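
As a quick illustration of how the discount factor weights future rewards (plain MATLAB with a made-up reward sequence):

gamma = 0.95;                        % DiscountFactor
r = [1 1 1 1 1];                     % hypothetical rewards over five steps
G = sum(gamma.^(0:numel(r)-1) .* r)  % discounted return, approximately 4.52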

Object Functions

rlACAgent - Actor-critic reinforcement learning agent

Examples


Create an AC agent options object, specifying the discount factor.

opt = rlACAgentOptions('DiscountFactor',0.95)
opt = 
  rlACAgentOptions with properties:

    NumStepsToLookAhead: 32
      EntropyLossWeight: 0
             SampleTime: 1
         DiscountFactor: 0.9500

You can modify options using dot notation. For example, set the agent sample time to 0.5.

opt.SampleTime = 0.5;

Compatibility Considerations


Behavior change in future release

See Also

rlACAgent

Introduced in R2019a