<?xml version="1.0"?>
<doc>
    <assembly>
        <name>AForge.MachineLearning</name>
    </assembly>
    <members>
<member name="T:AForge.MachineLearning.BoltzmannExploration">
|
|
<summary>
|
|
Boltzmann distribution exploration policy.
|
|
</summary>
|
|
|
|
<remarks><para>The class implements exploration policy base on Boltzmann distribution.
|
|
Acording to the policy, action <b>a</b> at state <b>s</b> is selected with the next probability:</para>
|
|
<code lang="none">
|
|
exp( Q( s, a ) / t )
|
|
p( s, a ) = -----------------------------
|
|
SUM( exp( Q( s, b ) / t ) )
|
|
b
|
|
</code>
|
|
<para>where <b>Q(s, a)</b> is action's <b>a</b> estimation (usefulness) at state <b>s</b> and
|
|
<b>t</b> is <see cref="P:AForge.MachineLearning.BoltzmannExploration.Temperature"/>.</para>
|
|
</remarks>
|
|
|
|
<seealso cref="T:AForge.MachineLearning.RouletteWheelExploration"/>
|
|
<seealso cref="T:AForge.MachineLearning.EpsilonGreedyExploration"/>
|
|
<seealso cref="T:AForge.MachineLearning.TabuSearchExploration"/>
        </member>
        <member name="P:AForge.MachineLearning.BoltzmannExploration.Temperature">
            <summary>
            Temperature parameter of Boltzmann distribution, >0.
            </summary>

            <remarks><para>The property sets the balance between exploration and greedy actions.
            If temperature is low, then the policy tends to be more greedy.</para></remarks>
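            <example><para>A common usage pattern (a sketch only; <b>iteration</b> and <b>iterations</b> are hypothetical
            loop variables) is to start with a higher temperature and decrease it as learning progresses, so the policy
            gradually becomes greedier:</para>
            <code lang="C#">
            // decrease temperature linearly over learning iterations, keeping it positive
            policy.Temperature = Math.Max( 0.05, 1.0 - (double) iteration / iterations );
            </code></example>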
        </member>
        <member name="M:AForge.MachineLearning.BoltzmannExploration.#ctor(System.Double)">
            <summary>
            Initializes a new instance of the <see cref="T:AForge.MachineLearning.BoltzmannExploration"/> class.
            </summary>

            <param name="temperature">Temperature parameter of Boltzmann distribution.</param>

        </member>
        <member name="M:AForge.MachineLearning.BoltzmannExploration.ChooseAction(System.Double[])">
            <summary>
            Choose an action.
            </summary>

            <param name="actionEstimates">Action estimates.</param>

            <returns>Returns selected action.</returns>

            <remarks>The method chooses an action depending on the provided estimates. The
            estimates can be any measure of the action's usefulness
            (expected summary reward, discounted reward, etc.).</remarks>

        </member>
        <member name="T:AForge.MachineLearning.EpsilonGreedyExploration">
            <summary>
            Epsilon greedy exploration policy.
            </summary>

            <remarks><para>The class implements the epsilon greedy exploration policy. According to the policy,
            the best action is chosen with probability <b>1-epsilon</b>. Otherwise,
            with probability <b>epsilon</b>, one of the other actions (excluding the best one) is
            chosen randomly.</para>

            <para>The epsilon value is also known as the exploration rate.</para>
            </remarks>

            <seealso cref="T:AForge.MachineLearning.RouletteWheelExploration"/>
            <seealso cref="T:AForge.MachineLearning.BoltzmannExploration"/>
            <seealso cref="T:AForge.MachineLearning.TabuSearchExploration"/>
        </member>
        <member name="P:AForge.MachineLearning.EpsilonGreedyExploration.Epsilon">
            <summary>
            Epsilon value (exploration rate), [0, 1].
            </summary>

            <remarks><para>The value determines the amount of exploration driven by the policy.
            If the value is high, then the policy leans more towards exploration - choosing a random
            action other than the best one. If the value is low, then the policy is more
            greedy - choosing the best action found so far.
            </para></remarks>

        </member>
        <member name="M:AForge.MachineLearning.EpsilonGreedyExploration.#ctor(System.Double)">
            <summary>
            Initializes a new instance of the <see cref="T:AForge.MachineLearning.EpsilonGreedyExploration"/> class.
            </summary>

            <param name="epsilon">Epsilon value (exploration rate).</param>

        </member>
        <member name="M:AForge.MachineLearning.EpsilonGreedyExploration.ChooseAction(System.Double[])">
            <summary>
            Choose an action.
            </summary>

            <param name="actionEstimates">Action estimates.</param>

            <returns>Returns selected action.</returns>

            <remarks>The method chooses an action depending on the provided estimates. The
            estimates can be any measure of the action's usefulness
            (expected summary reward, discounted reward, etc.).</remarks>

        </member>
        <member name="T:AForge.MachineLearning.IExplorationPolicy">
            <summary>
            Exploration policy interface.
            </summary>

            <remarks>The interface describes exploration policies, which are used in Reinforcement
            Learning to explore the state space.</remarks>
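            <example><para>A sketch of a custom policy implementing the interface (a plain greedy policy,
            not part of the library):</para>
            <code lang="C#">
            // example policy: always selects the action with the highest estimate
            public class GreedyExploration : IExplorationPolicy
            {
                public int ChooseAction( double[] actionEstimates )
                {
                    int best = 0;

                    for ( int i = 1; i &lt; actionEstimates.Length; i++ )
                    {
                        if ( actionEstimates[i] &gt; actionEstimates[best] )
                            best = i;
                    }
                    return best;
                }
            }
            </code></example>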
        </member>
        <member name="M:AForge.MachineLearning.IExplorationPolicy.ChooseAction(System.Double[])">
            <summary>
            Choose an action.
            </summary>

            <param name="actionEstimates">Action estimates.</param>

            <returns>Returns selected action.</returns>

            <remarks>The method chooses an action depending on the provided estimates. The
            estimates can be any measure of the action's usefulness
            (expected summary reward, discounted reward, etc.).</remarks>

        </member>
        <member name="T:AForge.MachineLearning.RouletteWheelExploration">
            <summary>
            Roulette wheel exploration policy.
            </summary>

            <remarks><para>The class implements the roulette wheel exploration policy. According to the policy,
            action <b>a</b> at state <b>s</b> is selected with the following probability:</para>
            <code lang="none">
                             Q( s, a )
            p( s, a ) = ------------------
                         SUM( Q( s, b ) )
                          b
            </code>
            <para>where <b>Q(s, a)</b> is the estimate (usefulness) of action <b>a</b> at state <b>s</b>.</para>

            <para><note>The exploration policy may be applied only in cases when action estimates (usefulness)
            are represented by positive values greater than 0.</note></para>
            </remarks>

            <seealso cref="T:AForge.MachineLearning.BoltzmannExploration"/>
            <seealso cref="T:AForge.MachineLearning.EpsilonGreedyExploration"/>
            <seealso cref="T:AForge.MachineLearning.TabuSearchExploration"/>
        </member>
        <member name="M:AForge.MachineLearning.RouletteWheelExploration.#ctor">
            <summary>
            Initializes a new instance of the <see cref="T:AForge.MachineLearning.RouletteWheelExploration"/> class.
            </summary>

        </member>
        <member name="M:AForge.MachineLearning.RouletteWheelExploration.ChooseAction(System.Double[])">
            <summary>
            Choose an action.
            </summary>

            <param name="actionEstimates">Action estimates.</param>

            <returns>Returns selected action.</returns>

            <remarks>The method chooses an action depending on the provided estimates. The
            estimates can be any measure of the action's usefulness
            (expected summary reward, discounted reward, etc.).</remarks>

        </member>
        <member name="T:AForge.MachineLearning.TabuSearchExploration">
            <summary>
            Tabu search exploration policy.
            </summary>

            <remarks>The class implements a simple tabu search exploration policy,
            which allows marking certain actions as tabu for a specified number of
            iterations. The actual exploration and choosing among non-tabu actions
            is done by the <see cref="P:AForge.MachineLearning.TabuSearchExploration.BasePolicy">base exploration policy</see>.</remarks>

            <seealso cref="T:AForge.MachineLearning.BoltzmannExploration"/>
            <seealso cref="T:AForge.MachineLearning.EpsilonGreedyExploration"/>
            <seealso cref="T:AForge.MachineLearning.RouletteWheelExploration"/>
        </member>
        <member name="P:AForge.MachineLearning.TabuSearchExploration.BasePolicy">
            <summary>
            Base exploration policy.
            </summary>

            <remarks>The base exploration policy is the policy which is used
            to choose among non-tabu actions.</remarks>

        </member>
        <member name="M:AForge.MachineLearning.TabuSearchExploration.#ctor(System.Int32,AForge.MachineLearning.IExplorationPolicy)">
            <summary>
            Initializes a new instance of the <see cref="T:AForge.MachineLearning.TabuSearchExploration"/> class.
            </summary>

            <param name="actions">Total actions count.</param>
            <param name="basePolicy">Base exploration policy.</param>

        </member>
        <member name="M:AForge.MachineLearning.TabuSearchExploration.ChooseAction(System.Double[])">
            <summary>
            Choose an action.
            </summary>

            <param name="actionEstimates">Action estimates.</param>

            <returns>Returns selected action.</returns>

            <remarks>The method chooses an action depending on the provided estimates. The
            estimates can be any measure of the action's usefulness
            (expected summary reward, discounted reward, etc.). The action is chosen from
            non-tabu actions only.</remarks>

        </member>
        <member name="M:AForge.MachineLearning.TabuSearchExploration.ResetTabuList">
            <summary>
            Reset tabu list.
            </summary>

            <remarks>Clears the tabu list, making all actions allowed.</remarks>

        </member>
        <member name="M:AForge.MachineLearning.TabuSearchExploration.SetTabuAction(System.Int32,System.Int32)">
            <summary>
            Set tabu action.
            </summary>

            <param name="action">Action to set tabu for.</param>
            <param name="tabuTime">Tabu time in iterations.</param>

        </member>
        <member name="T:AForge.MachineLearning.QLearning">
            <summary>
            QLearning learning algorithm.
            </summary>

            <remarks>The class provides an implementation of the Q-Learning algorithm, known as
            off-policy Temporal Difference control.</remarks>

            <seealso cref="T:AForge.MachineLearning.Sarsa"/>
        </member>
        <member name="P:AForge.MachineLearning.QLearning.StatesCount">
            <summary>
            Amount of possible states.
            </summary>

        </member>
        <member name="P:AForge.MachineLearning.QLearning.ActionsCount">
            <summary>
            Amount of possible actions.
            </summary>

        </member>
        <member name="P:AForge.MachineLearning.QLearning.ExplorationPolicy">
            <summary>
            Exploration policy.
            </summary>

            <remarks>Policy, which is used to select actions.</remarks>

        </member>
        <member name="P:AForge.MachineLearning.QLearning.LearningRate">
            <summary>
            Learning rate, [0, 1].
            </summary>

            <remarks>The value determines how strongly the Q-function is updated
            during learning. The greater the value, the larger the updates the function receives;
            the lower the value, the smaller the updates.</remarks>

        </member>
        <member name="P:AForge.MachineLearning.QLearning.DiscountFactor">
            <summary>
            Discount factor, [0, 1].
            </summary>

            <remarks>Discount factor for the expected summary reward. The value serves as a
            multiplier for the expected reward, so if the value is set to 1,
            the expected summary reward is not discounted. As the value gets
            smaller, a smaller portion of the expected reward is used for updating actions'
            estimates.</remarks>

        </member>
        <member name="M:AForge.MachineLearning.QLearning.#ctor(System.Int32,System.Int32,AForge.MachineLearning.IExplorationPolicy)">
            <summary>
            Initializes a new instance of the <see cref="T:AForge.MachineLearning.QLearning"/> class.
            </summary>

            <param name="states">Amount of possible states.</param>
            <param name="actions">Amount of possible actions.</param>
            <param name="explorationPolicy">Exploration policy.</param>

            <remarks>Action estimates are randomized when this constructor
            is used.</remarks>

        </member>
        <member name="M:AForge.MachineLearning.QLearning.#ctor(System.Int32,System.Int32,AForge.MachineLearning.IExplorationPolicy,System.Boolean)">
            <summary>
            Initializes a new instance of the <see cref="T:AForge.MachineLearning.QLearning"/> class.
            </summary>

            <param name="states">Amount of possible states.</param>
            <param name="actions">Amount of possible actions.</param>
            <param name="explorationPolicy">Exploration policy.</param>
            <param name="randomize">Randomize action estimates or not.</param>

            <remarks>The <b>randomize</b> parameter specifies whether initial action estimates should be randomized
            with small values or not. Randomization of action values may be useful when greedy exploration
            policies are used. In this case randomization ensures that the same action is not always chosen.</remarks>

        </member>
        <member name="M:AForge.MachineLearning.QLearning.GetAction(System.Int32)">
            <summary>
            Get next action from the specified state.
            </summary>

            <param name="state">Current state to get an action for.</param>

            <returns>Returns the action for the state.</returns>

            <remarks>The method returns an action according to the current
            <see cref="P:AForge.MachineLearning.QLearning.ExplorationPolicy">exploration policy</see>.</remarks>

        </member>
        <member name="M:AForge.MachineLearning.QLearning.UpdateState(System.Int32,System.Int32,System.Double,System.Int32)">
            <summary>
            Update Q-function's value for the previous state-action pair.
            </summary>

            <param name="previousState">Previous state.</param>
            <param name="action">Action, which leads from previous to the next state.</param>
            <param name="reward">Reward value, received by taking specified action from previous state.</param>
            <param name="nextState">Next state.</param>
        </member>
        <member name="T:AForge.MachineLearning.Sarsa">
            <summary>
            Sarsa learning algorithm.
            </summary>

            <remarks>The class provides an implementation of the Sarsa algorithm, known as
            on-policy Temporal Difference control.</remarks>

            <seealso cref="T:AForge.MachineLearning.QLearning"/>
        </member>
        <member name="P:AForge.MachineLearning.Sarsa.StatesCount">
            <summary>
            Amount of possible states.
            </summary>

        </member>
        <member name="P:AForge.MachineLearning.Sarsa.ActionsCount">
            <summary>
            Amount of possible actions.
            </summary>

        </member>
        <member name="P:AForge.MachineLearning.Sarsa.ExplorationPolicy">
            <summary>
            Exploration policy.
            </summary>

            <remarks>Policy, which is used to select actions.</remarks>

        </member>
        <member name="P:AForge.MachineLearning.Sarsa.LearningRate">
            <summary>
            Learning rate, [0, 1].
            </summary>

            <remarks>The value determines how strongly the Q-function is updated
            during learning. The greater the value, the larger the updates the function receives;
            the lower the value, the smaller the updates.</remarks>

        </member>
        <member name="P:AForge.MachineLearning.Sarsa.DiscountFactor">
            <summary>
            Discount factor, [0, 1].
            </summary>

            <remarks>Discount factor for the expected summary reward. The value serves as a
            multiplier for the expected reward, so if the value is set to 1,
            the expected summary reward is not discounted. As the value gets
            smaller, a smaller portion of the expected reward is used for updating actions'
            estimates.</remarks>

        </member>
        <member name="M:AForge.MachineLearning.Sarsa.#ctor(System.Int32,System.Int32,AForge.MachineLearning.IExplorationPolicy)">
            <summary>
            Initializes a new instance of the <see cref="T:AForge.MachineLearning.Sarsa"/> class.
            </summary>

            <param name="states">Amount of possible states.</param>
            <param name="actions">Amount of possible actions.</param>
            <param name="explorationPolicy">Exploration policy.</param>

            <remarks>Action estimates are randomized when this constructor
            is used.</remarks>

        </member>
        <member name="M:AForge.MachineLearning.Sarsa.#ctor(System.Int32,System.Int32,AForge.MachineLearning.IExplorationPolicy,System.Boolean)">
            <summary>
            Initializes a new instance of the <see cref="T:AForge.MachineLearning.Sarsa"/> class.
            </summary>

            <param name="states">Amount of possible states.</param>
            <param name="actions">Amount of possible actions.</param>
            <param name="explorationPolicy">Exploration policy.</param>
            <param name="randomize">Randomize action estimates or not.</param>

            <remarks>The <b>randomize</b> parameter specifies whether initial action estimates should be randomized
            with small values or not. Randomization of action values may be useful when greedy exploration
            policies are used. In this case randomization ensures that the same action is not always chosen.</remarks>

        </member>
        <member name="M:AForge.MachineLearning.Sarsa.GetAction(System.Int32)">
            <summary>
            Get next action from the specified state.
            </summary>

            <param name="state">Current state to get an action for.</param>

            <returns>Returns the action for the state.</returns>

            <remarks>The method returns an action according to the current
            <see cref="P:AForge.MachineLearning.Sarsa.ExplorationPolicy">exploration policy</see>.</remarks>

        </member>
        <member name="M:AForge.MachineLearning.Sarsa.UpdateState(System.Int32,System.Int32,System.Double,System.Int32,System.Int32)">
            <summary>
            Update Q-function's value for the previous state-action pair.
            </summary>

            <param name="previousState">Previous state.</param>
            <param name="previousAction">Action, which led from the previous to the next state.</param>
            <param name="reward">Reward value, received by taking specified action from previous state.</param>
            <param name="nextState">Next state.</param>
            <param name="nextAction">Next action.</param>

            <remarks>Updates Q-function's value for the previous state-action pair in
            the case when the next state is non-terminal.</remarks>
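            <example><para>For reference, the standard on-policy (Sarsa) update for the non-terminal case is shown below,
            where <b>LR</b> is the <see cref="P:AForge.MachineLearning.Sarsa.LearningRate"/>,
            <b>DF</b> is the <see cref="P:AForge.MachineLearning.Sarsa.DiscountFactor"/>,
            <b>r</b> is the received reward and <b>s'</b>, <b>a'</b> are the next state and action:</para>
            <code lang="none">
            Q( s, a ) = ( 1 - LR ) * Q( s, a ) + LR * ( r + DF * Q( s', a' ) )
            </code></example>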
        </member>
        <member name="M:AForge.MachineLearning.Sarsa.UpdateState(System.Int32,System.Int32,System.Double)">
            <summary>
            Update Q-function's value for the previous state-action pair.
            </summary>

            <param name="previousState">Previous state.</param>
            <param name="previousAction">Action, which led from the previous to the next state.</param>
            <param name="reward">Reward value, received by taking specified action from previous state.</param>

            <remarks>Updates Q-function's value for the previous state-action pair in
            the case when the next state is terminal.</remarks>

        </member>
    </members>
</doc>