The Bellman Equations are fundamental to the study of Markov Decision Processes (MDPs) in Reinforcement Learning (RL). Named after Richard E. Bellman, who developed these equations in the context of dynamic programming, they provide a mathematical framework for solving problems of sequential decision making under uncertainty.
Richard Bellman introduced the concept of dynamic programming in the 1950s while working at the RAND Corporation. His work was motivated by the need to optimize complex decision-making processes in various fields like economics, engineering, and operations research. The Bellman Equations emerged from his principle of optimality, which states that an optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.
There are two primary forms of Bellman Equations:
The Bellman Expectation Equation for the state-value function V(s) under a policy π is:
V(s) = ∑ₐ π(a|s) [R(s,a) + γ ∑ₛ' P(s'|s,a) V(s')]
Where:
π(a|s) is the probability of choosing action a in state s under policy π,
R(s,a) is the expected immediate reward for taking action a in state s,
γ ∈ [0, 1] is the discount factor that weights future rewards,
P(s'|s,a) is the probability of transitioning to state s' after taking action a in state s, and
V(s') is the value of the successor state s'.
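To make the expectation equation concrete, here is a minimal sketch of iterative policy evaluation on a small tabular MDP. The data structures (the dictionaries P, R, and policy) and the tiny two-state example are illustrative assumptions, not part of any particular library.

```python
def policy_evaluation(states, actions, P, R, policy, gamma=0.9, tol=1e-8):
    """Repeatedly apply the Bellman Expectation Equation until V converges.

    Assumed (illustrative) data structures:
      P[s][a]      -> list of (next_state, probability) pairs
      R[s][a]      -> expected immediate reward for taking a in s
      policy[s][a] -> probability of taking action a in state s
    """
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # V(s) = sum_a pi(a|s) [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]
            v_new = sum(
                policy[s][a] * (R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a]))
                for a in actions
            )
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            return V

# Example: a two-state MDP with a single action that moves between the states
states = ["s0", "s1"]
actions = ["a"]
P = {"s0": {"a": [("s1", 1.0)]}, "s1": {"a": [("s0", 1.0)]}}
R = {"s0": {"a": 1.0}, "s1": {"a": 0.0}}
policy = {s: {"a": 1.0} for s in states}
print(policy_evaluation(states, actions, P, R, policy))
```

The update is applied in place (sweeping over states), which is one standard way to turn the expectation equation into a convergent fixed-point iteration.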
The Bellman Optimality Equation for the optimal state-value function V* is:
V*(s) = maxₐ [R(s,a) + γ ∑ₛ' P(s'|s,a) V*(s')]
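As an illustration, value iteration applies the optimality equation as an update rule until the values stop changing, then reads off a greedy policy. This sketch reuses the same assumed P and R structures from the example above.

```python
def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-8):
    """Apply the Bellman Optimality Equation as a fixed-point update."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # V*(s) = max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V*(s') ]
            v_new = max(
                R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                for a in actions
            )
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            break
    # Extract a greedy policy with respect to the converged V*
    greedy = {
        s: max(actions, key=lambda a: R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a]))
        for s in states
    }
    return V, greedy
```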
Bellman Equations are used in:
Dynamic programming methods such as value iteration and policy iteration, which require a full model of the environment,
Temporal-difference methods such as Q-learning and SARSA, which learn from sampled experience, and
Approximate methods such as deep Q-networks, where the Bellman error is used as a training objective.
In RL, the Bellman Equations help in estimating the value of different states or state-action pairs, which in turn guides the learning process to find an optimal policy. They form the basis for algorithms that learn to act in complex environments by breaking down the problem into smaller, more manageable subproblems.
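For instance, tabular Q-learning can be read as a sample-based, stochastic approximation to the Bellman Optimality Equation. The sketch below assumes a hypothetical Gym-style environment object exposing reset(), step(), and a list of discrete actions in env.actions; these are assumptions for illustration, not a specific library's API.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning driven by a sampled Bellman optimality target.

    Assumed environment interface (hypothetical, Gym-style):
      env.reset()       -> state
      env.step(action)  -> (next_state, reward, done)
      env.actions       -> list of discrete actions
    """
    Q = defaultdict(float)  # Q[(state, action)] -> estimated action value

    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])

            next_state, reward, done = env.step(action)

            # TD target from the Bellman Optimality Equation, using the
            # sampled transition in place of the full expectation over s'
            best_next = 0.0 if done else max(Q[(next_state, a)] for a in env.actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

            state = next_state
    return Q
```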