Markov decision process

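A Markov decision process (MDP) is a mathematical framework for modeling sequential decision-making in which outcomes are partly random and partly under the control of a decision-maker. An MDP is defined by a set of states, a set of actions, transition probabilities that give the chance of reaching each next state from a given state and action, and rewards attached to those transitions. At each step the decision-maker observes the current state and chooses an action; the process then moves to a new state and yields a reward. The next state depends only on the current state and the chosen action, not on earlier history, which is the Markov property that gives the framework its name. MDPs are used in robotics, economics, and artificial intelligence, among other fields. Solving an MDP means finding an optimal policy: a rule that specifies which action to take in each state so as to maximize the expected cumulative reward over time.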
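
As an illustration of how a policy can be computed from the four components, the sketch below applies value iteration, one standard solution method for small finite MDPs. The two-state MDP, its probabilities and rewards, the discount factor gamma, and the helper names q_value and value_iteration are all assumptions made for this example; none of them comes from the text above.

```python
# Hypothetical two-state, two-action MDP; every number here is made up.
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {
        0: [(1.0, 0, 0.0)],                 # action 0: stay in state 0, no reward
        1: [(0.8, 1, 1.0), (0.2, 0, 0.0)],  # action 1: usually reach state 1
    },
    1: {
        0: [(1.0, 1, 2.0)],                 # action 0: stay in state 1, reward 2
        1: [(1.0, 0, 0.0)],                 # action 1: go back to state 0
    },
}


def q_value(P, V, s, a, gamma):
    """Expected return of taking action a in state s, then following values V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(P, gamma=0.9, tol=1e-8):
    """Repeat the Bellman optimality update until values change by less than tol."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(q_value(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place update; later states see the new value
        if delta < tol:
            break
    # The greedy policy picks, in each state, the action with the highest value.
    policy = {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P}
    return V, policy


V, policy = value_iteration(P)
print(policy)  # maps each state to its chosen action
```

In this toy problem the resulting policy moves from state 0 toward state 1 and then stays there, because state 1 pays a reward on every step. The discount factor (assumed here to be 0.9) keeps the cumulative reward finite and makes the iteration converge.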