Reinforcement learning is a machine learning paradigm, usually distinguished from both supervised and unsupervised learning, that uses reward as its underlying principle. Feedback arrives as a consequence of the agent's actions and may be delayed until some later point in time. The data is also sequential: different histories can yield different actions.
Reinforcement learning is built from observations, actions, and rewards. The concept of state is important because it is the key information used to decide the next action. Three states can be defined in RL: the environment state, the agent state, and the information state. The information state is a Markov state, one where the future state is independent of the past given the present state. The relationship between these three states shapes the logic used in the policy: when the agent observes the full environment state the problem is fully observable, otherwise the agent must build its own state from its history.
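The observation-action-reward loop above can be sketched in code. This is a minimal illustration, not a standard API: the 1-D walk environment, the `environment_step` function, and the random placeholder policy are all assumptions made for the example. The environment is fully observable here, so the observation equals the environment state, and the agent state is just the accumulated history.

```python
import random

def environment_step(state, action):
    # Hypothetical toy environment: a 1-D walk; reaching position 3 pays
    # reward 1, and the episode ends at position 3 or -3.
    next_state = state + (1 if action == "right" else -1)
    reward = 1.0 if next_state == 3 else 0.0
    done = next_state in (3, -3)
    return next_state, reward, done

def run_episode():
    env_state = 0   # environment state (the true position)
    history = []    # agent state: the observation/action/reward history
    done = False
    while not done:
        observation = env_state                    # fully observable case
        action = random.choice(["left", "right"])  # placeholder policy
        env_state, reward, done = environment_step(env_state, action)
        history.append((observation, action, reward))
    return history

episode = run_episode()
```

In a partially observable setting the observation would differ from `env_state`, and the `history` list is exactly what the agent would have to summarize into its own state.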
An agent in RL consists of up to three components: a policy, a value function, and a model. The policy is a rule that determines what action to take given the current agent state. The value function estimates the expected future reward from a given state. The model is the agent's representation of how the environment will evolve; it is useful for choosing an action that fits the predicted next state of the environment.
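The three components can be sketched on a tiny deterministic chain. Everything here is an illustrative assumption: a three-state chain (0, 1, 2) where state 2 is terminal and pays reward 1, a fixed "always go right" policy, and a discount factor of 0.9. The value function is computed by rolling the model forward under the policy.

```python
GAMMA = 0.9  # discount factor (illustrative choice)

def policy(state):
    # Policy: maps the current agent state to an action.
    return "right"

def model(state, action):
    # Model: the agent's prediction of the next environment state and reward.
    next_state = min(state + 1, 2) if action == "right" else max(state - 1, 0)
    reward = 1.0 if next_state == 2 else 0.0
    return next_state, reward

def value(state):
    # Value function: expected discounted future reward under the policy,
    # obtained here by simulating the model until the terminal state.
    total, discount = 0.0, 1.0
    while state != 2:
        state, reward = model(state, policy(state))
        total += discount * reward
        discount *= GAMMA
    return total

# value(1) -> 1.0 (one step from the goal)
# value(0) -> 0.9 (the same reward, discounted one extra step)
```

In this deterministic toy the value function can be computed exactly from the model; in general it must be estimated from experience.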
A key problem is the trade-off between exploration and exploitation. Exploration is a trial-and-error process that gathers new information about the environment, while exploitation uses known information to extract the most reward.
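One common way to balance the two is an epsilon-greedy rule: explore with a small probability, exploit otherwise. Below is a minimal sketch on a two-armed bandit; the arm probabilities (0.3 and 0.7), the epsilon value, and the step count are illustrative assumptions.

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    # Explore with probability epsilon: pick a random action.
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    # Exploit otherwise: pick the action with the highest current estimate.
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Two-armed bandit with true success probabilities 0.3 and 0.7.
true_means = [0.3, 0.7]
q = [0.0, 0.0]   # running reward estimates per arm
counts = [0, 0]  # pulls per arm

random.seed(0)
for _ in range(1000):
    a = epsilon_greedy(q, epsilon=0.1)
    reward = 1.0 if random.random() < true_means[a] else 0.0
    counts[a] += 1
    q[a] += (reward - q[a]) / counts[a]  # incremental mean update
```

With pure exploitation (epsilon = 0) the agent could lock onto whichever arm happened to pay first; the occasional exploratory pulls are what let it discover that the second arm is better.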