jhebda
11-06-2002, 01:58 AM
I am coding a reinforcement learning algorithm and trying to compare it with an optimal Q-learning algorithm that uses an epsilon-greedy approach. The first major problem I'm seeing, though, is that I can't find information online or in reference books about optimizing the alpha value in the following equation for each state.
In a world of many states, each state has a number of Q values associated with it, one per action. Assuming current state i and next state j, the equation to update Q is:
Q[a,i] = Q[a,i](old) + alpha * (Reward received in state i + max over a' of Q[a',j] - Q[a,i](old))
This basically says that the new value of Q (Q[a,i] is the utility of performing action a in state i) is equal to the old value plus a correction scaled by alpha, where the correction is the difference between the new estimate (the reward plus the best Q value available in state j) and the old value.
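To make the update concrete, here is a minimal Python sketch of a single step. The table layout (a dictionary keyed by (action, state)), the function name q_update, and the example states, actions, and reward are all assumptions made for illustration. Like the equation above, it uses no discount factor.

```python
from collections import defaultdict


def q_update(Q, state, action, reward, next_state, actions, alpha):
    """Apply one undiscounted Q-learning update in place and return the new value."""
    old = Q[(action, state)]
    # Best Q value available from the next state, over all actions.
    best_next = max(Q[(a, next_state)] for a in actions)
    # Old value plus alpha times the difference between target and old value.
    Q[(action, state)] = old + alpha * (reward + best_next - old)
    return Q[(action, state)]


# Hypothetical two-action world; unseen entries default to 0.0.
Q = defaultdict(float)
actions = ["left", "right"]
Q[("right", 1)] = 2.0

new_value = q_update(Q, state=0, action="right", reward=1.0,
                     next_state=1, actions=actions, alpha=0.5)
print(new_value)  # 0 + 0.5 * (1.0 + 2.0 - 0) = 1.5
```

With alpha = 1 the old value is overwritten entirely by the new estimate, and with alpha = 0 nothing is learned, which is why the choice of alpha matters.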
Alpha should ideally start high, since we know nothing about the entire world of states, and then decrease. However, I cannot find information on an optimal decrease rate for alpha. Has anybody who has worked with reinforcement learning dealt with a similar problem and could help me out?
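As a reference point for the kind of schedule in question, here is a minimal sketch of one common choice: keep a visit count n for each (action, state) pair and use alpha = 1 / n^omega. This is not claimed to be optimal. The class name VisitCountAlpha, the parameter name omega, and the example state and action are assumptions for illustration. For omega in (0.5, 1], this schedule satisfies the usual stochastic-approximation conditions (the alphas sum to infinity while their squares sum to a finite value), provided every pair keeps being visited.

```python
from collections import defaultdict


class VisitCountAlpha:
    """Per-(action, state) learning rate that decays as 1 / n**omega."""

    def __init__(self, omega=1.0):
        self.omega = omega
        self.visits = defaultdict(int)  # visit count per (action, state)

    def next_alpha(self, state, action):
        """Count one more visit to the pair and return the alpha to use for it."""
        self.visits[(action, state)] += 1
        n = self.visits[(action, state)]
        return 1.0 / n ** self.omega


schedule = VisitCountAlpha(omega=1.0)
alphas = [schedule.next_alpha(0, "right") for _ in range(4)]
print(alphas)  # first visit gives 1.0, then 0.5, then 1/3, then 0.25
```

With omega = 1 each Q value becomes the running average of the targets seen so far for that pair. A smaller omega makes alpha decay more slowly, which keeps later updates more influential at the cost of noisier estimates.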