Consider the two-armed bandit problem with 0/1 rewards, where the optimal arm has a Bernoulli reward distribution with mean.
a) Bayesian probability
b) Binomial distribution
c) Markov decision process
d) Normal distribution
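For intuition about the setup in the question, the sketch below simulates a two-armed bandit whose arms return 0/1 rewards drawn from Bernoulli distributions. It is a minimal illustration only. The arm means `0.6` and `0.4`, the pull count, and the helper names `pull` and `simulate` are hypothetical choices made here, since the question does not state the optimal arm's mean.

```python
import random

def pull(mean: float, rng: random.Random) -> int:
    """Return a 0/1 reward drawn from a Bernoulli(mean) distribution."""
    # rng.random() is uniform on [0, 1), so this yields 1 with probability `mean`.
    return 1 if rng.random() < mean else 0

def simulate(means, pulls_per_arm, seed=0):
    """Pull each arm a fixed number of times and return the total reward per arm."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    return [sum(pull(m, rng) for _ in range(pulls_per_arm)) for m in means]

# Hypothetical arm means: the question elides the optimal arm's mean.
means = [0.6, 0.4]
totals = simulate(means, pulls_per_arm=1000)
print(totals)
```

Each individual reward is a Bernoulli draw; the total reward from n independent pulls of an arm with mean p follows a Binomial(n, p) distribution.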