CMAB
An OpenAI Gym environment implementing a Contextual Multi-Armed Bandit (CMAB).
The environment defines a number of latent states. On each step, a state is randomly sampled. The agent does not observe the state directly; instead it observes a binary feature vector whose values are probabilistically determined by the latent state. A parameter controls the probability that each feature is predictive of the state.
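The observation process described above could be sketched as follows. This is only one plausible reading, not the environment's actual code: it assumes each latent state has a fixed binary prototype vector, and that each feature copies its prototype bit with probability p_predictive and is otherwise a fair coin flip. The names make_prototypes, sample_observation, and p_predictive are hypothetical and do not come from this repository.

```python
import numpy as np

def make_prototypes(n_states, n_features, rng):
    # ASSUMPTION: one fixed binary prototype vector per latent state.
    return rng.integers(0, 2, size=(n_states, n_features))

def sample_observation(prototypes, p_predictive, rng):
    n_states, n_features = prototypes.shape
    # Sample the latent state uniformly at random (hidden from the agent).
    state = int(rng.integers(n_states))
    # Each feature is predictive of the state with probability p_predictive.
    predictive = rng.random(n_features) < p_predictive
    # Non-predictive features are replaced by uninformative coin flips.
    noise = rng.integers(0, 2, size=n_features)
    features = np.where(predictive, prototypes[state], noise)
    return state, features

rng = np.random.default_rng(0)
prototypes = make_prototypes(n_states=3, n_features=8, rng=rng)
state, features = sample_observation(prototypes, p_predictive=0.8, rng=rng)
```

Under this reading, p_predictive = 1.0 makes the observation identical to the state's prototype, while p_predictive = 0.0 makes it pure noise.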
On each step, the agent selects one of a number of alternatives (arms). The mean reward for a given arm in a given latent state is drawn from a normal distribution with zero mean and unit variance. On a given step, the reward for the selected arm is sampled from a normal distribution with unit variance, centered on that arm's mean for the current state.
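The reward model described above can be illustrated with a short NumPy sketch. It is not the environment's implementation: the function names make_arm_means and sample_reward are hypothetical, and the sketch assumes the per-state arm means are drawn once up front and then held fixed.

```python
import numpy as np

def make_arm_means(n_states, n_arms, rng):
    # Mean reward of each (state, arm) pair, drawn from N(0, 1).
    # ASSUMPTION: drawn once and kept fixed afterwards.
    return rng.normal(0.0, 1.0, size=(n_states, n_arms))

def sample_reward(arm_means, state, arm, rng):
    # Reward on one step: N(mean of this arm in this state, unit variance).
    return float(rng.normal(arm_means[state, arm], 1.0))

rng = np.random.default_rng(0)
arm_means = make_arm_means(n_states=3, n_arms=4, rng=rng)

# One step with a uniformly random policy.
state = int(rng.integers(3))
arm = int(rng.integers(4))
reward = sample_reward(arm_means, state, arm, rng)

# The best arm differs by state, which is why context matters.
best_arm_per_state = arm_means.argmax(axis=1)
```

Because the reward noise has the same variance as the spread of the means, an agent typically needs many pulls per state to tell the arms apart.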
Installation
To install, execute the following commands:
git clone https://github.rpi.edu/AdaCog/CMAB.git
cd CMAB
sudo pip3 install -e .