This is an Octave demo for contextual multi-armed bandits. A contextual multi-armed bandit strategy learns to select the best-performing model from a finite set of models given a context. Although I use randomly generated context vectors, the learning process is already visible after a small number of iterations over the same context set. In practice, the context should contain meaningful input data, e.g. certain query words entered by a user in the input field of a search engine. My Octave code is based on the LinUCB pseudocode proposed in “A Contextual-Bandit Approach to Personalized News Article Recommendation”, Lihong Li, Wei Chu, Yahoo! Labs.
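To illustrate the idea, here is a minimal Octave sketch of the disjoint LinUCB algorithm from that paper, with randomly generated contexts and a simulated reward model; the variable names and the reward simulation are my own illustrative assumptions, not taken from the paper or from my repository code.

```octave
% Minimal LinUCB sketch (disjoint model). Contexts are random and the
% reward is simulated via hidden per-arm weights (theta_true), which a
% real application would of course not have access to.

d = 5;        % context dimension
K = 3;        % number of arms (candidate models)
alpha = 1.0;  % exploration parameter
T = 1000;     % number of iterations

theta_true = randn(d, K);        % hidden weights (simulation only)
A = repmat(eye(d), [1, 1, K]);   % per-arm A_a, initialized to I_d
b = zeros(d, K);                 % per-arm b_a, initialized to 0

for t = 1:T
  x = randn(d, 1); x = x / norm(x);   % random context vector
  p = zeros(K, 1);
  for a = 1:K
    Ainv  = inv(A(:, :, a));
    theta = Ainv * b(:, a);           % ridge-regression estimate
    p(a)  = theta' * x + alpha * sqrt(x' * Ainv * x);  % UCB score
  end
  [~, a] = max(p);                    % play arm with highest upper bound
  r = theta_true(:, a)' * x + 0.1 * randn();  % noisy simulated reward
  A(:, :, a) += x * x';               % rank-one update of A_a
  b(:, a) += r * x;                   % update b_a with observed reward
end
```

With `alpha` controlling the exploration bonus, the selection converges toward the arm whose linear model best explains the observed rewards for a given context.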
You can find the code in my git repository.