# Suggested further readings

### Overview

Sutton, R. S., and Barto, A. G. (2018). Reinforcement learning: An introduction. MIT Press.

### Links to neuroscience
Schultz, W., Dayan, P., and Montague, P. R. (1997). A neural substrate of prediction and reward. Science 275(5306): 1593-1599. doi: [10.1126/science.275.5306.1593](https://doi.org/10.1126/science.275.5306.1593) {{ closed_access }} (preprint: [cs.utexas.edu/~dana/Reward.pdf](http://www.cs.utexas.edu/~dana/Reward.pdf) {{ open_access }}).

Daw, N. D., Niv, Y., and Dayan, P. (2005). Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nature Neuroscience 8(12): 1704-1711. doi: [10.1038/nn1560](https://doi.org/10.1038/nn1560) {{ closed_access }}.

Dayan, P., and Niv, Y. (2008). Reinforcement learning: the good, the bad and the ugly. Current Opinion in Neurobiology 18(2): 185-196. doi: [10.1016/j.conb.2008.08.003](https://doi.org/10.1016/j.conb.2008.08.003) {{ closed_access }}.

Wang, J. X., Kurth-Nelson, Z., Kumaran, D., Tirumala, D., Soyer, H., Leibo, J. Z., ... and Botvinick, M. (2018). Prefrontal cortex as a meta-reinforcement learning system. Nature Neuroscience 21(6): 860-868. doi: [10.1038/s41593-018-0147-8](https://doi.org/10.1038/s41593-018-0147-8) {{ closed_access }} (preprint: bioRxiv doi: [10.1101/295964](https://doi.org/10.1101/295964) {{ open_access }}).

Mattar, M. G., and Daw, N. D. (2018). Prioritized memory access explains planning and hippocampal replay. Nature Neuroscience 21(11): 1609-1617. doi: [10.1038/s41593-018-0232-z](https://doi.org/10.1038/s41593-018-0232-z) {{ closed_access }} (postprint: [europepmc.org/articles/pmc6203620](https://europepmc.org/articles/pmc6203620) {{ open_access }}).

### State of the art
Dabney, W., Kurth-Nelson, Z., Uchida, N., Starkweather, C. K., Hassabis, D., Munos, R., and Botvinick, M. (2020). A distributional code for value in dopamine-based reinforcement learning. Nature 577(7792): 671-675. doi: [10.1038/s41586-019-1924-6](https://doi.org/10.1038/s41586-019-1924-6) {{ closed_access }} (postprint: [europepmc.org/articles/pmc7476215](https://europepmc.org/articles/pmc7476215) {{ open_access }}).

Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., ... and Hassabis, D. (2015). Human-level control through deep reinforcement learning. Nature 518(7540): 529-533. doi: [10.1038/nature14236](https://doi.org/10.1038/nature14236) {{ closed_access }}.

Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., ... and Hassabis, D. (2016). Mastering the game of Go with deep neural networks and tree search. Nature 529(7587): 484-489. doi: [10.1038/nature16961](https://doi.org/10.1038/nature16961) {{ closed_access }}.