Reasoning, Attention, Memory (RAM) NIPS Workshop 2015

Organizers: Jason Weston, Antoine Bordes, Sumit Chopra
Duration: one day (see format below)

Facebook event page with real-time updates:

Motivation and Objective of the Workshop

A key component in solving AI is the use of long-term dependencies as well as short-term context during inference, i.e., the interplay of reasoning, attention and memory. The machine learning community has had great success in the last decades at solving basic prediction tasks such as text classification, image annotation and speech recognition. However, solutions to deeper reasoning tasks have remained elusive. Until recently, most existing machine learning models have lacked an easy way to read and write to part of a (potentially very large) long-term memory component, and to combine this seamlessly with inference. To combine memory with reasoning, a model must learn how to access it, i.e., to perform *attention* over its memory. Within the last year or so, in part inspired by some earlier works [8, 9, 14, 15, 16, 18, 19], there has been notable progress in these areas, which this workshop addresses. Models developing notions of attention [12, 5, 6, 7, 20, 21] have shown positive results on a number of real-world tasks such as machine translation and image captioning. There has also been a surge in building models of computation which explore differing forms of explicit storage [1, 10, 11, 13, 17]. For example, recently it was shown how to learn a model to sort a small set of numbers [1] as well as a host of other symbolic manipulation tasks. Another promising direction is work employing a large long-term memory for reading comprehension; the capability of somewhat deeper reasoning has been shown on synthetic data [2], and promising results are starting to appear on real data [3, 4].

In spite of this resurgence, research into developing learning algorithms that combine these components, and the analysis of those algorithms, is still in its infancy. The purpose of this workshop is to bring together researchers from diverse backgrounds to exchange ideas that could help address the drawbacks of such models and lead to more interesting models in the quest for true AI. We thus plan to focus on addressing the following issues:

The workshop will devote most of its time to invited and contributed talks. To move away from a mini-conference effect, we will not have any posters. To encourage interaction, a webpage will be used for real-time updates, and will also allow people to post questions before or during the workshop; these will be asked at the end of talks or can be answered online.

Key Dates

Paper Submission Instructions

Authors are encouraged to submit papers on topics related to reasoning, memory and attention, strictly adhering to the following guidelines:


References

[1] Neural Turing Machines. Alex Graves, Greg Wayne, Ivo Danihelka. arXiv Pre-Print, 2014.
[2] Memory Networks. Jason Weston, Sumit Chopra, Antoine Bordes. International Conference on Learning Representations, 2015.
[3] Teaching Machines to Read and Comprehend. Karl Moritz Hermann et al. arXiv Pre-Print, 2015.
[4] Large-scale Simple Question Answering with Memory Networks. Antoine Bordes, Nicolas Usunier, Sumit Chopra, Jason Weston. arXiv Pre-Print, 2015.
[5] Neural Machine Translation by Jointly Learning to Align and Translate. D. Bahdanau, K. Cho, Y. Bengio. International Conference on Learning Representations, 2015.
[6] Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. Kelvin Xu et al. arXiv Pre-Print, 2015.
[7] Attention-Based Models for Speech Recognition. Jan Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, Yoshua Bengio. arXiv Pre-Print, 2015.
[8] Learning context-free grammars: Capabilities and limitations of a recurrent neural network with an external stack memory. S. Das, C. L. Giles, and G. Z. Sun. In ACCSS, 1992.
[9] Neural Net Architectures for Temporal Sequence Processing. Michael C Mozer. In Santa Fe Institute Studies in The Sciences of Complexity, volume 15.
[10] Inferring Algorithmic Patterns with Stack Augmented Recurrent Nets. Armand Joulin and Tomas Mikolov. arXiv Pre-Print, 2015.
[11] Reinforcement Learning Turing Machine. Wojciech Zaremba and Ilya Sutskever. arXiv Pre-Print, 2015.
[12] Generating sequences with recurrent neural networks. Alex Graves. arXiv Pre-Print, 2013.
[13] End-To-End Memory Networks. S. Sukhbaatar, A. Szlam, J. Weston, R. Fergus. arXiv Pre-Print, 2015.
[14] Long short-term memory. Sepp Hochreiter, Jürgen Schmidhuber. Neural computation, 9(8): 1735-1780, 1997.
[15] Learning to control fast-weight memories: An alternative to dynamic recurrent networks. Jürgen Schmidhuber. Neural Computation, 4(1):131-139, 1992.
[16] A self-referential weight matrix. Jürgen Schmidhuber. In ICANN93, pp. 446-450. Springer, 1993.
[17] Ask Me Anything: Dynamic Memory Networks for Natural Language Processing. Kumar et al. arXiv Pre-Print, 2015.
[18] Learning to combine foveal glimpses with a third-order Boltzmann machine. Hugo Larochelle and Geoffrey E. Hinton. In NIPS, pp. 1243-1251, 2010.
[19] Learning where to attend with deep architectures for image tracking. Denil et al. Neural Computation, 2012.
[20] Recurrent models of visual attention. V. Mnih, N. Heess, A. Graves and K. Kavukcuoglu. In NIPS, 2014.
[21] A Neural Attention Model for Abstractive Sentence Summarization. A. M. Rush, S. Chopra and J. Weston. EMNLP 2015.

Workshop Schedule

8:20 - 8:30 Introduction [slides]

8:30 - 10:00 Session 1

- Invited talk (35min): “How to learn an algorithm” Juergen Schmidhuber, IDSIA. [slides]

- Invited talk (35min): "From Attention to Memory and towards Longer-Term Dependencies" Yoshua Bengio, University of Montreal. [slides]

- Contributed talk (20min): “Generating Images from Captions with Attention” Elman Mansimov, Emilio Parisotto, Jimmy Lei Ba, Ruslan Salakhutdinov (University of Toronto). [slides]

10:00 - 10:30 Coffee break

10:30 - 12:30 Session 2

- Invited talk (35min): "Smooth Operators: the Rise of Differentiable Attention in Deep Learning" Alex Graves, Google DeepMind. [slides]

- Invited talk (35min): “Exploiting cognitive constraints to improve machine-learning memory models” Mike Mozer, University of Colorado. [slides]

- Contributed talk (20min): “Structured Memory for Neural Turing Machines” Wei Zhang, Yang Yu, Bowen Zhou (IBM Watson). [slides]

- Contributed talk (20min): “Towards Neural Network-based Reasoning” Baolin Peng, The Chinese University of Hong Kong; Zhengdong Lu, Noah's Ark Lab, Huawei Technologies; Hang Li, Noah's Ark Lab, Huawei Technologies; Kam-Fai Wong, The Chinese University of Hong Kong. [slides]

- Lightning talk (5min): “Learning to learn neural networks” Tom Bosc, Inria. [slides]

- Lightning talk (5min): “Evolving Neural Turing Machines” Rasmus Boll Greve, Emil Juul Jacobsen, Sebastian Risi (IT University of Copenhagen). [slides]

12:30 - 2:30 Lunch break

2:30 - 4:30 Session 3

- Invited talk (35min): "Neural Machine Translation: Progress Report and Beyond" Kyunghyun Cho, New York University. [slides]

- Invited talk (35min): “Sleep, learning and memory: optimal inference in the prefrontal cortex” Adrien Peyrache, New York University. [slides]

- Contributed talk (20min): “Dynamic Memory Networks for Natural Language Processing” Ankit Kumar, Ozan Irsoy, Peter Ondruska, Mohit Iyyer, James Bradbury, Ishaan Gulrajani, Richard Socher (MetaMind). [slides]

- Contributed talk (20min): “Neural Models for Simple Algorithmic Games” Sainbayar Sukhbaatar, Arthur Szlam, Rob Fergus (Facebook AI Research). [slides]

- Lightning talk (5min): “Chess Q&A : Question Answering on Chess Games” Volkan Cirik, Louis-Philippe Morency, Eduard Hovy (CMU). [slides]

- Lightning talk (5min): “Considerations for Evaluating Models of Language Understanding and Reasoning” Gabriel Recchia, University of Cambridge. [slides]

4:30 - 5:00 Coffee break

5:00 - 6:30 Session 4

- Invited talk (35min): “A Roadmap towards Machine Intelligence” Tomas Mikolov, Facebook AI Research. [slides]

- Contributed talk (20min): “Learning Deep Neural Network Policies with Continuous Memory States” Marvin Zhang, Zoe McCarthy, Chelsea Finn, Sergey Levine, Pieter Abbeel (UC Berkeley). [slides]

- Invited talk (35min): “The Neural GPU and the Neural RAM machine” Ilya Sutskever, Google Brain.

Accepted Papers

Contributed Talks

Lightning Talks

Invited Speakers

Alex Graves, Google DeepMind
Alex Graves is a research scientist at Google DeepMind. His work focuses on developing recurrent neural networks for sequence learning, and now features prominently in areas such as speech recognition, handwriting synthesis, and generative sequence modelling. Alex completed a BSc in Theoretical Physics at Edinburgh, Part III Maths at Cambridge, and a PhD in AI at IDSIA with Juergen Schmidhuber, followed by postdocs at TU-Munich and with Geoff Hinton at the University of Toronto. Most recently he has been spearheading DeepMind's development of Neural Turing Machines.

Yoshua Bengio, University of Montreal
Yoshua Bengio received a PhD in Computer Science from McGill University, Canada in 1991. After two post-doctoral years, one at M.I.T. with Michael Jordan and one at AT&T Bell Laboratories with Yann LeCun and Vladimir Vapnik, he became a professor at the Department of Computer Science and Operations Research at Université de Montréal. He is the author of two books and more than 200 publications, the most cited being in the areas of deep learning, recurrent neural networks, probabilistic learning algorithms, natural language processing and manifold learning. He is among the most cited Canadian computer scientists and is or has been associate editor of the top journals in machine learning and neural networks. He has held a Canada Research Chair in Statistical Learning Algorithms since 2000 and an NSERC Industrial Chair since 2006; he has been a Senior Fellow of the Canadian Institute for Advanced Research since 2005, and since 2014 he has co-directed its program focused on deep learning. He is on the board of the NIPS foundation and has been program chair and general chair for NIPS. He has co-organized the Learning Workshop for 14 years and co-created the new International Conference on Learning Representations. His current interests are centered around a quest for AI through machine learning, and include fundamental questions on deep learning and representation learning, the geometry of generalization in high-dimensional spaces, manifold learning, biologically inspired learning algorithms, and challenging applications of statistical machine learning.

Ilya Sutskever, Google Brain
Ilya Sutskever received his PhD in 2012 from the University of Toronto, working with Geoffrey Hinton. After completing his PhD, he cofounded DNNResearch with Geoffrey Hinton and Alex Krizhevsky, which was acquired by Google. He is interested in all aspects of neural networks and their applications.

Kyunghyun Cho, New York University
Kyunghyun Cho is an assistant professor at the Department of Computer Science, Courant Institute of Mathematical Sciences and the Center for Data Science of New York University (NYU). Before joining NYU in September 2015, he was a postdoctoral researcher at the University of Montreal under the supervision of Prof. Yoshua Bengio, after obtaining his doctorate at Aalto University (Finland) in early 2014. His main research interests include neural networks, generative models and their applications, especially to natural language understanding.

Mike Mozer, University of Colorado
Michael Mozer received a Ph.D. in Cognitive Science at the University of California at San Diego in 1987. Following a postdoctoral fellowship with Geoffrey Hinton at the University of Toronto, he joined the faculty at the University of Colorado at Boulder and is presently a Professor in the Department of Computer Science and the Institute of Cognitive Science. He is secretary of the Neural Information Processing Systems Foundation and has served as chair of the Cognitive Science Society. His research involves developing computational models to help understand the mechanisms of cognition. He uses these models to build software that assists individuals in learning, remembering, and decision making.

Adrien Peyrache, New York University
After graduating in physics from ESPCI-ParisTech, Adrien Peyrache studied cognitive science in a joint MSc program at Pierre and Marie Curie University and Ecole Normale Supérieure (Paris, France). In 2009, he completed his PhD in neuroscience at the Collège de France. His thesis focused on the neuronal substrate of sleep-dependent learning and memory. After a year of postdoctoral training at the CNRS (Gif-sur-Yvette, France), where he studied the coordination of neuronal activity during sleep, he moved four years ago to the laboratory of György Buzsaki at New York University Neuroscience Institute. Since then, he has devoted his work to leveraging the unique technical expertise in high density neuronal population recordings to characterize the self-organized mechanisms of neuronal activity in the navigation system.

Jürgen Schmidhuber, Swiss AI Lab IDSIA
Since age 15 or so, the main goal of professor Jürgen Schmidhuber (pronounce: You_again Shmidhoobuh) has been to build a self-improving Artificial Intelligence (AI) smarter than himself, then retire. He has pioneered self-improving general problem solvers since 1987, and Deep Learning Neural Networks (NNs) since 1991. The recurrent NNs (RNNs) developed by his research groups at the Swiss AI Lab IDSIA & USI & SUPSI & TU Munich were the first RNNs to win official international contests. They have revolutionized connected handwriting recognition, speech recognition, machine translation, optical character recognition, image caption generation, and are now in use at Google, Microsoft, IBM, Baidu, and many other companies. Founders & staff of DeepMind (sold to Google for over 600M) include 4 former PhD students from his lab. His team's Deep Learners were the first to win object detection and image segmentation contests, and achieved the world's first superhuman visual classification results, winning nine international competitions in machine learning & pattern recognition (more than any other team). They also were the first to learn control policies directly from high-dimensional sensory input using reinforcement learning. His research group also established the field of mathematically rigorous universal AI and optimal universal problem solvers. His formal theory of creativity & curiosity & fun explains art, science, music, and humor. He also generalized algorithmic information theory and the many-worlds theory of physics, and introduced the concept of Low-Complexity Art, the information age's extreme form of minimal art. Since 2009 he has been a member of the European Academy of Sciences and Arts. He has published 333 peer-reviewed papers, earned seven best paper/best video awards, the 2013 Helmholtz Award of the International Neural Networks Society, and the 2016 IEEE Neural Networks Pioneer Award. He is also president of NNAISENSE, which aims at building the first practical general purpose AI.

Tomas Mikolov, Facebook AI Research
Tomas Mikolov is a research scientist on the Facebook AI Research team. Previously, he worked on the Google Brain team, where he led development of the word2vec algorithm. He finished his PhD at the Brno University of Technology (Czech Republic), where he worked on recurrent neural network based language models (RNNLMs). His long-term research goal is to develop intelligent machines capable of learning and natural communication with people.

Workshop Organizers

Jason Weston, Facebook AI Research
Jason Weston has been a research scientist at Facebook, NY, since Feb 2014. He earned his PhD in machine learning at Royal Holloway, University of London and at AT&T Research in Red Bank, NJ (advisors: Alex Gammerman, Volodya Vovk and Vladimir Vapnik) in 2000. From 2000 to 2002, he was a researcher at Biowulf Technologies, New York. From 2002 to 2003 he was a research scientist at the Max Planck Institute for Biological Cybernetics, Tuebingen, Germany. From 2003 to 2009 he was a research staff member at NEC Labs America, Princeton. From 2009 to 2014 he was a research scientist at Google, NY. His interests lie in statistical machine learning and its application to text, audio and images. Jason has published over 100 papers, including best paper awards at ICML and ECML. He was also part of the YouTube team that won a National Academy of Television Arts & Sciences Emmy Award for Technology and Engineering for Personalized Recommendation Engines for Video Discovery.

Antoine Bordes, Facebook AI Research
Antoine Bordes is a staff research scientist at Facebook Artificial Intelligence Research. Prior to joining Facebook in 2014, he was a CNRS staff researcher in the Heudiasyc laboratory of the University of Technology of Compiegne in France. In 2010, he was a postdoctoral fellow in Yoshua Bengio's lab at the University of Montreal. He received his PhD in machine learning from Pierre & Marie Curie University in Paris in early 2010. From 2004 to 2009, he collaborated regularly with Léon Bottou at NEC Labs America in Princeton. He received two awards for best PhD from the French Association for Artificial Intelligence and from the French Armament Agency, as well as a Scientific Excellence Scholarship awarded by CNRS in 2013. Antoine's current interests cover knowledge bases/graphs modeling, natural language processing, deep learning and large scale learning.

Sumit Chopra, Facebook AI Research
Sumit Chopra is a research scientist at the Facebook Artificial Intelligence Research Lab. He graduated with a Ph.D. in computer science from New York University in 2008. His thesis proposed a first-of-its-kind neural network model for doing relational regression, and was a conceptual foundation for a startup company for modeling residential real estate prices. Following his Ph.D., Sumit joined AT&T Labs - Research as a research scientist in the Statistics and Machine Learning Department, where he focused on building novel deep learning models for speech recognition, natural language processing, and computer vision. While at AT&T he also worked on other areas of machine learning, such as recommender systems, computational advertisement, and ranking. He has been a research scientist at Facebook AI Research since April 2014, where he has been focusing primarily on natural language understanding.

Related Workshops

There has been a series of "Learning Semantics" workshops over the last years which touch upon these subjects, but our workshop is more focused, which we hope will generate greater interaction and discussion. Similarly, there has been a series of deep learning workshops over the years, e.g., last year's, titled "Deep Learning and Representation Learning". Deep learning is very broad and this year is the subject of a symposium. Our workshop focuses on a smaller area that has gained substantial interest (see references above).

Theme Song

Paul McCartney's Ram on!, Remixed by The Sperm Whale.