Memory Networks for Language Understanding, ICML Tutorial 2016

Speaker: Jason Weston
Time: 11am-1pm, June 19 @ Crown Plaza Broadway + Breakout room

There has been a recent resurgence of interest in combining reasoning, attention and memory to solve tasks, particularly in the field of language understanding. I will review some of these recent efforts, as well as focus on one of my own group's contributions, memory networks, an architecture that we have applied to question answering, language modeling and general dialog. As we try to move towards the goal of true language understanding, I will also discuss recent datasets and tests that have been built to assess these models' abilities and to see how far we have come.
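As a rough illustration of the attention-over-memory idea covered in the tutorial, here is a minimal single-hop sketch in NumPy, in the style of End-To-End Memory Networks (Sukhbaatar et al., 2015). This is not the authors' code: the dimensions, embedding matrices and data below are toy stand-ins that a real model would learn.

```python
# Illustrative sketch (assumed toy setup, not the published implementation):
# one "hop" of soft attention over a memory of sentences.
import numpy as np

rng = np.random.default_rng(0)
d = 8        # embedding dimension (assumed)
vocab = 20   # toy vocabulary size (assumed)

A = rng.normal(size=(vocab, d)) * 0.1  # input (key) memory embedding
C = rng.normal(size=(vocab, d)) * 0.1  # output (value) memory embedding
B = rng.normal(size=(vocab, d)) * 0.1  # question embedding

def embed(sentence, M):
    # Bag-of-words sentence representation: sum of its word embeddings.
    return M[sentence].sum(axis=0)

def memory_hop(story, question):
    # story: list of sentences, each a list of word ids.
    m = np.stack([embed(s, A) for s in story])  # memory keys
    c = np.stack([embed(s, C) for s in story])  # memory values
    u = embed(question, B)                      # query vector
    scores = m @ u                              # match query against each memory
    p = np.exp(scores - scores.max())
    p /= p.sum()                                # softmax attention weights
    o = p @ c                                   # attention-weighted sum of values
    return o + u, p                             # updated controller state, weights

story = [[1, 2, 3], [4, 5], [1, 6]]
question = [4, 5]
state, attn = memory_hop(story, question)
print(attn.shape, state.shape)  # (3,) (8,)
```

Stacking several such hops, with the updated state re-used as the next query, gives the multi-step reasoning the tutorial discusses.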


Slides are here: PPT or PDF (some animations aren't quite right).


Publications on Memory Networks:

  • J. Weston, S. Chopra, A. Bordes. Memory Networks. ICLR 2015 (and arXiv:1410.3916).
  • S. Sukhbaatar, A. Szlam, J. Weston, R. Fergus. End-To-End Memory Networks. NIPS 2015 (and arXiv:1503.08895). [code]
  • J. Weston, A. Bordes, S. Chopra, A. M. Rush, B. van Merriënboer, A. Joulin, T. Mikolov. Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks. arXiv:1502.05698. [data & code]
  • A. Bordes, N. Usunier, S. Chopra, J. Weston. Large-scale Simple Question Answering with Memory Networks. arXiv:1506.02075. [data]
  • J. Dodge, A. Gane, X. Zhang, A. Bordes, S. Chopra, A. Miller, A. Szlam, J. Weston. Evaluating Prerequisite Qualities for Learning End-to-End Dialog Systems. arXiv:1511.06931. [data]
  • F. Hill, A. Bordes, S. Chopra, J. Weston. The Goldilocks Principle: Reading Children's Books with Explicit Memory Representations. arXiv:1511.02301. [data]
  • J. Weston. Dialog-based Language Learning. arXiv:1604.06045. [data]
  • A. Bordes, J. Weston. Learning End-to-End Goal-Oriented Dialog. arXiv:1605.07683. [data]
Related Publications:

    See the related RAM Workshop from NIPS 2015 and the references therein; some of them are listed here:

    [1] Neural Machine Translation by Jointly Learning to Align and Translate. D. Bahdanau, K. Cho, Y. Bengio. International Conference on Learning Representations (ICLR), 2015.
    [2] Neural Turing Machines. Alex Graves, Greg Wayne, Ivo Danihelka. arXiv Pre-Print, 2014.
    [3] Teaching Machines to Read and Comprehend. Karl Moritz Hermann et al. arXiv Pre-Print, 2015.
    [4] Ask Me Anything: Dynamic Memory Networks for Natural Language Processing. Kumar et al. arXiv Pre-Print, 2015
    [5] Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. Kelvin Xu et al. arXiv Pre-Print, 2015.
    [6] Attention-Based Models for Speech Recognition. Jan Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, Yoshua Bengio. arXiv Pre-Print, 2015.
    [7] Learning context-free grammars: Capabilities and limitations of a recurrent neural network with an external stack memory. S. Das, C. L. Giles, and G. Z. Sun. In ACCSS, 1992.
    [8] Neural Net Architectures for Temporal Sequence Processing. Michael C. Mozer. In Santa Fe Institute Studies in the Sciences of Complexity, volume 15.
    [9] Inferring Algorithmic Patterns with Stack Augmented Recurrent Nets. Armand Joulin and Tomas Mikolov. arXiv Pre-Print, 2015.
    [10] Reinforcement Learning Neural Turing Machines. Wojciech Zaremba and Ilya Sutskever. arXiv Pre-Print, 2015.
    [11] Generating sequences with recurrent neural networks. Alex Graves. arXiv preprint, 2013.
    [12] Long short-term memory. Sepp Hochreiter, Jürgen Schmidhuber. Neural computation, 9(8): 1735-1780, 1997.
    [13] Learning to control fast-weight memories: An alternative to dynamic recurrent networks. Jürgen Schmidhuber. Neural Computation, 4(1):131-139, 1992.
    [14] A self-referential weight matrix. Jürgen Schmidhuber. In ICANN '93, pp. 446-450. Springer, 1993.
    [15] Learning to combine foveal glimpses with a third-order boltzmann machine. Hugo Larochelle and Geoffrey E. Hinton. In NIPS, pp. 1243-1251, 2010.
    [16] Learning where to attend with deep architectures for image tracking. Denil et al. Neural Computation, 2012.
    [17] Recurrent models of visual attention. V. Mnih, N. Heess, A. Graves and K. Kavukcuoglu. In NIPS, 2014.
    [18] A Neural Attention Model for Abstractive Sentence Summarization. A. M. Rush, S. Chopra and J. Weston. EMNLP 2015.




    Speaker Bio

    Jason Weston, Facebook AI Research (http://www.jaseweston.com/)
    Jason Weston has been a research scientist at Facebook, NY, since February 2014. He earned his PhD in machine learning in 2000 at Royal Holloway, University of London and at AT&T Research in Red Bank, NJ (advisors: Alex Gammerman, Volodya Vovk and Vladimir Vapnik). From 2000 to 2002 he was a researcher at Biowulf Technologies, New York. From 2002 to 2003 he was a research scientist at the Max Planck Institute for Biological Cybernetics, Tübingen, Germany. From 2003 to 2009 he was a research staff member at NEC Labs America, Princeton. From 2009 to 2014 he was a research scientist at Google, NY. His interests lie in statistical machine learning and its application to text, audio and images. Jason has published over 100 papers, including best paper awards at ICML and ECML. He was part of the YouTube team that won a National Academy of Television Arts & Sciences Emmy Award for Technology and Engineering for Personalized Recommendation Engines for Video Discovery. He was listed as one of the top 50 authors in Computer Science in Science.