Publications

For a more up-to-date list of my publications, please see my Google Scholar profile. For a list of my patents, please see here.

When Not to Trust Language Models: Investigating Effectiveness and Limitations of Parametric and Non-Parametric Memories
#22
Annual Meeting of the Association for Computational Linguistics, 2023
Alex Mallen, Akari Asai, Victor Zhong, Rajarshi Das, Hannaneh Hajishirzi, and Daniel Khashabi
Despite their impressive performance on diverse tasks, large language models (LMs) still struggle with tasks requiring rich world knowledge, suggesting that relying solely on their parameters to encode a wealth of world knowledge has limitations. This paper aims to understand LMs' strengths and limitations in memorizing factual knowledge by conducting large-scale knowledge probing experiments with 10 models and 4 augmentation methods on PopQA, our new open-domain QA dataset with 14k questions. We find that LMs struggle with less popular factual knowledge, and that scaling fails to appreciably improve memorization of factual knowledge in the tail. We then show that retrieval-augmented LMs largely outperform LMs that are orders of magnitude larger, while unassisted LMs remain competitive on questions about high-popularity entities. Based on these findings, we devise a simple yet effective method for powerful and efficient retrieval-augmented LMs, which retrieves non-parametric memories only when necessary. Experimental results show that this significantly improves models' performance while reducing inference costs.
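The idea of retrieving non-parametric memory only when necessary can be illustrated with a minimal Python sketch. It assumes that "necessary" is decided by comparing the question entity's popularity against a tunable threshold; the `Question` class, `answer_adaptively`, and the `lm_answer`/`retrieve` callables are hypothetical placeholders, not the paper's code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Question:
    text: str
    # Hypothetical popularity signal for the question's subject entity
    # (e.g. a page-view count); higher means more popular.
    entity_popularity: int


def answer_adaptively(
    question: Question,
    lm_answer: Callable[[str], str],
    retrieve: Callable[[str], list[str]],
    popularity_threshold: int,
) -> str:
    """Use retrieval only for less popular (long-tail) entities."""
    if question.entity_popularity >= popularity_threshold:
        # Popular entity: rely on the LM's parametric memory alone.
        return lm_answer(question.text)
    # Tail entity: prepend retrieved passages (non-parametric memory).
    passages = retrieve(question.text)
    prompt = "\n".join(passages + [question.text])
    return lm_answer(prompt)
```

In this sketch the retriever is called only for tail questions, so questions about popular entities pay no retrieval cost.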
@inproceedings{ mallen2023when,
  title={ When Not to Trust Language Models: Investigating Effectiveness and Limitations of Parametric and Non-Parametric Memories },
  author={ Alex Mallen and Akari Asai and Victor Zhong and Rajarshi Das and Hannaneh Hajishirzi and Daniel Khashabi },
  booktitle={ ACL },
  year={ 2023 }
}
          

RoMQA: A Benchmark for Robust, Multi-evidence, Multi-answer Question Answering
#21
Preprint, 2022
Victor Zhong, Weijia Shi, Wen-tau Yih, and Luke Zettlemoyer
We introduce RoMQA, the first benchmark for robust, multi-evidence, multi-answer question answering (QA). RoMQA contains clusters of questions that are derived from related constraints mined from the Wikidata knowledge graph. RoMQA evaluates the robustness of QA models to varying constraints by measuring worst-case performance within each question cluster. Compared to prior QA datasets, RoMQA has more human-written questions that require reasoning over more evidence text and have, on average, many more correct answers. In addition, human annotators rate RoMQA questions as more natural or likely to be asked by people. We evaluate state-of-the-art large language models in zero-shot, few-shot, and fine-tuning settings, and find that RoMQA is challenging: zero-shot and few-shot models perform similarly to naive baselines, while supervised retrieval methods perform well below gold evidence upper bounds. Moreover, existing models are not robust to variations in question constraints, but can be made more robust by tuning on clusters of related questions. Our results show that RoMQA is a challenging benchmark for large language models and provides a quantifiable test for building more robust QA methods.
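As a rough illustration of worst-case, per-cluster evaluation, the sketch below takes the minimum score within each question cluster and averages those minima across clusters. The averaging step and the name `worst_case_cluster_score` are assumptions for illustration; the benchmark's official scorer may aggregate differently.

```python
from collections import defaultdict
from statistics import mean


def worst_case_cluster_score(scores: dict[str, float], cluster_of: dict[str, str]) -> float:
    """Mean over clusters of the lowest per-question score in each cluster."""
    by_cluster: dict[str, list[float]] = defaultdict(list)
    for question_id, score in scores.items():
        by_cluster[cluster_of[question_id]].append(score)
    # A cluster is only as good as its hardest (worst-scored) question.
    return mean(min(cluster_scores) for cluster_scores in by_cluster.values())
```

With per-question scores of 0.5 and 0.25 in one cluster and 1.0 and 0.75 in another, this returns 0.5, whereas a plain average over all four questions would give 0.625.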
@inproceedings{ zhong2022romqa,
  title={ Ro{MQA}: A Benchmark for Robust, Multi-evidence, Multi-answer Question Answering },
  author={ Victor Zhong and Weijia Shi and Wen-tau Yih and Luke Zettlemoyer },
  booktitle={ CoRR abs/2210.14353 },
  year={ 2022 }
}
          

M2D2: A Massively Multi-Domain Language Modeling Dataset
#20
Conference on Empirical Methods in Natural Language Processing, 2022
Machel Reid, Victor Zhong, Suchin Gururangan, and Luke Zettlemoyer
We present M2D2, a fine-grained, massively multi-domain corpus for studying domain adaptation in language models (LMs). M2D2 consists of 8.5B tokens and spans 145 domains extracted from Wikipedia and Semantic Scholar. Using ontologies derived from Wikipedia and ArXiv categories, we organize the domains in each data source into 22 groups. This two-level hierarchy enables the study of relationships between domains and their effects on in-domain and out-of-domain performance after adaptation. We also present a number of insights into the nature of effective domain adaptation in LMs, as examples of the new types of studies M2D2 enables. To improve in-domain performance, we show the benefits of adapting the LM along a domain hierarchy; adapting to smaller amounts of fine-grained domain-specific data can lead to larger in-domain performance gains than larger amounts of weakly relevant data. We further demonstrate a tradeoff between in-domain specialization and out-of-domain generalization within and across ontologies, as well as a strong correlation between out-of-domain performance and lexical overlap between domains.
@inproceedings{ reid2022m,
  title={ {M2D2}: A Massively Multi-Domain Language Modeling Dataset },
  author={ Machel Reid and Victor Zhong and Suchin Gururangan and Luke Zettlemoyer },
  booktitle={ EMNLP },
  year={ 2022 }
}
          

Improving Policy Learning via Language Dynamics Distillation
#19
Neural Information Processing Systems, 2022
Victor Zhong, Jesse Mu, Luke Zettlemoyer, Edward Grefenstette, and Tim Rocktäschel
Recent work has shown that augmenting environments with language descriptions improves policy learning. However, for environments with complex language abstractions, learning how to ground language to observations is difficult due to sparse, delayed rewards. We propose Language Dynamics Distillation (LDD), which pretrains a model to predict environment dynamics given demonstrations with language descriptions, and then fine-tunes these language-aware pretrained representations via reinforcement learning (RL). In this way, the model is trained to both maximize expected reward and retain knowledge about how language relates to environment dynamics. On SILG, a benchmark of five tasks with language descriptions that evaluate distinct generalization challenges on unseen environments (NetHack, ALFWorld, RTFM, Messenger, and Touchdown), LDD outperforms tabula-rasa RL, as well as methods that learn from unlabeled demonstrations in inverse RL and reward shaping with pretrained experts. In our analyses, we show that language descriptions in demonstrations improve sample-efficiency and generalization across environments, and that dynamics modeling with expert demonstrations is more effective than with non-experts.
@inproceedings{ zhong2022improving,
  title={ Improving Policy Learning via Language Dynamics Distillation },
  author={ Victor Zhong and Jesse Mu and Luke Zettlemoyer and Edward Grefenstette and Tim Rocktäschel },
  booktitle={ NeurIPS },
  year={ 2022 }
}
          

Improving Intrinsic Exploration with Language Abstractions
#18
Neural Information Processing Systems, 2022
Jesse Mu, Victor Zhong, Roberta Raileanu, Minqi Jiang, Noah Goodman, Tim Rocktäschel, and Edward Grefenstette
Reinforcement learning (RL) agents are particularly hard to train when rewards are sparse. One common solution is to use intrinsic rewards to encourage agents to explore their environment. However, recent intrinsic exploration methods often use state-based novelty measures which reward low-level exploration and may not scale to domains requiring more abstract skills. Instead, we explore natural language as a general medium for highlighting relevant abstractions in an environment. Unlike previous work, we evaluate whether language can improve over existing exploration methods by directly extending (and comparing to) competitive intrinsic exploration baselines: AMIGo (Campero et al., 2021) and NovelD (Zhang et al., 2021). These language-based variants outperform their non-linguistic forms by 45-85% across 13 challenging tasks from the MiniGrid and MiniHack environment suites.
@inproceedings{ mu2022improving,
  title={ Improving Intrinsic Exploration with Language Abstractions },
  author={ Jesse Mu and Victor Zhong and Roberta Raileanu and Minqi Jiang and Noah Goodman and Tim Rocktäschel and Edward Grefenstette },
  booktitle={ NeurIPS },
  year={ 2022 }
}
          

UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models
#17
Conference on Empirical Methods in Natural Language Processing, 2022
Tianbao Xie, Chen Henry Wu, Peng Shi, Ruiqi Zhong, Torsten Scholak, Michihiro Yasunaga, Chien-Sheng Wu, Ming Zhong, Pengcheng Yin, Sida I Wang, Victor Zhong, Bailin Wang, Chengzu Li, Connor Boyle, Ansong Ni, Ziyu Yao, Dragomir Radev, Caiming Xiong, Lingpeng Kong, Rui Zhang, Noah A Smith, Luke Zettlemoyer, and Tao Yu
Structured knowledge grounding (SKG) leverages structured knowledge to complete user requests, such as semantic parsing over databases and question answering over knowledge bases. Since the inputs and outputs of SKG tasks are heterogeneous, they have been studied separately by different communities, which limits systematic and compatible research on SKG. In this paper, we overcome this limitation by proposing the UnifiedSKG framework, which unifies 21 SKG tasks into a text-to-text format, aiming to promote systematic SKG research instead of being exclusive to a single task, domain, or dataset. We use UnifiedSKG to benchmark T5 with different sizes and show that T5, with simple modifications when necessary, achieves state-of-the-art performance on almost all of the 21 tasks. We further demonstrate that multi-task prefix-tuning improves the performance on most tasks, substantially improving overall performance. UnifiedSKG also facilitates the investigation of zero-shot and few-shot learning, and we show that T0, GPT-3, and Codex struggle in zero-shot and few-shot learning for SKG. We also use UnifiedSKG to conduct a series of controlled experiments on structured knowledge encoding variants across SKG tasks. UnifiedSKG is easily extensible to more tasks and is open-sourced at https://github.com/hkunlp/unifiedskg; the latest collections are at https://unifiedskg.com.
@inproceedings{ xie2022unifiedskg,
  title={ Unified{SKG}: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models },
  author={ Tianbao Xie and Chen Henry Wu and Peng Shi and Ruiqi Zhong and Torsten Scholak and Michihiro Yasunaga and Chien-Sheng Wu and Ming Zhong and Pengcheng Yin and Sida I Wang and Victor Zhong and Bailin Wang and Chengzu Li and Connor Boyle and Ansong Ni and Ziyu Yao and Dragomir Radev and Caiming Xiong and Lingpeng Kong and Rui Zhang and Noah A Smith and Luke Zettlemoyer and Tao Yu },
  booktitle={ EMNLP },
  year={ 2022 }
}
          

SILG: The Multi-environment Symbolic Interactive Language Grounding Benchmark
#16
Neural Information Processing Systems, 2021
Victor Zhong, Austin W. Hanjie, Karthik Narasimhan, and Luke Zettlemoyer
Existing work in language grounding typically studies single environments. How do we build unified models that apply across multiple environments? We propose the multi-environment Symbolic Interactive Language Grounding benchmark (SILG), which unifies a collection of diverse grounded language learning environments under a common interface. SILG consists of grid-world environments that require generalization to new dynamics, entities, and partially observed worlds (RTFM, Messenger, NetHack), as well as symbolic counterparts of visual worlds that require interpreting rich natural language with respect to complex scenes (ALFWorld, Touchdown). Together, these environments provide diverse grounding challenges in richness of observation space, action space, language specification, and plan complexity. In addition, we propose the first shared model architecture for RL on these environments, and evaluate recent advances such as egocentric local convolution, recurrent state-tracking, entity-centric attention, and pretrained LMs using SILG. The best models significantly underperform humans on SILG, which suggests ample room for future work. We hope SILG enables the community to quickly identify new methodologies for language grounding that generalize to a diverse set of environments and their associated challenges.
@inproceedings{ zhong2021silg,
  title={ {SILG}: The Multi-environment Symbolic Interactive Language Grounding Benchmark },
  author={ Victor Zhong and Austin W. Hanjie and Karthik Narasimhan and Luke Zettlemoyer },
  booktitle={ NeurIPS },
  year={ 2021 }
}
          

Grounding Language to Entities and Dynamics for Generalization in Reinforcement Learning
#15
International Conference on Machine Learning, 2021
Austin W. Hanjie, Victor Zhong, and Karthik Narasimhan
We consider the problem of leveraging textual descriptions to improve generalization of control policies. We introduce a new multi-task environment MESSENGER with free-form natural language manuals describing the environment dynamics. In contrast to previous work, MESSENGER does not assume prior knowledge connecting text and state observations: the control policy must simultaneously learn to ground a natural language manual to entity symbols and dynamics in the environment. To learn this challenging grounding, we develop a new model, EMMA (Entity Mapper with Multi-modal Attention), which uses a multi-modal entity-conditioned attention module that allows for selective focus over relevant sentences in the manual for each entity in the environment. EMMA is end-to-end differentiable and can learn a latent grounding of entities and dynamics from text to observations using environment rewards as the only source of supervision. We demonstrate that EMMA achieves successful zero-shot generalization to unseen games with new dynamics, obtaining significantly higher rewards compared to multiple baselines. However, performance on the hardest stage of MESSENGER remains low, demonstrating the significant challenge in accurately grounding dynamics and the need for additional work in this direction.
@inproceedings{ hanjie2021grounding,
  title={ Grounding Language to Entities and Dynamics for Generalization in Reinforcement Learning },
  author={ Austin W. Hanjie and Victor Zhong and Karthik Narasimhan },
  booktitle={ ICML },
  year={ 2021 }
}
          

LEWIS: Levenshtein Editing for Unsupervised Text Style Transfer
#14
Findings of the Annual Meeting of the Association for Computational Linguistics, 2021
Machel Reid and Victor Zhong
Many types of text style transfer can be achieved with only small, precise edits (e.g. sentiment transfer from "I had a terrible time..." to "I had a great time..."). We propose a coarse-to-fine editor for style transfer that transforms text using Levenshtein edit operations (e.g. insert, replace, delete). Unlike prior single-span edit methods, our method concurrently edits multiple spans in the source text. To train without parallel style text pairs (e.g. pairs of +/- sentiment statements), we propose an unsupervised data synthesis procedure. We first convert text to style-agnostic templates using style classifier attention (e.g. "I had a SLOT time..."), then fill in slots in these templates using fine-tuned pretrained language models. Our method outperforms existing generation and editing style transfer methods on sentiment (Yelp, Amazon) and politeness (Polite) transfer. In particular, multi-span editing achieves higher performance and more diverse output than single-span editing. Moreover, compared to previous methods on unsupervised data synthesis, our method results in higher quality parallel style pairs and improves model performance.
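Levenshtein edit operations (insert, replace, delete) can be recovered from two token sequences with the textbook dynamic program plus a backtrace. This is a minimal sketch of that standard algorithm, not LEWIS's coarse-to-fine editor; `levenshtein_ops` is a hypothetical helper name.

```python
def levenshtein_ops(src: list[str], tgt: list[str]) -> list[tuple[str, int, str]]:
    """Return a minimal list of (op, src_index, token) edits turning src into tgt."""
    n, m = len(src), len(tgt)
    # dp[i][j] = edit distance between src[:i] and tgt[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if src[i - 1] == tgt[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1, dp[i][j - 1] + 1, dp[i - 1][j - 1] + cost)
    # Backtrace from the bottom-right corner to recover the operations.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and src[i - 1] == tgt[j - 1] and dp[i][j] == dp[i - 1][j - 1]:
            i, j = i - 1, j - 1  # tokens match: keep
        elif i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + 1:
            ops.append(("replace", i - 1, tgt[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(("delete", i - 1, src[i - 1]))
            i -= 1
        else:
            ops.append(("insert", i, tgt[j - 1]))  # insert before src[i]
            j -= 1
    ops.reverse()
    return ops
```

For the abstract's example, `levenshtein_ops("I had a terrible time".split(), "I had a great time".split())` returns `[("replace", 3, "great")]`; when source and target differ in several places, the returned list touches several spans at once.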
@inproceedings{ reid2021lewis,
  title={ {LEWIS}: Levenshtein Editing for Unsupervised Text Style Transfer },
  author={ Machel Reid and Victor Zhong },
  booktitle={ ACL Findings },
  year={ 2021 }
}
          

Grounded Adaptation for Zero-shot Executable Semantic Parsing
#13
Conference on Empirical Methods in Natural Language Processing, 2020
Victor Zhong, Mike Lewis, Sida I. Wang, and Luke Zettlemoyer
We propose Grounded Adaptation for Zero-shot Executable Semantic Parsing (GAZP) to adapt an existing semantic parser to new environments (e.g. new database schemas). GAZP combines a forward semantic parser with a backward utterance generator to synthesize data (e.g. utterances and SQL queries) in the new environment, then selects cycle-consistent examples to adapt the parser. Unlike data augmentation, which typically synthesizes unverified examples in the training environment, GAZP synthesizes examples in the new environment whose input-output consistency is verified. On the Spider, SParC, and CoSQL zero-shot semantic parsing tasks, GAZP improves logical form and execution accuracy of the baseline parser. Our analyses show that GAZP outperforms data augmentation in the training environment, that performance increases with the amount of GAZP-synthesized data, and that cycle-consistency is central to successful adaptation.
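The cycle-consistency check can be sketched as a filter over synthesized (utterance, query) pairs. This minimal sketch assumes consistency means the re-parsed query executes to the same result as the originally sampled query; `generate_utterance`, `parse`, and `execute` are hypothetical stand-ins for the backward generator, the forward parser, and a database executor.

```python
from typing import Callable, Iterable


def cycle_consistent_examples(
    sampled_queries: Iterable[str],
    generate_utterance: Callable[[str], str],
    parse: Callable[[str], str],
    execute: Callable[[str], object],
) -> list[tuple[str, str]]:
    """Keep synthesized (utterance, query) pairs that survive a round trip."""
    kept = []
    for query in sampled_queries:
        utterance = generate_utterance(query)  # backward model: query -> text
        reparsed = parse(utterance)  # forward parser: text -> query
        try:
            consistent = execute(reparsed) == execute(query)
        except Exception:
            consistent = False  # treat unexecutable round trips as inconsistent
        if consistent:
            kept.append((utterance, query))
    return kept
```

Only the surviving pairs would then be used to adapt the parser to the new environment.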
@inproceedings{ zhong2020grounded,
  title={ Grounded Adaptation for Zero-shot Executable Semantic Parsing },
  author={ Victor Zhong and Mike Lewis and Sida I. Wang and Luke Zettlemoyer },
  booktitle={ EMNLP },
  year={ 2020 }
}
          

RTFM: Generalising to Novel Environment Dynamics via Reading
#12
International Conference on Learning Representations, 2020
Victor Zhong, Tim Rocktäschel, and Edward Grefenstette
Obtaining policies that can generalise to new environments in reinforcement learning is challenging. In this work, we demonstrate that language understanding via a reading policy learner is a promising vehicle for generalisation to new environments. We propose a grounded policy learning problem, Read to Fight Monsters (RTFM), in which the agent must jointly reason over a language goal, relevant dynamics described in a document, and environment observations. We procedurally generate environment dynamics and corresponding language descriptions of the dynamics, such that agents must read to understand new environment dynamics instead of memorising any particular information. In addition, we propose txt2pi, a model that captures three-way interactions between the goal, document, and observations. On RTFM, txt2pi generalises to new environments with dynamics not seen during training via reading. Furthermore, our model outperforms baselines such as FiLM and language-conditioned CNNs on RTFM. Through curriculum learning, txt2pi produces policies that excel on complex RTFM tasks requiring several reasoning and coreference steps.
@inproceedings{ zhong2020rtfm,
  title={ {RTFM}: Generalising to Novel Environment Dynamics via Reading },
  author={ Victor Zhong and Tim Rocktäschel and Edward Grefenstette },
  booktitle={ ICLR },
  year={ 2020 }
}
          

E3: Entailment-driven Extracting and Editing for Conversational Machine Reading
#11
Annual Meeting of the Association for Computational Linguistics, 2019
Victor Zhong and Luke Zettlemoyer
Conversational machine reading systems help users answer high-level questions (e.g. determine if they qualify for particular government benefits) when they do not know the exact rules by which the determination is made (e.g. whether they need certain income levels or veteran status). The key challenge is that these rules are only provided in the form of a procedural text (e.g. guidelines from a government website) which the system must read to figure out what to ask the user. We present a new conversational machine reading model that jointly extracts a set of decision rules from the procedural text while reasoning about which are entailed by the conversational history and which still need to be edited to create questions for the user. On the recently introduced ShARC conversational machine reading dataset, our Entailment-driven Extract and Edit network (E3) achieves a new state-of-the-art, outperforming existing systems as well as a new BERT-based baseline. In addition, by explicitly highlighting which information still needs to be gathered, E3 provides a more explainable alternative to prior work. We release source code for our models and experiments at https://github.com/vzhong/e3.
@inproceedings{ zhong2019e,
  title={ {E3}: Entailment-driven Extracting and Editing for Conversational Machine Reading },
  author={ Victor Zhong and Luke Zettlemoyer },
  booktitle={ ACL },
  year={ 2019 }
}
          

Multi-hop Reading Comprehension through Question Decomposition and Rescoring
#10
Annual Meeting of the Association for Computational Linguistics, 2019
Sewon Min, Victor Zhong, Luke Zettlemoyer, and Hannaneh Hajishirzi
Multi-hop Reading Comprehension (RC) requires reasoning and aggregation across several paragraphs. We propose a system for multi-hop RC that decomposes a compositional question into simpler sub-questions that can be answered by off-the-shelf single-hop RC models. Since annotations for such decomposition are expensive, we recast sub-question generation as a span prediction problem and show that our method, trained using only 400 labeled examples, generates sub-questions that are as effective as human-authored sub-questions. We also introduce a new global rescoring approach that considers each decomposition (i.e. the sub-questions and their answers) to select the best final answer, greatly improving overall performance. Our experiments on HotpotQA show that this approach achieves state-of-the-art results while providing explainable evidence for its decision making in the form of sub-questions.
@inproceedings{ min2019multi,
  title={ Multi-hop Reading Comprehension through Question Decomposition and Rescoring },
  author={ Sewon Min and Victor Zhong and Luke Zettlemoyer and Hannaneh Hajishirzi },
  booktitle={ ACL },
  year={ 2019 }
}
          

Coarse-grain Fine-grain Coattention Network for Multi-evidence Question Answering
#9
International Conference on Learning Representations, 2019
Victor Zhong, Caiming Xiong, Nitish Shirish Keskar, and Richard Socher
End-to-end neural models have made significant progress in question answering; however, recent studies show that these models implicitly assume that the answer and evidence appear close together in a single document. In this work, we propose the Coarse-grain Fine-grain Coattention Network (CFC), a new question answering model that combines information from evidence across multiple documents. The CFC consists of a coarse-grain module that interprets documents with respect to the query then finds a relevant answer, and a fine-grain module which scores each candidate answer by comparing its occurrences across all of the documents with the query. We design these modules using hierarchies of coattention and self-attention, which learn to emphasize different parts of the input. On the QAngaroo WikiHop multi-evidence question answering task, the CFC obtains a new state-of-the-art result of 70.6% on the blind test set, outperforming the previous best by 3% accuracy despite not using pretrained contextual encoders.
@inproceedings{ zhong2019coarse,
  title={ Coarse-grain Fine-grain Coattention Network for Multi-evidence Question Answering },
  author={ Victor Zhong and Caiming Xiong and Nitish Shirish Keskar and Richard Socher },
  booktitle={ ICLR },
  year={ 2019 }
}
          

Global-Locally Self-Attentive Dialogue State Tracker
#8
Annual Meeting of the Association for Computational Linguistics, 2018
Victor Zhong, Caiming Xiong, and Richard Socher
Dialogue state tracking, which estimates user goals and requests given the dialogue context, is an essential part of task-oriented dialogue systems. In this paper, we propose the Global-Locally Self-Attentive Dialogue State Tracker (GLAD), which learns representations of the user utterance and previous system actions with global-local modules. Our model uses global modules to share parameters between estimators for different types (called slots) of dialogue states, and uses local modules to learn slot-specific features. We show that this significantly improves tracking of rare states and achieves state-of-the-art performance on the WoZ and DSTC2 state tracking tasks. GLAD obtains 88.1% joint goal accuracy and 97.1% request accuracy on WoZ, outperforming prior work by 3.7% and 5.5%. On DSTC2, our model obtains 74.5% joint goal accuracy and 97.5% request accuracy, outperforming prior work by 1.1% and 1.0%.
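Joint goal accuracy counts a turn as correct only when every slot-value pair in the predicted goal matches the annotation. A minimal sketch, assuming each turn's goal is represented as a slot-to-value dictionary (`joint_goal_accuracy` is a hypothetical name):

```python
def joint_goal_accuracy(predicted: list[dict[str, str]], gold: list[dict[str, str]]) -> float:
    """Fraction of turns whose full predicted goal equals the gold goal."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must cover the same turns")
    if not gold:
        raise ValueError("need at least one turn")
    # A turn counts only if every slot-value pair matches (dict equality).
    correct = sum(pred == ref for pred, ref in zip(predicted, gold))
    return correct / len(gold)
```

A single wrong slot (for example, `area` predicted as "south" instead of "north") makes the whole turn incorrect, which is why rare slot values weigh heavily on this metric.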
@inproceedings{ zhong2018global,
  title={ Global-Locally Self-Attentive Dialogue State Tracker },
  author={ Victor Zhong and Caiming Xiong and Richard Socher },
  booktitle={ ACL },
  year={ 2018 }
}
          

Efficient and Robust Question Answering from Minimal Context over Documents
#7
Annual Meeting of the Association for Computational Linguistics, 2018
Sewon Min, Victor Zhong, Richard Socher, and Caiming Xiong
Neural models for question answering (QA) over documents have achieved significant performance improvements. Although effective, these models do not scale to large corpora due to their complex modeling of interactions between the document and the question. Moreover, recent work has shown that such models are sensitive to adversarial inputs. In this paper, we study the minimal context required to answer the question, and find that most questions in existing datasets can be answered with a small set of sentences. Inspired by this observation, we propose a simple sentence selector to select the minimal set of sentences to feed into the QA model. Our overall system achieves significant reductions in training (up to 15 times) and inference times (up to 13 times), with accuracy comparable to or better than the state-of-the-art on SQuAD, NewsQA, TriviaQA and SQuAD-Open. Furthermore, our experimental results and analyses show that our approach is more robust to adversarial inputs.
@inproceedings{ min2018efficient,
  title={ Efficient and Robust Question Answering from Minimal Context over Documents },
  author={ Sewon Min and Victor Zhong and Richard Socher and Caiming Xiong },
  booktitle={ ACL },
  year={ 2018 }
}
          

DCN+: Mixed Objective and Deep Residual Coattention for Question Answering
#6
International Conference on Learning Representations, 2018
Caiming Xiong, Victor Zhong, and Richard Socher
Traditional models for question answering optimize using cross entropy loss, which encourages exact answers at the cost of penalizing nearby or overlapping answers that are sometimes equally accurate. We propose a mixed objective that combines cross entropy loss with self-critical policy learning. The objective uses rewards derived from word overlap to solve the misalignment between evaluation metric and optimization objective. In addition to the mixed objective, we improve dynamic coattention networks (DCN) with a deep residual coattention encoder that is inspired by recent work in deep self-attention and residual networks. Our proposals improve model performance across question types and input lengths, especially for long questions that require the ability to capture long-term dependencies. On the Stanford Question Answering Dataset, our model achieves state-of-the-art results with 75.1% exact match accuracy and 83.1% F1, while the ensemble obtains 78.9% exact match accuracy and 86.0% F1.
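A minimal, framework-free sketch of a mixed objective of this kind: a bag-of-words F1 reward and a self-critical policy-gradient term that uses the greedy answer's reward as the baseline. The fixed `rl_weight` is a simplifying assumption (the paper may weight the two terms differently), and `sampled_log_prob` stands in for the differentiable log-probability an autodiff framework would provide; all names here are hypothetical.

```python
from collections import Counter


def overlap_f1(pred_tokens: list[str], gold_tokens: list[str]) -> float:
    """Bag-of-words F1 between a predicted and a gold answer span."""
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def mixed_loss(
    ce_loss: float,
    sampled_log_prob: float,
    sampled_tokens: list[str],
    greedy_tokens: list[str],
    gold_tokens: list[str],
    rl_weight: float = 1.0,  # fixed weight: a simplifying assumption
) -> float:
    """Cross-entropy loss plus a self-critical policy-gradient term."""
    # Self-critical baseline: subtract the greedy answer's reward
    # from the sampled answer's reward.
    advantage = overlap_f1(sampled_tokens, gold_tokens) - overlap_f1(greedy_tokens, gold_tokens)
    # REINFORCE surrogate: minimizing it raises the sampled span's
    # probability when it beats the greedy span, and lowers it otherwise.
    rl_loss = -advantage * sampled_log_prob
    return ce_loss + rl_weight * rl_loss
```

Because the reward is word overlap rather than exact match, a sampled span that overlaps the gold answer still earns partial credit, which is the misalignment the mixed objective targets.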
@inproceedings{ xiong2018dcn,
  title={ {DCN}+: Mixed Objective and Deep Residual Coattention for Question Answering },
  author={ Caiming Xiong and Victor Zhong and Richard Socher },
  booktitle={ ICLR },
  year={ 2018 }
}
          

Position-aware Attention and Supervised Data Improve Slot Filling
#5
Conference on Empirical Methods in Natural Language Processing, 2017
Outstanding paper
Yuhao Zhang, Victor Zhong, Danqi Chen, Gabor Angeli, and Christopher D. Manning
Organized relational knowledge in the form of knowledge graphs is important for many applications. However, the ability to populate knowledge bases with facts automatically extracted from documents has improved frustratingly slowly. This paper simultaneously addresses two issues that have held back prior work. We first propose an effective new model, which combines an LSTM sequence model with a form of entity position-aware attention that is better suited to relation extraction. Then we build TACRED, a large (106,264 examples) supervised relation extraction dataset, obtained via crowdsourcing and targeted towards TAC KBP relations. The combination of better supervised data and a more appropriate high-capacity model enables much better relation extraction performance. When the model trained on this new dataset replaces the previous relation extraction component of the best TAC KBP 2015 slot filling system, its F1 score increases markedly from 22.2% to 26.7%.
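Position-aware attention relies on each token's position relative to the subject and object mentions. The sketch below assumes the convention of 0 for tokens inside the entity span and signed distances to the nearest span boundary elsewhere; `relative_positions` is a hypothetical helper, and in a model these integers would be mapped to learned position embeddings (not shown).

```python
def relative_positions(seq_len: int, span_start: int, span_end: int) -> list[int]:
    """Position of each token relative to an entity span [span_start, span_end] (inclusive)."""
    positions = []
    for i in range(seq_len):
        if i < span_start:
            positions.append(i - span_start)  # negative: before the entity
        elif i > span_end:
            positions.append(i - span_end)  # positive: after the entity
        else:
            positions.append(0)  # inside the entity mention
    return positions
```

For a six-token sentence whose subject occupies tokens 2 and 3, `relative_positions(6, 2, 3)` gives `[-2, -1, 0, 0, 1, 2]`; the same function would be applied again for the object span.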
@inproceedings{ zhang2017position,
  title={ Position-aware Attention and Supervised Data Improve Slot Filling },
  author={ Yuhao Zhang and Victor Zhong and Danqi Chen and Gabor Angeli and Christopher D. Manning },
  booktitle={ EMNLP },
  year={ 2017 }
}
          

Dynamic Coattention Networks for Question Answering
#4
International Conference on Learning Representations, 2017
Caiming Xiong, Victor Zhong, and Richard Socher
Several deep learning models have been proposed for question answering. However, due to their single-pass nature, they have no way to recover from local maxima corresponding to incorrect answers. To address this problem, we introduce the Dynamic Coattention Network (DCN) for question answering. The DCN first fuses co-dependent representations of the question and the document in order to focus on relevant parts of both. Then a dynamic pointing decoder iterates over potential answer spans. This iterative procedure enables the model to recover from initial local maxima corresponding to incorrect answers. On the Stanford Question Answering Dataset, a single DCN model improves the previous state of the art from 71.0% F1 to 75.9%, while a DCN ensemble obtains 80.4% F1.
@inproceedings{ xiong2017dynamic,
  title={ Dynamic Coattention Networks for Question Answering },
  author={ Caiming Xiong and Victor Zhong and Richard Socher },
  booktitle={ ICLR },
  year={ 2017 }
}
          

Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning
#3
Preprint, 2017
Victor Zhong, Caiming Xiong, and Richard Socher
A significant amount of the world's knowledge is stored in relational databases. However, the ability for users to retrieve facts from a database is limited due to a lack of understanding of query languages such as SQL. We propose Seq2SQL, a deep neural network for translating natural language questions to corresponding SQL queries. Our model leverages the structure of SQL queries to significantly reduce the output space of generated queries. Moreover, we use rewards from in-the-loop query execution over the database to learn a policy to generate unordered parts of the query, which we show are less suitable for optimization via cross entropy loss. In addition, we will publish WikiSQL, a dataset of 80654 hand-annotated examples of questions and SQL queries distributed across 24241 tables from Wikipedia. This dataset is required to train our model and is an order of magnitude larger than comparable datasets. By applying policy-based reinforcement learning with a query execution environment to WikiSQL, our model Seq2SQL outperforms attentional sequence to sequence models, improving execution accuracy from 35.9% to 59.4% and logical form accuracy from 23.4% to 48.3%.
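The point that parts of a SQL query are unordered can be made concrete with WikiSQL-style queries (an aggregation, one selected column, and a conjunction of WHERE conditions). This sketch, using Python's built-in `sqlite3`, compares two queries both by an order-insensitive structural match and by executing them; the `WikiSQLQuery` class and helper names are hypothetical, and the structural match is an illustration rather than the paper's exact logical-form metric.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class WikiSQLQuery:
    agg: str  # "" for no aggregation, else e.g. "COUNT" or "MAX"
    sel: str  # selected column
    conds: tuple[tuple[str, str, object], ...]  # (column, operator, value)

    def to_sql(self, table: str) -> tuple[str, list]:
        col = f"{self.agg}({self.sel})" if self.agg else self.sel
        sql = f"SELECT {col} FROM {table}"
        params = []
        if self.conds:
            sql += " WHERE " + " AND ".join(f"{c} {op} ?" for c, op, _ in self.conds)
            params = [v for _, _, v in self.conds]
        return sql, params


def structural_match(a: WikiSQLQuery, b: WikiSQLQuery) -> bool:
    # WHERE conditions form a conjunction, so their order is irrelevant.
    return a.agg == b.agg and a.sel == b.sel and set(a.conds) == set(b.conds)


def execution_match(conn: sqlite3.Connection, table: str, a: WikiSQLQuery, b: WikiSQLQuery) -> bool:
    rows_a = conn.execute(*a.to_sql(table)).fetchall()
    rows_b = conn.execute(*b.to_sql(table)).fetchall()
    return sorted(rows_a) == sorted(rows_b)
```

Swapping the order of two WHERE conditions leaves both checks true, while a token-by-token comparison of the query strings would treat them as different; this is the intuition behind treating such parts as poorly suited to cross-entropy training.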
@inproceedings{ zhong2017seq,
  title={ Seq{2SQL}: Generating Structured Queries from Natural Language using Reinforcement Learning },
  author={ Victor Zhong and Caiming Xiong and Richard Socher },
  booktitle={ CoRR abs/1709.00103 },
  year={ 2017 }
}
          

Ask Me Anything: Dynamic Memory Networks for Natural Language Processing
#2
International Conference on Machine Learning, 2016
Ankit Kumar, Ozan Irsoy, Jonathan Su, James Bradbury, Robert English, Brian Pierce, Peter Ondruska, Ishaan Gulrajani, Victor Zhong, Romain Paulus, and Richard Socher
Most tasks in natural language processing can be cast into question answering (QA) problems over language input. We introduce the dynamic memory network (DMN), a neural network architecture which processes input sequences and questions, forms episodic memories, and generates relevant answers. Questions trigger an iterative attention process which allows the model to condition its attention on the inputs and the result of previous iterations. These results are then reasoned over in a hierarchical recurrent sequence model to generate answers. The DMN can be trained end-to-end and obtains state-of-the-art results on several types of tasks and datasets: question answering (Facebook bAbI dataset), text classification for sentiment analysis (Stanford Sentiment Treebank) and sequence modeling for part-of-speech tagging (WSJ-PTB). The training for these different tasks relies exclusively on trained word vector representations and input-question-answer triplets.
@inproceedings{ kumar2016ask,
  title={ Ask Me Anything: Dynamic Memory Networks for Natural Language Processing },
  author={ Ankit Kumar and Ozan Irsoy and Jonathan Su and James Bradbury and Robert English and Brian Pierce and Peter Ondruska and Ishaan Gulrajani and Victor Zhong and Romain Paulus and Richard Socher },
  booktitle={ ICML },
  year={ 2016 }
}
          

Bootstrapped Self Training for Knowledge Base Population
#1
Text Analysis Conference, 2015
Gabor Angeli, Victor Zhong, Danqi Chen, Arun Chaganty, Jason Bolton, Melvin Johnson Premkumar, Panupong Pasupat, Sonal Gupta, and Christopher D Manning
A central challenge in relation extraction is the lack of supervised training data. Pattern-based relation extractors suffer from low recall, whereas distant supervision yields noisy data which hurts precision. We propose bootstrapped self-training to capture the benefits of both systems: the precision of patterns and the generalizability of trained models. We show that training on the output of patterns drastically improves performance over the patterns. We propose self-training for further improvement: recall can be improved by incorporating the predictions from previous iterations, and precision by filtering the assumed negatives based on previous predictions. We show that even our pattern-based model achieves good performance on the task, and the self-trained models rank among the top systems.
@inproceedings{ angeli2015bootstrapped,
  title={ Bootstrapped Self Training for Knowledge Base Population },
  author={ Gabor Angeli and Victor Zhong and Danqi Chen and Arun Chaganty and Jason Bolton and Melvin Johnson Premkumar and Panupong Pasupat and Sonal Gupta and Christopher D Manning },
  booktitle={ TAC },
  year={ 2015 }
}