Publications

For a more up-to-date list of my publications, please see my Google Scholar.

Grounding Language to Entities and Dynamics for Generalization in Reinforcement Learning
#15
International Conference on Machine Learning, 2021
H. J. Austin Wang, Victor Zhong, and Karthik Narasimhan
We consider the problem of leveraging textual descriptions to improve generalization of control policies. We introduce a new multi-task environment, MESSENGER, with free-form natural language manuals describing the environment dynamics. In contrast to previous work, MESSENGER does not assume prior knowledge connecting text and state observations; the control policy must simultaneously learn to ground a natural language manual to entity symbols and dynamics in the environment. In order to learn this challenging grounding, we develop a new model, EMMA (Entity Mapper with Multi-modal Attention), which uses a multi-modal entity-conditioned attention module that allows for selective focus over relevant sentences in the manual for each entity in the environment. EMMA is end-to-end differentiable and can learn a latent grounding of entities and dynamics from text to observations using environment rewards as the only source of supervision. We demonstrate that EMMA achieves successful zero-shot generalization to unseen games with new dynamics, obtaining significantly higher rewards compared to multiple baselines. However, performance on the hardest stage of MESSENGER remains low, demonstrating the significant challenge in accurately grounding dynamics and the need for additional work in this direction.
@inproceedings{ wang2021grounding,
  title={ Grounding Language to Entities and Dynamics for Generalization in Reinforcement Learning },
  author={ H. J. Austin Wang and Victor Zhong and Karthik Narasimhan },
  booktitle={ ICML },
  year={ 2021 }
}
          

LEWIS: Levenshtein Editing for Unsupervised Text Style Transfer
#14
Findings of the Annual Meeting of the Association for Computational Linguistics, 2021
Machel Reid and Victor Zhong
Many types of text style transfer can be achieved with only small, precise edits (e.g. sentiment transfer from I had a terrible time... to I had a great time...). We propose a coarse-to-fine editor for style transfer that transforms text using Levenshtein edit operations (e.g. insert, replace, delete). Unlike prior single-span edit methods, our method concurrently edits multiple spans in the source text. To train without parallel style text pairs (e.g. pairs of +/- sentiment statements), we propose an unsupervised data synthesis procedure. We first convert text to style-agnostic templates using style classifier attention (e.g. I had a SLOT time...), then fill in slots in these templates using fine-tuned pretrained language models. Our method outperforms existing generation and editing style transfer methods on sentiment (Yelp, Amazon) and politeness (Polite) transfer. In particular, multi-span editing achieves higher performance and more diverse output than single-span editing. Moreover, compared to previous methods on unsupervised data synthesis, our method results in higher quality parallel style pairs and improves model performance.
@inproceedings{ reid2021lewis,
  title={ {LEWIS}: Levenshtein Editing for Unsupervised Text Style Transfer },
  author={ Machel Reid and Victor Zhong },
  booktitle={ ACL Findings },
  year={ 2021 }
}
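
The Levenshtein edit operations mentioned in the LEWIS abstract above can be illustrated with a small sketch. This is not the LEWIS model itself, only the classic dynamic-programming alignment that recovers keep/replace/insert/delete operations between two token sequences; the function name `levenshtein_edits` and the tuple layout are assumptions made for this example.

```python
from typing import List, Tuple

# (operation, source_token, target_token); unused slots hold "".
Edit = Tuple[str, str, str]


def levenshtein_edits(source: List[str], target: List[str]) -> List[Edit]:
    """Return one minimum-cost sequence of token-level edit operations."""
    m, n = len(source), len(target)
    # dist[i][j] is the edit distance between source[:i] and target[:j].
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i
    for j in range(n + 1):
        dist[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            substitution = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                 # delete a source token
                dist[i][j - 1] + 1,                 # insert a target token
                dist[i - 1][j - 1] + substitution,  # keep or replace
            )

    # Walk back from the bottom-right corner to recover the operations.
    edits: List[Edit] = []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and source[i - 1] == target[j - 1] \
                and dist[i][j] == dist[i - 1][j - 1]:
            edits.append(("keep", source[i - 1], target[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            edits.append(("replace", source[i - 1], target[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            edits.append(("delete", source[i - 1], ""))
            i -= 1
        else:
            edits.append(("insert", "", target[j - 1]))
            j -= 1
    edits.reverse()
    return edits


if __name__ == "__main__":
    src = "I had a terrible time".split()
    tgt = "I had a great time".split()
    for edit in levenshtein_edits(src, tgt):
        print(edit)
```

On the sentiment example from the abstract, the only non-keep operation is a single replace of "terrible" with "great", which is the kind of small, precise edit the abstract describes.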
          

Grounded Adaptation for Zero-shot Executable Semantic Parsing
#13
Conference on Empirical Methods in Natural Language Processing, 2020
Victor Zhong, Mike Lewis, Sida I. Wang, and Luke Zettlemoyer
We propose Grounded Adaptation for Zero-shot Executable Semantic Parsing (GAZP) to adapt an existing semantic parser to new environments (e.g. new database schemas). GAZP combines a forward semantic parser with a backward utterance generator to synthesize data (e.g. utterances and SQL queries) in the new environment, then selects cycle-consistent examples to adapt the parser. Unlike data-augmentation, which typically synthesizes unverified examples in the training environment, GAZP synthesizes examples in the new environment whose input-output consistency is verified. On the Spider, SParC, and CoSQL zero-shot semantic parsing tasks, GAZP improves logical form and execution accuracy of the baseline parser. Our analyses show that GAZP outperforms data-augmentation in the training environment, performance increases with the amount of GAZP-synthesized data, and cycle-consistency is central to successful adaptation.
@inproceedings{ zhong2020grounded,
  title={ Grounded Adaptation for Zero-shot Executable Semantic Parsing },
  author={ Victor Zhong and Mike Lewis and Sida I. Wang and Luke Zettlemoyer },
  booktitle={ EMNLP },
  year={ 2020 }
}
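
The cycle-consistency selection described in the GAZP abstract above can be sketched roughly as follows. The callables `generate_utterance` and `parse` are hypothetical stand-ins for the backward utterance generator and the forward semantic parser, and comparing queries by exact string equality is an assumption of this sketch; the abstract does not say how consistency is checked.

```python
from typing import Callable, Iterable, List, Tuple


def select_cycle_consistent(
    queries: Iterable[str],
    generate_utterance: Callable[[str], str],
    parse: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Keep (utterance, query) pairs whose utterance parses back to the query."""
    kept = []
    for query in queries:
        utterance = generate_utterance(query)  # backward: query -> utterance
        if parse(utterance) == query:          # forward: utterance -> query
            kept.append((utterance, query))
    return kept


if __name__ == "__main__":
    # Toy lookup tables standing in for trained models (hypothetical data).
    generator = {
        "SELECT name FROM city": "list all city names",
        "SELECT COUNT(*) FROM city": "how many towns are there",
    }
    parser = {
        "list all city names": "SELECT name FROM city",
        "how many towns are there": "SELECT COUNT(*) FROM town",  # parser error
    }
    pairs = select_cycle_consistent(generator, generator.__getitem__, parser.__getitem__)
    print(pairs)
```

Only the first toy pair survives the filter, because the second utterance parses back to a different query than the one it was generated from.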
          

RTFM: Generalising to Novel Environment Dynamics via Reading
#12
International Conference on Learning Representations, 2020
Victor Zhong, Tim Rocktäschel, and Edward Grefenstette
Obtaining policies that can generalise to new environments in reinforcement learning is challenging. In this work, we demonstrate that language understanding via a reading policy learner is a promising vehicle for generalisation to new environments. We propose a grounded policy learning problem, Read to Fight Monsters (RTFM), in which the agent must jointly reason over a language goal, relevant dynamics described in a document, and environment observations. We procedurally generate environment dynamics and corresponding language descriptions of the dynamics, such that agents must read to understand new environment dynamics instead of memorising any particular information. In addition, we propose txt2pi, a model that captures three-way interactions between the goal, document, and observations. On RTFM, txt2pi generalises to new environments with dynamics not seen during training via reading. Furthermore, our model outperforms baselines such as FiLM and language-conditioned CNNs on RTFM. Through curriculum learning, txt2pi produces policies that excel on complex RTFM tasks requiring several reasoning and coreference steps.
@inproceedings{ zhong2020rtfm,
  title={ {RTFM}: Generalising to Novel Environment Dynamics via Reading },
  author={ Victor Zhong and Tim Rocktäschel and Edward Grefenstette },
  booktitle={ ICLR },
  year={ 2020 }
}
          

E3: Entailment-driven Extracting and Editing for Conversational Machine Reading
#11
Annual Meeting of the Association for Computational Linguistics, 2019
Victor Zhong and Luke Zettlemoyer
Conversational machine reading systems help users answer high-level questions (e.g. determine if they qualify for particular government benefits) when they do not know the exact rules by which the determination is made (e.g. whether they need certain income levels or veteran status). The key challenge is that these rules are only provided in the form of a procedural text (e.g. guidelines from a government website) which the system must read to figure out what to ask the user. We present a new conversational machine reading model that jointly extracts a set of decision rules from the procedural text while reasoning about which are entailed by the conversational history and which still need to be edited to create questions for the user. On the recently introduced ShARC conversational machine reading dataset, our Entailment-driven Extract and Edit network (E3) achieves a new state-of-the-art, outperforming existing systems as well as a new BERT-based baseline. In addition, by explicitly highlighting which information still needs to be gathered, E3 provides a more explainable alternative to prior work. We release source code for our models and experiments at https://github.com/vzhong/e3.
@inproceedings{ zhong2019e,
  title={ {E3}: Entailment-driven Extracting and Editing for Conversational Machine Reading },
  author={ Victor Zhong and Luke Zettlemoyer },
  booktitle={ ACL },
  year={ 2019 }
}
          

Multi-hop Reading Comprehension through Question Decomposition and Rescoring
#10
Annual Meeting of the Association for Computational Linguistics, 2019
Sewon Min, Victor Zhong, Luke Zettlemoyer, and Hannaneh Hajishirzi
Multi-hop Reading Comprehension (RC) requires reasoning and aggregation across several paragraphs. We propose a system for multi-hop RC that decomposes a compositional question into simpler sub-questions that can be answered by off-the-shelf single-hop RC models. Since annotations for such decomposition are expensive, we recast sub-question generation as a span prediction problem and show that our method, trained using only 400 labeled examples, generates sub-questions that are as effective as human-authored sub-questions. We also introduce a new global rescoring approach that considers each decomposition (i.e. the sub-questions and their answers) to select the best final answer, greatly improving overall performance. Our experiments on HotpotQA show that this approach achieves state-of-the-art results, while providing explainable evidence for its decision making in the form of sub-questions.
@inproceedings{ min2019multi,
  title={ Multi-hop Reading Comprehension through Question Decomposition and Rescoring },
  author={ Sewon Min and Victor Zhong and Luke Zettlemoyer and Hannaneh Hajishirzi },
  booktitle={ ACL },
  year={ 2019 }
}
          

Coarse-grain Fine-grain Coattention Network for Multi-evidence Question Answering
#9
International Conference on Learning Representations, 2019
Victor Zhong, Caiming Xiong, Nitish Shirish Keskar, and Richard Socher
End-to-end neural models have made significant progress in question answering; however, recent studies show that these models implicitly assume that the answer and evidence appear close together in a single document. In this work, we propose the Coarse-grain Fine-grain Coattention Network (CFC), a new question answering model that combines information from evidence across multiple documents. The CFC consists of a coarse-grain module that interprets documents with respect to the query then finds a relevant answer, and a fine-grain module which scores each candidate answer by comparing its occurrences across all of the documents with the query. We design these modules using hierarchies of coattention and self-attention, which learn to emphasize different parts of the input. On the Qangaroo WikiHop multi-evidence question answering task, the CFC obtains a new state-of-the-art result of 70.6% on the blind test set, outperforming the previous best by 3% accuracy despite not using pretrained contextual encoders.
@inproceedings{ zhong2019coarse,
  title={ Coarse-grain Fine-grain Coattention Network for Multi-evidence Question Answering },
  author={ Victor Zhong and Caiming Xiong and Nitish Shirish Keskar and Richard Socher },
  booktitle={ ICLR },
  year={ 2019 }
}
          

Global-Locally Self-Attentive Dialogue State Tracker
#8
Annual Meeting of the Association for Computational Linguistics, 2018
Victor Zhong, Caiming Xiong, and Richard Socher
Dialogue state tracking, which estimates user goals and requests given the dialogue context, is an essential part of task-oriented dialogue systems. In this paper, we propose the Global-Locally Self-Attentive Dialogue State Tracker (GLAD), which learns representations of the user utterance and previous system actions with global-local modules. Our model uses global modules to share parameters between estimators for different types (called slots) of dialogue states, and uses local modules to learn slot-specific features. We show that this significantly improves tracking of rare states and achieves state-of-the-art performance on the WoZ and DSTC2 state tracking tasks. GLAD obtains 88.1% joint goal accuracy and 97.1% request accuracy on WoZ, outperforming prior work by 3.7% and 5.5%. On DSTC2, our model obtains 74.5% joint goal accuracy and 97.5% request accuracy, outperforming prior work by 1.1% and 1.0%.
@inproceedings{ zhong2018global,
  title={ Global-Locally Self-Attentive Dialogue State Tracker },
  author={ Victor Zhong and Caiming Xiong and Richard Socher },
  booktitle={ ACL },
  year={ 2018 }
}
          

Efficient and Robust Question Answering from Minimal Context over Documents
#7
Annual Meeting of the Association for Computational Linguistics, 2018
Sewon Min, Victor Zhong, Richard Socher, and Caiming Xiong
Neural models for question answering (QA) over documents have achieved significant performance improvements. Although effective, these models do not scale to large corpora due to their complex modeling of interactions between the document and the question. Moreover, recent work has shown that such models are sensitive to adversarial inputs. In this paper, we study the minimal context required to answer the question, and find that most questions in existing datasets can be answered with a small set of sentences. Inspired by this observation, we propose a simple sentence selector to select the minimal set of sentences to feed into the QA model. Our overall system achieves significant reductions in training (up to 15 times) and inference times (up to 13 times), with accuracy comparable to or better than the state-of-the-art on SQuAD, NewsQA, TriviaQA and SQuAD-Open. Furthermore, our experimental results and analyses show that our approach is more robust to adversarial inputs.
@inproceedings{ min2018efficient,
  title={ Efficient and Robust Question Answering from Minimal Context over Documents },
  author={ Sewon Min and Victor Zhong and Richard Socher and Caiming Xiong },
  booktitle={ ACL },
  year={ 2018 }
}
          

DCN+: Mixed Objective and Deep Residual Coattention for Question Answering
#6
International Conference on Learning Representations, 2018
Caiming Xiong, Victor Zhong, and Richard Socher
Traditional models for question answering optimize using cross entropy loss, which encourages exact answers at the cost of penalizing nearby or overlapping answers that are sometimes equally accurate. We propose a mixed objective that combines cross entropy loss with self-critical policy learning. The objective uses rewards derived from word overlap to solve the misalignment between evaluation metric and optimization objective. In addition to the mixed objective, we improve dynamic coattention networks (DCN) with a deep residual coattention encoder that is inspired by recent work in deep self-attention and residual networks. Our proposals improve model performance across question types and input lengths, especially for long questions that require the ability to capture long-term dependencies. On the Stanford Question Answering Dataset, our model achieves state-of-the-art results with 75.1% exact match accuracy and 83.1% F1, while the ensemble obtains 78.9% exact match accuracy and 86.0% F1.
@inproceedings{ xiong2018dcn,
  title={ {DCN}+: Mixed Objective and Deep Residual Coattention for Question Answering },
  author={ Caiming Xiong and Victor Zhong and Richard Socher },
  booktitle={ ICLR },
  year={ 2018 }
}
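
To make the "rewards derived from word overlap" in the DCN+ abstract above concrete, here is a minimal sketch of a token-level F1 score between a predicted answer span and a gold answer. The function name `token_f1` and the whitespace tokenization are assumptions of this example, and the way the paper combines this reward with cross entropy loss is not shown.

```python
from collections import Counter
from typing import List


def token_f1(predicted: List[str], gold: List[str]) -> float:
    """Harmonic mean of token precision and recall between two answers."""
    if not predicted or not gold:
        # Two empty answers match perfectly; one empty answer scores zero.
        return float(predicted == gold)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(predicted) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = "Eiffel Tower".split()
    overlapping = "the Eiffel Tower".split()
    # Exact match would score this prediction 0; word overlap gives partial credit.
    print(token_f1(overlapping, gold))
```

An overlapping answer such as "the Eiffel Tower" receives partial credit under this reward, whereas an exact-match criterion would treat it the same as a completely wrong span.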
          

Position-aware Attention and Supervised Data Improve Slot Filling
#5
Conference on Empirical Methods in Natural Language Processing, 2017
Outstanding paper
Yuhao Zhang, Victor Zhong, Danqi Chen, Gabor Angeli, and Christopher D. Manning
Organized relational knowledge in the form of knowledge graphs is important for many applications. However, the ability to populate knowledge bases with facts automatically extracted from documents has improved frustratingly slowly. This paper simultaneously addresses two issues that have held back prior work. We first propose an effective new model, which combines an LSTM sequence model with a form of entity position-aware attention that is better suited to relation extraction. Then we build TACRED, a large (106,264 examples) supervised relation extraction dataset, obtained via crowdsourcing and targeted towards TAC KBP relations. The combination of better supervised data and a more appropriate high-capacity model enables much better relation extraction performance. When the model trained on this new dataset replaces the previous relation extraction component of the best TAC KBP 2015 slot filling system, its F1 score increases markedly from 22.2% to 26.7%.
@inproceedings{ zhang2017position,
  title={ Position-aware Attention and Supervised Data Improve Slot Filling },
  author={ Yuhao Zhang and Victor Zhong and Danqi Chen and Gabor Angeli and Christopher D. Manning },
  booktitle={ EMNLP },
  year={ 2017 }
}
          

Dynamic Coattention Networks for Question Answering
#4
International Conference on Learning Representations, 2017
Caiming Xiong, Victor Zhong, and Richard Socher
Several deep learning models have been proposed for question answering. However, due to their single-pass nature, they have no way to recover from local maxima corresponding to incorrect answers. To address this problem, we introduce the Dynamic Coattention Network (DCN) for question answering. The DCN first fuses co-dependent representations of the question and the document in order to focus on relevant parts of both. Then a dynamic pointing decoder iterates over potential answer spans. This iterative procedure enables the model to recover from initial local maxima corresponding to incorrect answers. On the Stanford question answering dataset, a single DCN model improves the previous state of the art from 71.0% F1 to 75.9%, while a DCN ensemble obtains 80.4% F1.
@inproceedings{ xiong2017dynamic,
  title={ Dynamic Coattention Networks for Question Answering },
  author={ Caiming Xiong and Victor Zhong and Richard Socher },
  booktitle={ ICLR },
  year={ 2017 }
}
          

Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning
#3
Preprint, 2017
Victor Zhong, Caiming Xiong, and Richard Socher
A significant amount of the world's knowledge is stored in relational databases. However, the ability for users to retrieve facts from a database is limited due to a lack of understanding of query languages such as SQL. We propose Seq2SQL, a deep neural network for translating natural language questions to corresponding SQL queries. Our model leverages the structure of SQL queries to significantly reduce the output space of generated queries. Moreover, we use rewards from in-the-loop query execution over the database to learn a policy to generate unordered parts of the query, which we show are less suitable for optimization via cross entropy loss. In addition, we will publish WikiSQL, a dataset of 80654 hand-annotated examples of questions and SQL queries distributed across 24241 tables from Wikipedia. This dataset is required to train our model and is an order of magnitude larger than comparable datasets. By applying policy-based reinforcement learning with a query execution environment to WikiSQL, our model Seq2SQL outperforms attentional sequence to sequence models, improving execution accuracy from 35.9% to 59.4% and logical form accuracy from 23.4% to 48.3%.
@inproceedings{ zhong2017seq,
  title={ Seq{2SQL}: Generating Structured Queries from Natural Language using Reinforcement Learning },
  author={ Victor Zhong and Caiming Xiong and Richard Socher },
  booktitle={ CoRR abs/1709.00103 },
  year={ 2017 }
}
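
The idea of using rewards from in-the-loop query execution, as described in the Seq2SQL abstract above, can be sketched with Python's built-in `sqlite3` module. The function `execution_reward`, the demo table, and the specific reward values (+1, -1, -2) are illustrative assumptions for this sketch and are not taken from the abstract.

```python
import sqlite3
from collections import Counter


def build_demo_db() -> sqlite3.Connection:
    """Create a small in-memory table (hypothetical data for illustration)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE players (name TEXT, team TEXT, points INTEGER)")
    conn.executemany(
        "INSERT INTO players VALUES (?, ?, ?)",
        [("Ann", "Red", 30), ("Bob", "Red", 12), ("Cy", "Blue", 30)],
    )
    return conn


def execution_reward(conn: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> float:
    """Score a predicted query by executing it and comparing result rows."""
    gold_rows = conn.execute(gold_sql).fetchall()
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return -2.0  # assumed penalty: the query does not execute
    # Compare results as multisets so row order does not matter.
    if Counter(predicted_rows) == Counter(gold_rows):
        return 1.0   # assumed reward: same result as the gold query
    return -1.0      # assumed penalty: executes but returns a different result


if __name__ == "__main__":
    conn = build_demo_db()
    gold = "SELECT name FROM players WHERE team = 'Red' AND points > 20"
    swapped = "SELECT name FROM players WHERE points > 20 AND team = 'Red'"
    print(execution_reward(conn, swapped, gold))
```

The swapped query differs from the gold query as a token sequence but returns the same rows, so it earns the full reward here. This is the sense in which execution feedback suits unordered parts of a query better than a token-level cross entropy loss.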
          

Ask Me Anything: Dynamic Memory Networks for Natural Language Processing
#2
International Conference on Machine Learning, 2016
Ankit Kumar, Ozan Irsoy, Jonathan Su, James Bradbury, Robert English, Brian Pierce, Peter Ondruska, Ishaan Gulrajani, Victor Zhong, Romain Paulus, and Richard Socher
Most tasks in natural language processing can be cast into question answering (QA) problems over language input. We introduce the dynamic memory network (DMN), a neural network architecture which processes input sequences and questions, forms episodic memories, and generates relevant answers. Questions trigger an iterative attention process which allows the model to condition its attention on the inputs and the result of previous iterations. These results are then reasoned over in a hierarchical recurrent sequence model to generate answers. The DMN can be trained end-to-end and obtains state-of-the-art results on several types of tasks and datasets: question answering (Facebook bAbI dataset), text classification for sentiment analysis (Stanford Sentiment Treebank) and sequence modeling for part-of-speech tagging (WSJ-PTB). The training for these different tasks relies exclusively on trained word vector representations and input-question-answer triplets.
@inproceedings{ kumar2016ask,
  title={ Ask Me Anything: Dynamic Memory Networks for Natural Language Processing },
  author={ Ankit Kumar and Ozan Irsoy and Jonathan Su and James Bradbury and Robert English and Brian Pierce and Peter Ondruska and Ishaan Gulrajani and Victor Zhong and Romain Paulus and Richard Socher },
  booktitle={ ICML },
  year={ 2016 }
}
          

Bootstrapped Self Training for Knowledge Base Population
#1
Text Analysis Conference, 2015
Gabor Angeli, Victor Zhong, Danqi Chen, Arun Chaganty, Jason Bolton, Melvin Johnson Premkumar, Panupong Pasupat, Sonal Gupta, and Christopher D. Manning
A central challenge in relation extraction is the lack of supervised training data. Pattern-based relation extractors suffer from low recall, whereas distant supervision yields noisy data which hurts precision. We propose bootstrapped self-training to capture the benefits of both systems: the precision of patterns and the generalizability of trained models. We show that training on the output of patterns drastically improves performance over the patterns. We propose self-training for further improvement: recall can be improved by incorporating the predictions from previous iterations; precision by filtering the assumed negatives based on previous predictions. We show that even our pattern-based model achieves good performance on the task, and the self-trained models rank among the top systems.
@inproceedings{ angeli2015bootstrapped,
  title={ Bootstrapped Self Training for Knowledge Base Population },
  author={ Gabor Angeli and Victor Zhong and Danqi Chen and Arun Chaganty and Jason Bolton and Melvin Johnson Premkumar and Panupong Pasupat and Sonal Gupta and Christopher D. Manning },
  booktitle={ TAC },
  year={ 2015 }
}