# Towards a Progression-Aware Autonomous Dialogue Agent
Official implementation for our paper: [Towards a Progression-Aware Autonomous Dialogue Agent](https://arxiv.org/abs/2205.03692).
To run the chat interface and interact with pre-trained models, see [Run Chat Interface](#run-chat-interface).
To reproduce results from the paper, see [Reproduce Paper Results](#reproduce-paper-results).
## Overview
We propose a framework in which dialogue agents can evaluate the progression of a conversation toward or away from desired outcomes, and use this signal to inform planning for subsequent responses. Our framework is composed of three key elements: (1) the notion of a "global" dialogue state (GDS) space, (2) a task-specific progression function (PF) computed in terms of a conversation's trajectory through this space, and (3) a planning mechanism based on dialogue rollouts (Lewis et al., 2017) by which an agent may use progression signals to select its next response. See [our paper](https://arxiv.org/abs/2205.03692) for more details.
![Architecture](images/architecture.png)
This repository contains everything needed to reproduce the results on the donation solicitation task of [Persuasion For Good](https://gitlab.com/ucdavisnlp/persuasionforgood) (Wang et al., 2019) as reported in our paper. Additionally, scripts are provided to train progression models on new datasets.
## Installation
### Dependencies
Python 3.8 or greater is required. To aid reproducibility, all dependency versions are pinned at (or near) those used in the paper, so we recommend installing into a new environment (see the example below). If you wish to upgrade dependencies, note that some library versions are pinned to remain compatible with our existing trained progression models (see comments in [requirements.txt](requirements.txt)): model saving relies on serialization, so compatible library versions are required when the models are deserialized on loading. If you wish to upgrade these libraries anyway, the models can easily be re-trained on the new versions using [the provided scripts](#train-and-evaluate-progression-models).
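For example, a fresh environment can be created with Python's built-in `venv` module (any environment manager, such as conda, works equally well):
```bash
# Create and activate an isolated environment for this repository
python -m venv .venv
source .venv/bin/activate   # on Windows: .venv\Scripts\activate
```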
If PyTorch is not already installed in your environment, please install the configuration of PyTorch appropriate for your environment (OS, CUDA version) before proceeding - see [https://pytorch.org/get-started/locally/](https://pytorch.org/get-started/locally/).
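As an illustration only (the exact command depends on your platform, so copy the command generated on the PyTorch site), an install for a CUDA 11.8 system might look like:
```bash
# Example PyTorch install for CUDA 11.8; adjust per pytorch.org for your setup
pip install torch --index-url https://download.pytorch.org/whl/cu118
```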
To install dependencies, run:
```bash
pip install -r requirements.txt
```
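To sanity-check the installation and confirm whether PyTorch can see a GPU, you can run, for example:
```bash
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```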
## Interactive Demos
### Run Chat Interface
We provide an interactive streamlit chat interface that can be used to converse with our fine-tuned DialoGPT model on the Persuasion For Good solicitation task. Rollouts can be dynamically throttled using the provided controls, and GDS / PF plots are displayed in real time during the chat.
To launch the streamlit chat interface, run:
```bash
streamlit run dialogue-progression/demo_streamlit.py
```
![Streamlit Chat Interface](images/chat.png)
By default, the [LACAI/DialoGPT-large-PFG](https://huggingface.co/LACAI/DialoGPT-large-PFG) checkpoint is used for the response generator along with the trained `supervised_large_adapted` progression model used for self-play evaluations in the paper. It is possible to use other response generator checkpoints: for example, we provide a smaller DialoGPT checkpoint, [LACAI/DialoGPT-small-PFG](https://huggingface.co/LACAI/DialoGPT-small-PFG). It is also possible to use other trained progression models and to fix a generation seed.
To launch the streamlit chat interface using alternate response and progression models:
```bash
streamlit run dialogue-progression/demo_streamlit.py -- \
    --agent-modelpath=LACAI/DialoGPT-small-PFG \
    --progression-modelpath=models/progression/persuasion_for_good/unsupervised \
    --random-state=42
```
We have released the following response generator models which can be used with `--agent-modelpath`:

| HuggingFace link |
|------------------|
| [LACAI/DialoGPT-large-PFG](https://huggingface.co/LACAI/DialoGPT-large-PFG) |
| [LACAI/DialoGPT-small-PFG](https://huggingface.co/LACAI/DialoGPT-small-PFG) |
We have released the following progression models which can be used with `--progression-modelpath`: *

| Root model link (use this) | HuggingFace link |
|----------------------------|------------------|
| [unsupervised](models/progression/persuasion_for_good/unsupervised) | N/A |
| [supervised_base](models/progression/persuasion_for_good/supervised_base) | [LACAI/roberta-base-PFG-progression](https://huggingface.co/LACAI/roberta-base-PFG-progression) |
| [supervised_large](models/progression/persuasion_for_good/supervised_large) | [LACAI/roberta-large-PFG-progression](https://huggingface.co/LACAI/roberta-large-PFG-progression) |
| [supervised_large_adapted](models/progression/persuasion_for_good/supervised_large_adapted) | [LACAI/roberta-large-adapted-PFG-progression](https://huggingface.co/LACAI/roberta-large-adapted-PFG-progression) |

\* The models above all originate from initialization seed 2594 in our multi-seed experiments. The `supervised_large_adapted` model of this seed is the one selected for use in the rollout experiments (see Section 4.3.1 in the paper). We released the other three models from the same seed for consistency.
### Run Rollout Comparer
We provide a comparer tool that can be used to inspect rollouts side-by-side during a conversation. This interface supports chatting in much the same manner as the streamlit app; however, the comparer tool additionally supports a mutable dialogue history. This means that you can make small modifications at any point in the conversation and observe how they impact the resulting progression computations and rollout results.
To launch the rollout comparer tool, run:
```bash
python dialogue-progression/demo_server.py
```
Then, navigate to [http://localhost:8080/demo_ui](http://localhost:8080/demo_ui).
![Rollout Comparer Tool](images/rollout_compare.png)
The rollout comparer tool uses the same model defaults as the streamlit chat interface, and can be run with alternate arguments in a similar way:
```bash
python dialogue-progression/demo_server.py \
    --agent-modelpath=LACAI/DialoGPT-small-PFG \
    --progression-modelpath=models/progression/persuasion_for_good/unsupervised \
    --random-state=42
```
## Reproduce Paper Results
### Dataset
To reproduce the results from our paper, first download the contents of the [data folder from Persuasion For Good](https://gitlab.com/ucdavisnlp/persuasionforgood/-/tree/master/data) into [data/persuasion_for_good](data/persuasion_for_good) in this repository. Do not replace the CSV and XLSX files already provided in [data/persuasion_for_good/AnnotatedData](data/persuasion_for_good/AnnotatedData) - these are our manual progression annotations.
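For example, one way to fetch the data (a sketch assuming `git` and GNU `cp` are available; the `-n` flag avoids overwriting the provided annotation files) is:
```bash
# Clone Persuasion For Good and copy its data folder without clobbering our annotations
git clone https://gitlab.com/ucdavisnlp/persuasionforgood.git /tmp/persuasionforgood
cp -rn /tmp/persuasionforgood/data/. data/persuasion_for_good/
```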
### Train and Evaluate Progression Models:
Run the following command to train and evaluate the `unsupervised`, `roberta-base`, `roberta-large`, and `roberta-large-adapted` progression models for all of the 33 initialization seeds used in the paper:
```bash
./run_train_progression_multiseed.sh
```
To train and evaluate an individual **unsupervised** progression model using a single seed (e.g., 42)*, run:
```bash
python dialogue-progression/train_progression.py \
    --run-name=unsupervised_42 \
    --recency-weight=0.3 \
    --kmeans-n-clusters=21 \
    --normalize-embeddings \
    --kmeans-progression-inv-dist \
    --transformers-modelpath=sentence-transformers/all-mpnet-base-v2 \
    --model-random-state=42
python dialogue-progression/progression_model_eval_manual.py \
    --progression-modelpath=models/progression/persuasion_for_good/unsupervised
```
\* Use seed 2594 to replicate the released unsupervised model.
To train and evaluate an individual **supervised** progression model (e.g., based on `roberta-base`) using a single seed (e.g., 42)*, run:
```bash
python dialogue-progression/train_progression.py \
    --run-name=supervised_base_42 \
    --embedding-level=dialog \
    --normalize-embeddings \
    --batch-size=16 \
    --transformers-modelpath=roberta-base \
    --model-random-state=42
python dialogue-progression/progression_model_eval_manual.py \
    --progression-modelpath=models/progression/persuasion_for_good/supervised_base
```
\* Use seed 2594 to replicate the released supervised models.
When training progression models, the following naming convention is typically used for runs (see the example after the table):

| run-name | transformers-modelpath |
|---------------------------------|-----------------------------------------|
| unsupervised_{seed} | sentence-transformers/all-mpnet-base-v2 |
| supervised_base_{seed} | roberta-base |
| supervised_large_{seed} | roberta-large |
| supervised_large_adapted_{seed} | LACAI/roberta-large-dialog-narrative |
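For example, following this convention, a `supervised_large_adapted` run with seed 42 could be launched roughly as below. This is a sketch patterned on the supervised example above; the batch size and other hyperparameters may need adjustment for the larger model (consult `train_progression.py` for the full set of options):
```bash
python dialogue-progression/train_progression.py \
    --run-name=supervised_large_adapted_42 \
    --embedding-level=dialog \
    --normalize-embeddings \
    --batch-size=16 \
    --transformers-modelpath=LACAI/roberta-large-dialog-narrative \
    --model-random-state=42
```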
Each run outputs two CSVs containing the auto and manual eval results for that model and seed. These files are named `{run_name}_progression_df.csv` (auto eval) and `manual_eval_results_df` (manual eval), where `run_name` is the value provided in the `--run-name` argument. If using `run_train_progression_multiseed.sh`, only two CSVs are output in total, each compiling the results for all models and seeds.
**Reference Results**
These CSV outputs can be compared against our provided [auto-eval reference results](results/progression/auto_eval_results_33_seeds.xlsx) and [manual-eval reference results](results/progression/manual_eval_results_33_seeds.xlsx) for each model and seed. The aggregations done in these reference results sheets are the source of Tables 1 & 2 in our paper.
### Rollout Self-Play Experiments:
Self-play experiments can be run using the trained `roberta-large-adapted` progression model used in the paper, provided in [models/progression/persuasion_for_good/supervised_large_adapted](models/progression/persuasion_for_good/supervised_large_adapted). When loading this model, RoBERTa weights are automatically loaded from our [LACAI/roberta-large-adapted-PFG-progression](https://huggingface.co/LACAI/roberta-large-adapted-PFG-progression) checkpoint on the HuggingFace model hub.
Run the following commands to perform the 2x2x3 and 3x3x5 self-play experiments for each of the 5 generation seeds used in the paper:
```bash
./run_rollouts_eval_2_2_3_multiseed.sh
./run_rollouts_eval_3_3_5_multiseed.sh
./run_rollouts_eval_stats.sh
```
Running for all 5 generation seeds can take several days. If desired, the first two commands can be run in different terminals using separate GPUs:
Terminal 1:
```bash
export CUDA_VISIBLE_DEVICES=0
./run_rollouts_eval_2_2_3_multiseed.sh
```
Terminal 2:
```bash
export CUDA_VISIBLE_DEVICES=1
./run_rollouts_eval_3_3_5_multiseed.sh
```
Then when both complete:
```bash
./run_rollouts_eval_stats.sh
```
To run self-play experiments using a specified configuration (e.g., 2x2x3) and a single seed (e.g., 42), run:
```bash
python dialogue-progression/rollouts_eval.py \
    --n-candidates=2 \
    --n-rollouts-per-candidate=2 \
    --n-utterances-per-rollout=3 \
    --outputpath=rollouts_self_play/2_2_3/42 \
    --agent-modelpath=LACAI/DialoGPT-large-PFG \
    --progression-modelpath=models/progression/persuasion_for_good/supervised_large_adapted \
    --model-random-state=42
./run_rollouts_eval_stats.sh
```
The commands above should complete in just a few hours on a single GPU.
**Reference Results**
We have provided reference results which are the [complete outputs](results/rollouts_self_play) of the multi-seed self-play experiments in the paper, including all dialogue text. To compute the aggregation of these results as done in the experiment scripts above, run:
```bash
./run_rollouts_eval_stats.sh results/rollouts_self_play
```
These aggregations are the source of Table 3 in our paper.
## Train Progression Models on New Datasets
Coming soon!
## Reference
If you use our code or models in your work, please cite:
```
@article{sanders2022towards,
  title={Towards a Progression-Aware Autonomous Dialogue Agent},
  author={Sanders, Abraham and Strzalkowski, Tomek and Si, Mei and Chang, Albert and Dey, Deepanshu and Braasch, Jonas and Wang, Dakuo},
  journal={arXiv preprint arXiv:2205.03692},
  year={2022}
}
```