Towards a Progression-Aware Autonomous Dialogue Agent
Official implementation for our paper: Towards a Progression-Aware Autonomous Dialogue Agent.
To run the chat interface and interact with pre-trained models, see Run Chat Interface. To reproduce results from the paper, see Reproduce Paper Results.
Overview
We propose a framework in which dialogue agents can evaluate the progression of a conversation toward or away from desired outcomes, and use this signal to inform planning for subsequent responses. Our framework is composed of three key elements: (1) the notion of a "global" dialogue state (GDS) space, (2) a task-specific progression function (PF) computed in terms of a conversation's trajectory through this space, and (3) a planning mechanism based on dialogue rollouts (Lewis et al., 2017) by which an agent may use progression signals to select its next response. See our paper for more details.
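For intuition, the planning mechanism can be sketched in a few lines. The following is a simplified, hypothetical outline of rollout-based response selection, not the repository's actual API; the candidate generator, rollout simulator, and PF scorer are passed in as placeholder callables.

```python
def select_response(history, generate_candidates, simulate_rollout, score_progression,
                    n_candidates=3, n_rollouts=2, rollout_length=3):
    # Pick the candidate response whose simulated futures score best under the PF.
    # All three callables are illustrative placeholders for the agent's response
    # generator, self-play rollout step, and progression function (PF) model.
    best_candidate, best_score = None, float("-inf")
    for candidate in generate_candidates(history, n_candidates):
        scores = []
        for _ in range(n_rollouts):
            # Simulate how the dialogue might continue after this candidate...
            future = simulate_rollout(history + [candidate], rollout_length)
            # ...and score the full simulated trajectory with the PF.
            scores.append(score_progression(history + [candidate] + future))
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_candidate, best_score = candidate, mean_score
    return best_candidate
```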
This repository contains everything needed to reproduce the results on the donation solicitation task of Persuasion For Good (Wang et al., 2019) as reported in our paper. Additionally, scripts are provided to train progression models on new datasets.
Installation
Dependencies
Python 3.8 or greater is required. To aid reproducibility, all dependency versions are pinned at (or near) the versions used in the paper, so we recommend installing into a new environment. If you wish to upgrade dependencies, note that some library versions are pinned to work with our existing trained progression models (see comments in requirements.txt). This is because model saving relies on serialization, and the same library versions must be present when the models are deserialized at load time. If you wish to upgrade these libraries anyway, the models can easily be re-trained on the new versions using the provided scripts.
If PyTorch is not already installed in your environment, please install the appropriate configuration of PyTorch for your environment (OS, CUDA version) before proceeding; see https://pytorch.org/get-started/locally/.
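After installing PyTorch, you can optionally verify the build and GPU visibility:

```python
import torch
print(torch.__version__)          # installed PyTorch version
print(torch.cuda.is_available())  # True if your CUDA build and driver are configured
```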
To install dependencies, run:
pip install -r requirements.txt
Interactive Demos
Run Chat Interface
We provide an interactive Streamlit chat interface that can be used to converse with our fine-tuned DialoGPT model on the Persuasion For Good solicitation task. Rollouts can be dynamically throttled using the provided controls, and GDS / PF plots are updated in real time during the chat.
To launch the Streamlit chat interface, run:
streamlit run dialogue-progression/demo_streamlit.py
By default, the LACAI/DialoGPT-large-PFG checkpoint is used for the response generator along with the trained supervised_large_adapted progression model used for self-play evaluations in the paper. It is possible to use other response generator checkpoints: for example, we provide a smaller DialoGPT checkpoint, LACAI/DialoGPT-small-PFG. It is also possible to use other trained progression models and to fix a generation seed.
To launch the Streamlit chat interface using alternate response and progression models, run:
streamlit run dialogue-progression/demo_streamlit.py -- \
--agent-modelpath=LACAI/DialoGPT-small-PFG \
--progression-modelpath=models/progression/persuasion_for_good/unsupervised \
--random-state=42
We have released the following response generator models, which can be used with --agent-modelpath:

| HuggingFace link |
|---|
| LACAI/DialoGPT-large-PFG |
| LACAI/DialoGPT-small-PFG |
We have released the following progression models, which can be used with --progression-modelpath: *

| Root model link (use this) | HuggingFace link |
|---|---|
| unsupervised | N/A |
| supervised_base | LACAI/roberta-base-PFG-progression |
| supervised_large | LACAI/roberta-large-PFG-progression |
| supervised_large_adapted | LACAI/roberta-large-adapted-PFG-progression |
* The models above all originate from initialization seed 2594 in our multi-seed experiments. The supervised_large_adapted model from this seed is the one selected for use in the rollout experiments (see Section 4.3.1 in the paper). We released the other three models from the same seed for consistency.
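Outside of the demos, the response generator checkpoints can also be loaded directly with the HuggingFace transformers library. The snippet below is a minimal sketch using the standard DialoGPT chat convention; the demo scripts remain the reference for how inputs are actually formatted for our agent.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("LACAI/DialoGPT-large-PFG")
model = AutoModelForCausalLM.from_pretrained("LACAI/DialoGPT-large-PFG")

# Standard DialoGPT convention: each turn is terminated with the EOS token.
user_turn = "Hi! Have you heard of the charity Save the Children?"
input_ids = tokenizer.encode(user_turn + tokenizer.eos_token, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(
        input_ids,
        max_new_tokens=60,
        do_sample=True,
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id,
    )
print(tokenizer.decode(output_ids[0, input_ids.shape[-1]:], skip_special_tokens=True))
```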
Run Rollout Comparer
We provide a comparer tool that can be used to inspect rollouts side by side during a conversation. This interface supports chatting much as the Streamlit app does; however, the comparer tool additionally supports a mutable dialogue history. This means that you can make small modifications at any point in the conversation and observe how they impact the resulting progression computations and rollout results.
To launch the rollout comparer tool, run:
python dialogue-progression/demo_server.py
Then, navigate to http://localhost:8080/demo_ui
The rollout comparer tool uses the same model defaults as the Streamlit chat interface, and can be run with alternate arguments in a similar way:
python dialogue-progression/demo_server.py \
--agent-modelpath=LACAI/DialoGPT-small-PFG \
--progression-modelpath=models/progression/persuasion_for_good/unsupervised \
--random-state=42
Reproduce Paper Results
Dataset
To reproduce the results from our paper, first download the contents of the data folder from Persuasion For Good into data/persuasion_for_good in this repository. Do not replace the CSV and XLSX files already provided in data/persuasion_for_good/AnnotatedData; these are our manual progression annotations.
Train and Evaluate Progression Models
Run the following command to train and evaluate the unsupervised, roberta-base, roberta-large, and roberta-large-adapted progression models for all of the 33 initialization seeds used in the paper:
./run_train_progression_multiseed.sh
To train and evaluate an individual unsupervised progression model using a single seed (e.g., 42)*, run:
python dialogue-progression/train_progression.py \
--run-name=unsupervised_42 \
--recency-weight=0.3 \
--kmeans-n-clusters=21 \
--normalize-embeddings \
--kmeans-progression-inv-dist \
--transformers-modelpath=sentence-transformers/all-mpnet-base-v2 \
--model-random-state=42
python dialogue-progression/progression_model_eval_manual.py \
--progression-modelpath=models/progression/persuasion_for_good/unsupervised
* Use seed 2594 to replicate the released unsupervised model.
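For intuition about the flags above: the unsupervised model embeds dialogue turns with a sentence-transformer, aggregates them with a recency weighting, fits a k-means GDS space over training dialogues, and scores progression via inverse distance in that space. The sketch below illustrates this idea only; it is not the repository's implementation, and the exact recency weighting and target-cluster selection are assumptions.

```python
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

encoder = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

def dialogue_embedding(turns, recency_weight=0.3):
    # Recency-weighted average of normalized turn embeddings (weighting scheme
    # is an assumption; cf. --recency-weight and --normalize-embeddings).
    embs = encoder.encode(turns, normalize_embeddings=True)
    weights = np.array([recency_weight ** (len(turns) - 1 - i) for i in range(len(turns))])
    return (weights / weights.sum()) @ embs

# Toy stand-ins for Persuasion For Good training dialogues.
train_dialogues = [
    ["Have you heard of Save the Children?", "Yes! I'd be happy to donate."],
    ["Would you consider donating today?", "No, I'm not interested."],
]
train_embs = np.stack([dialogue_embedding(d) for d in train_dialogues])
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42).fit(train_embs)  # cf. --kmeans-n-clusters

def progression(turns, target_cluster=0):
    # Inverse distance to a target GDS cluster (cf. --kmeans-progression-inv-dist).
    dist = np.linalg.norm(dialogue_embedding(turns) - kmeans.cluster_centers_[target_cluster])
    return 1.0 / (1.0 + dist)

print(progression(["Have you heard of Save the Children?", "Sure, tell me more."]))
```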
To train and evaluate an individual supervised progression model (e.g., supervised_base, built on roberta-base) using a single seed (e.g., 42)*, run:
python dialogue-progression/train_progression.py \
--run-name=supervised_base_42 \
--embedding-level=dialog \
--normalize-embeddings \
--batch-size=16 \
--transformers-modelpath=roberta-base \
--model-random-state=42
python dialogue-progression/progression_model_eval_manual.py \
--progression-modelpath=models/progression/persuasion_for_good/supervised_base
* Use seed 2594 to replicate the released supervised models.
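The released supervised checkpoints can also be queried directly through transformers, for example to sanity-check a score. The snippet below assumes a single-logit regression head over the flattened dialogue text, and the turn separator shown is an assumption; progression_model_eval_manual.py is the authoritative way to evaluate these models.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tok = AutoTokenizer.from_pretrained("LACAI/roberta-base-PFG-progression")
model = AutoModelForSequenceClassification.from_pretrained("LACAI/roberta-base-PFG-progression")
model.eval()

# Dialogue flattened to a single string; the separator convention is an assumption.
dialogue = "Have you heard of Save the Children? </s> Yes, I'd be happy to donate."
inputs = tok(dialogue, return_tensors="pt", truncation=True)
with torch.no_grad():
    score = model(**inputs).logits.squeeze().item()
print(f"estimated progression: {score:.3f}")
```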
When training progression models, the following naming convention is typically used for runs:
| run-name | transformers-modelpath |
|---|---|
| unsupervised_{seed} | sentence-transformers/all-mpnet-base-v2 |
| supervised_base_{seed} | roberta-base |
| supervised_large_{seed} | roberta-large |
| supervised_large_adapted_{seed} | LACAI/roberta-large-dialog-narrative |
Each run outputs two CSVs containing the auto and manual eval results for that model and seed. These files are named {run_name}_progression_df.csv (auto eval) and manual_eval_results_df.csv (manual eval), where run_name is the value provided in the --run-name argument. If using run_train_progression_multiseed.sh, only two CSVs are output in total, each compiling the results for all models and seeds.
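For a quick look at these outputs before comparing against the reference results below (exact column names depend on the eval, so this just prints a summary):

```python
import pandas as pd

# Filename reflects the --run-name used above; adjust the path to your output location.
df = pd.read_csv("unsupervised_42_progression_df.csv")
print(df.head())
print(df.describe())  # per-column summary statistics across the run
```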
Reference Results
These CSV outputs can be compared against our provided auto-eval reference results and manual-eval reference results for each model and seed. The aggregations done in these reference results sheets are the source of Tables 1 & 2 in our paper.
Rollout Self-Play Experiments
Self-play experiments can be run using the trained roberta-large-adapted progression model used in the paper, provided in models/progression/persuasion_for_good/supervised_large_adapted. When loading this model, the RoBERTa weights are automatically loaded from our LACAI/roberta-large-adapted-PFG-progression checkpoint on the HuggingFace model hub.
Run the following commands to execute the 2x2x3 and 3x3x5 self-play experiments for each of the 5 generation seeds used in the paper:
./run_rollouts_eval_2_2_3_multiseed.sh
./run_rollouts_eval_3_3_5_multiseed.sh
./run_rollouts_eval_stats.sh
Running for all 5 generation seeds can take several days. If desired, the first two commands can be run in different terminals using separate GPUs:
Terminal 1:
export CUDA_VISIBLE_DEVICES=0
./run_rollouts_eval_2_2_3_multiseed.sh
Terminal 2:
export CUDA_VISIBLE_DEVICES=1
./run_rollouts_eval_3_3_5_multiseed.sh
Then when both complete:
./run_rollouts_eval_stats.sh
To run self-play experiments using a specified configuration (e.g., 2x2x3, i.e., n-candidates x n-rollouts-per-candidate x n-utterances-per-rollout) and a single seed (e.g., 42), run:
python dialogue-progression/rollouts_eval.py \
--n-candidates=2 \
--n-rollouts-per-candidate=2 \
--n-utterances-per-rollout=3 \
--outputpath=rollouts_self_play/2_2_3/42 \
--agent-modelpath=LACAI/DialoGPT-large-PFG \
--progression-modelpath=models/progression/persuasion_for_good/supervised_large_adapted \
--model-random-state=42
./run_rollouts_eval_stats.sh
The command above should complete in just a few hours on a single GPU.
Reference Results
We have provided reference results which are the complete outputs of the multi-seed self-play experiments in the paper, including all dialogue text. To compute the aggregation of these results as done in the experiment scripts above, run:
./run_rollouts_eval_stats.sh results/rollouts_self_play
These aggregations are the source of Table 3 in our paper.
Train Progression Models on New Datasets
Coming soon!
Reference
If you use our code or models in your work, please cite:
@article{sanders2022towards,
  title={Towards a Progression-Aware Autonomous Dialogue Agent},
  author={Sanders, Abraham and Strzalkowski, Tomek and Si, Mei and Chang, Albert and Dey, Deepanshu and Braasch, Jonas and Wang, Dakuo},
  journal={arXiv preprint arXiv:2205.03692},
  year={2022}
}