In this article, I discuss papers on multi-action dialog policies in task-oriented dialog systems. Task-oriented dialog systems are conversational systems that help you accomplish a specific goal, such as booking a flight ticket. Examples include Alexa and Siri.
These systems typically have four components, of which the Dialog Policy determines the system action based on the current user utterance and the dialog state.
Usually, each user utterance is answered with a single system response. But often, multiple responses make sense: for instance, “find me a restaurant” can be answered with “Sure! What cuisine are you looking for?” or “Any particular areas you would prefer?”. Recent research suggests using a one-to-many mapping from dialog state to actions to produce more diverse and plausible responses. Human conversation is diverse, and there need not be only one correct path to task completion.
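To make the one-to-many idea concrete, here is a minimal sketch in Python, assuming a toy dictionary-based policy (the state keys and action tuples are invented for illustration): a single dialog state maps to several plausible system actions, and the policy samples one of them instead of always returning a fixed response.

```python
import random

# Toy one-to-many dialog policy: each state maps to a LIST of
# plausible actions rather than a single action.
POLICY = {
    # state key: (domain, slots already filled)
    ("restaurant", frozenset()): [
        ("request", "cuisine"),   # "What cuisine are you looking for?"
        ("request", "area"),      # "Any particular areas you would prefer?"
    ],
    ("restaurant", frozenset({"cuisine"})): [
        ("request", "area"),
        ("request", "price_range"),
    ],
}

def sample_action(state, rng=random):
    """Return one plausible system action for the given dialog state."""
    return rng.choice(POLICY[state])

state = ("restaurant", frozenset())
print(sample_action(state))
```

Sampling (rather than always taking a fixed action) is what lets the same state yield different, equally valid continuations across conversations.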
What are some popular methods or strong baselines commonly used?
Task-oriented dialog systems with multiple plausible responses have received relatively little attention in the research community. Most work on one-to-many response mapping focuses on open-domain dialog systems; I found only four papers specific to task-oriented dialog. These four works are described below.
- This paper by Zhang et al. proposes two components: a framework for data augmentation, and a multi-decoder network. The framework helps learn a dialog policy that can generate diverse responses, and adds these multiple actions to the dataset through oversampling. The decoder model is an end-to-end dialog system that consists of three decoders: one for belief span, one for system action and one for system response. The system action decoder utilizes the aforementioned framework to generate multiple actions, which are then passed to the system response decoder.
- The authors of this work by Shu et al. propose a gated cell in an encoder-decoder model which outputs dialog acts in the (continue, act, slot) format. Continue indicates whether more actions should be generated, act is the act type (e.g. request), and slot is for the slots corresponding to the act type. At each turn, the input to the encoder is the dialog state and a knowledge base vector, which is then passed to a three-component decoder that sequentially decodes the continue, act and slot parts of the tuple.
- This paper by Rajendran et al. presents a mask-based end-to-end memory network for multiple responses. This retrieval-based model is trained in two phases: a supervised learning phase and a reinforcement learning phase. The best supervised model, chosen after 150 epochs of training, initializes the RL phase, which produces the final policy.
- Jhunjhunwala et al. present a dialog policy framework with interactive human teaching. In the first step, the model is trained on human-human dialogs. In the second, for each user utterance, the model's top five system actions are shown to human trainers, who pick the best one, and the model is refined with this feedback. Finally, the chosen action is sent to the NLG component. While the dialog policy is multi-action, only one final response is produced as system output, so this is not exactly a multi-response system. It underscores that multi-action and multi-response systems are distinct, and a one-to-one mapping between actions and responses is not always to be expected.
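The augmentation idea in Zhang et al. can be sketched roughly as follows, assuming a toy dataset of (state, action) pairs (the helper name and data are hypothetical, not the authors' code): turns that share a dialog state pool all the actions observed for that state, so each turn ends up annotated with multiple valid actions.

```python
from collections import defaultdict

def augment_actions(dataset):
    """For each dialog state, collect every action observed with it
    anywhere in the dataset, then attach the full action set to each
    turn. A toy version of state-based multi-action augmentation."""
    actions_by_state = defaultdict(set)
    for state, action in dataset:
        actions_by_state[state].add(action)
    return [(state, sorted(actions_by_state[state])) for state, _ in dataset]

data = [
    ("s1", "request(cuisine)"),
    ("s1", "request(area)"),
    ("s2", "inform(name)"),
]
print(augment_actions(data))
# each "s1" turn now carries both plausible actions
```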
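The sequential decoding of (continue, act, slot) tuples in Shu et al. can be sketched as a simple loop, with a scripted function standing in for the trained gated decoder cell (all names here are hypothetical): decoding emits one tuple per step and stops when the continue flag says to.

```python
def decode_dialog_acts(step_fn, max_steps=10):
    """Greedy decoding loop over (continue, act, slot) tuples.
    `step_fn(i)` stands in for the gated decoder cell and returns one
    tuple per step; decoding stops when the continue flag is "stop"."""
    acts = []
    for i in range(max_steps):
        cont, act, slot = step_fn(i)
        if cont == "stop":
            break
        acts.append((act, slot))
    return acts

# Toy stand-in for a trained decoder's per-step output.
scripted = [
    ("continue", "request", "cuisine"),
    ("continue", "request", "area"),
    ("stop", None, None),
]
print(decode_dialog_acts(lambda i: scripted[i]))
# [('request', 'cuisine'), ('request', 'area')]
```

The continue flag is what makes the number of acts per turn variable, which is the crux of multi-action output.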
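The masking idea in Rajendran et al. can be illustrated with a toy retrieval step (a deliberate simplification, not the authors' network): candidate responses are scored, a binary mask zeroes out those that are invalid for the current dialog context, and the best remaining candidate is returned.

```python
def retrieve_response(scores, mask):
    """Pick the index of the highest-scoring candidate whose mask
    bit is 1. A toy version of masked retrieval: the mask removes
    candidates that are invalid for the current context."""
    best, best_score = None, float("-inf")
    for i, (s, m) in enumerate(zip(scores, mask)):
        if m and s > best_score:
            best, best_score = i, s
    return best

scores = [0.9, 0.4, 0.7]
mask = [0, 1, 1]   # candidate 0 is masked out despite its high score
print(retrieve_response(scores, mask))  # 2
```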
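The interactive step in Jhunjhunwala et al. can be sketched as plain top-k selection (the scores and action strings below are invented for illustration): the policy's five best-scoring candidate actions are surfaced for a human trainer to choose from.

```python
def top_k_actions(action_scores, k=5):
    """Return the k highest-scoring candidate actions for a turn,
    to be shown to a human trainer who picks the best one."""
    ranked = sorted(action_scores, key=lambda pair: pair[1], reverse=True)
    return [action for action, _ in ranked[:k]]

candidates = [("request(area)", 0.8), ("inform(name)", 0.3),
              ("request(cuisine)", 0.9), ("bye()", 0.1),
              ("request(price)", 0.5), ("offer(book)", 0.4)]
shown = top_k_actions(candidates)
print(shown)
chosen = shown[0]  # a trainer would pick interactively; here we take the top one
```

Only the single chosen action goes on to the NLG component, which is why the system is multi-action internally but single-response externally.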
Building multi-action, multi-response dialog systems is a challenging problem, compounded by the lack of suitable annotated data and evaluation metrics. But the direction has strong potential for the dialog community and deserves further research.