Improving Conversation Modelling using Attention Based Variational Hierarchical RNN
Keywords: chatbot, conversation agent, response generation, variational hierarchical RNN, deep learning, natural language processing, attention mechanism
Conversation modeling is one of the most important applications of natural language processing, and building a response-generation model for open-domain conversation in a chatbot is among the hardest challenges in this area. Deep neural architectures such as sequence-to-sequence models and their hierarchical variants have brought significant improvements to conversation modeling. However, although these models require large corpora, they can suffer severe data degeneracy during training, and they are unable to concentrate on the important information in a given context, which degrades the quality of the generated responses. To tackle these issues, this work proposes a Variational Hierarchical Conversation RNN with Attention mechanism (VHCRA) for response generation. VHCRA uses a latent-variable representation to avoid data degeneracy and an attention mechanism to identify the important information within the context. The model is trained on a large benchmark dataset, the Cornell Movie Dialog corpus, which contains conversations drawn from a wide range of movies, and is evaluated with automatic metrics, namely negative log-likelihood and embedding-based metrics. The experimental results show that the proposed model achieves a significant improvement over recently proposed approaches and generates meaningful responses that follow the context.
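The attention mechanism mentioned above scores each context utterance against the current decoder state and forms a weighted summary, so the decoder can focus on the relevant turns of the dialogue. A minimal sketch of this idea with dot-product attention is shown below; the function name `attention_context` and the toy dimensions are illustrative, not taken from the paper.

```python
import numpy as np

def attention_context(utterance_encodings, query):
    """Score each context-utterance encoding against the decoder query
    (dot-product attention), softmax-normalize the scores, and return
    the attention-weighted sum as the attended context vector."""
    scores = utterance_encodings @ query          # one score per utterance
    weights = np.exp(scores - scores.max())       # numerically stable softmax
    weights /= weights.sum()
    context = weights @ utterance_encodings       # weighted sum of encodings
    return context, weights

# Toy example: 3 context utterances, hidden size 4
rng = np.random.default_rng(0)
encodings = rng.normal(size=(3, 4))
query = rng.normal(size=4)
context, weights = attention_context(encodings, query)
```

In the hierarchical setting of the paper, the utterance encodings would come from an utterance-level encoder RNN and the query from the context-level state; the sketch only illustrates the weighting step itself.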