Dialogue Summarization: A Deep Learning Approach

Aseem Srivastava 18 Feb, 2021 • 5 min read

This article was published as a part of the Data Science Blogathon.


Dialogue Summarization: Its types and methodology

  Image cc: Aseem Srivastava

Summarizing long pieces of text is a challenging problem. Summarization is done primarily in two ways: the extractive approach and the abstractive approach. In this work, we break the problem of meeting summarization into extractive and abstractive components that together generate a summary of the conversation.

 

What is Dialogue Summarization?

Humans are social animals: we exchange ideas, share information, and make plans with each other. Text and speech are the two common mediums of conversation, and most of it happens through speech. With the abundance of digital conversation over online messaging, IRC, and meeting platforms, and the ubiquity of automatic speech recognition systems, come vast amounts of meeting transcripts.
Therefore, the need to succinctly summarize the content of these conversations naturally arises. Several methods of generating summaries have been proposed. A standard and crucial application of conversational dialogue summarization is meeting summary generation.
Traditional methods used various extractive summarization approaches, which could only pull the most important phrases out of a document. Newer techniques brought transformer-based models capable of generating coherent, abstractive summaries.
Given this body of research and the limitations of current models, we present a novel hierarchical model: an extractive-to-abstractive approach that performs extractive summarization followed by abstractive summarization and achieves the highest ROUGE scores.
Our best-performing extractor is a two-step hierarchical model that encodes each sentence with a pre-trained BERT model, applies a bidirectional LSTM to create each utterance's sentence embedding, and passes these embeddings to an unsupervised clustering mechanism to detect key sentences. For the abstractive module, our best-performing model is a fine-tuned PEGASUS model that generates the abstractive summary.
In summary, our contributions are as follows:
  1. We present a novel approach for working with long summaries.
  2. We beat the state-of-the-art (SOTA) results on the AMI meeting dataset.

 

Let’s Dive into the Methodology

The methodology is a two-step hierarchical extractive-to-abstractive summary generation approach built on a transformer-based architecture, PEGASUS (Zhang et al., 2020). The base architecture of the model is a standard transformer encoder-decoder with a novel pre-training technique in which whole sentences, along with a few other randomly chosen tokens, are masked with [MASK] tokens.

The input to the encoder-decoder model is a 512-token sequence, produced by processing the AMI input with a BERT-based extractive summarizer in which we apply k-means to BERT sentence-level embeddings. We use two variants of this approach, with and without fine-tuning.
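To make this concrete, here is a minimal sketch of the abstractive step using the Hugging Face transformers library. It is not our exact training code: the public google/pegasus-cnn_dailymail checkpoint stands in for the fine-tuned model, and the 512-token truncation mirrors the input length described above.

```python
# A minimal sketch of the abstractive step, assuming Hugging Face transformers
# is installed. "google/pegasus-cnn_dailymail" is a placeholder for the
# PEGASUS checkpoint fine-tuned on AMI, which is not public.
from transformers import PegasusTokenizer, PegasusForConditionalGeneration

model_name = "google/pegasus-cnn_dailymail"  # placeholder checkpoint
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)

def abstractive_summary(extractive_summary: str, max_input_tokens: int = 512) -> str:
    # Truncate the extractive summary to the 512-token input length used above.
    inputs = tokenizer(
        extractive_summary,
        truncation=True,
        max_length=max_input_tokens,
        return_tensors="pt",
    )
    # Beam search is a common decoding choice; these settings are assumptions.
    summary_ids = model.generate(**inputs, num_beams=4, max_length=128)
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)
```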

Our baseline model is also a transformer model, in which a pre-trained BERT encoder produces extractive summaries that are then passed to a GPT-2 decoder model to construct abstractive and meaningful summaries (see the figure below).
Figure 1: Model architecture
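As a rough, hedged sketch of such a baseline (not our exact setup), a common way to use GPT-2 as the abstractive decoder is to append a "TL;DR:" prompt to the extractive summary and let the model continue:

```python
# A minimal sketch of the baseline's decoding step, assuming the extractive
# summary has already been produced by the BERT stage. The "TL;DR:" prompt is
# a stand-in for the exact conditioning scheme used in the baseline.
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

def gpt2_abstractive_summary(extractive_summary: str, max_new_tokens: int = 80) -> str:
    prompt = extractive_summary.strip() + "\nTL;DR:"
    # GPT-2 has a 1024-token context, so leave room for the generated summary.
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=900)
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,
        num_beams=4,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Keep only the tokens generated after the prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```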

Extractive Summarization Approach

Extractive summary generation is itself performed in two steps: the input text is converted into BERT sentence embeddings, which are then passed through an unsupervised clustering algorithm (Jin, 2010) to cluster the most important sentences. These clusters are essentially collections of sentences from the input text and represent its most relevant sentences.
Unsupervised extractive summary generation has been attempted before, and it has been shown how clustering techniques can help select the key components of a text. Since this extractive technique works in an unsupervised fashion, the need for parallel annotated data is eliminated, so it can be applied to large corpora. As shown in Figure 1, the left section shows the extractive summary generator, whose output is passed to the abstractive summary generator on the right.
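A minimal sketch of this extractive step is shown below, assuming the sentence-transformers and scikit-learn libraries; the MiniLM encoder stands in for our BERT sentence encoder, and the BiLSTM layer and the exact clustering algorithm from the paper are not reproduced:

```python
# Sketch of the extractive step: embed each utterance, cluster the embeddings
# with k-means, and keep the sentence closest to each cluster centroid.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in for the BERT encoder

def extractive_summary(sentences: list[str], num_sentences: int = 10) -> str:
    embeddings = encoder.encode(sentences)            # shape (n_sentences, dim)
    k = min(num_sentences, len(sentences))
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(embeddings)

    # For each cluster, pick the sentence whose embedding is closest to the centroid.
    chosen = []
    for centroid in kmeans.cluster_centers_:
        distances = np.linalg.norm(embeddings - centroid, axis=1)
        chosen.append(int(np.argmin(distances)))

    # Keep the original utterance order so the summary reads coherently.
    return " ".join(sentences[i] for i in sorted(set(chosen)))
```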

Abstractive Summarization Approach

The abstractive summarizer is an encoder-decoder-based language model, PEGASUS, used to generate a semantically good summary. PEGASUS introduces a pre-training technique built around gap-sentence masking and generation.
The PEGASUS-large architecture contains 16 encoder layers and 16 decoder layers, which together take the masked text document as input. The authors' hypothesis is that the closer the pre-training self-supervised objective is to the final downstream task, the better the fine-tuning performance. In this approach, during pre-training, several whole sentences are removed from documents and the model is tasked with recovering them. An example pre-training input is a document with missing sentences, while the output consists of the missing sentences concatenated together.

This is an incredibly difficult task that may seem impossible, even for people, and we do not expect the model to solve it perfectly. However, it is a challenging task that encourages the model to learn about language and facts about the world, as well as how to distill information from across the document in order to produce output that closely relates to the fine-tuning task. The advantage of this self-supervision is that we can generate as many examples as there are documents, without any annotation, which is often the bottleneck in supervised systems.
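To make the gap-sentence objective concrete, here is a toy sketch of how a pre-training example could be constructed. The real PEGASUS implementation selects important sentences by ROUGE against the rest of the document rather than at random, and uses proper sentence segmentation; this sketch only illustrates the input/target format:

```python
# Toy illustration of gap-sentence generation (GSG) pre-training data.
# The sentence split is naive and the selection is random, which differs
# from the actual PEGASUS recipe; only the input/target shape is shown.
import random

def make_gsg_example(document: str, mask_ratio: float = 0.3, mask_token: str = "[MASK]"):
    sentences = [s.strip() + "." for s in document.split(".") if s.strip()]  # naive split
    num_masked = max(1, int(len(sentences) * mask_ratio))
    masked_ids = sorted(random.sample(range(len(sentences)), num_masked))

    # Input: the document with the selected sentences replaced by a mask token.
    model_input = " ".join(
        mask_token if i in masked_ids else s for i, s in enumerate(sentences)
    )
    # Target: the masked sentences concatenated together.
    target = " ".join(sentences[i] for i in masked_ids)
    return model_input, target
```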

 

Results

About the Metrics: ROUGE-1 and ROUGE-2

ROUGE-1 and ROUGE-2 are based on unigram and bigram overlap, respectively, between the generated summary and the reference, and they have been shown to correlate highly with human evaluations. ROUGE-2 scores, in particular, can be seen as a measure of summary readability. There is also a complementary way to evaluate the performance of the approach: human evaluation.

However, human evaluation is slow and very expensive. Empirical studies show that judging a model's performance with ROUGE metrics agrees well with human evaluation. Table 1 gives a summary of all the experiments performed on the summarization task on the AMI dataset. The results clearly show an improvement by a considerable margin.

Table 1: Results of the summarization experiments on the AMI dataset
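For reference, ROUGE-1 and ROUGE-2 scores like those reported in Table 1 can be computed with the rouge-score package; the snippet below is a sketch with made-up example sentences, not the exact evaluation script used here:

```python
# Minimal sketch of computing ROUGE-1 and ROUGE-2 with the rouge-score package.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2"], use_stemmer=True)

# Hypothetical reference and generated summaries, for illustration only.
reference = "the team agreed to use a rubber case and a scroll wheel for the remote"
generated = "the team decided on a rubber case with a scroll wheel"

scores = scorer.score(reference, generated)
for name, result in scores.items():
    print(f"{name}: precision={result.precision:.3f} "
          f"recall={result.recall:.3f} f1={result.fmeasure:.3f}")
```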

So in the end, let's see whether this work is new or something you could find on the first (or last) page of a Google search!

So far, meeting summarization has not been studied at a large scale, and there is a lot of scope for improvement in this research area, since most summarization work targets documents or news articles. To make good use of previous state-of-the-art models, we converted the conversations from dialogue format to article format so that conversation summaries could be generated with the news-article summarization methodology.
Our key contribution, the extractive-to-abstractive approach, is fundamentally the novel idea of combining two different steps to generate the summary, since it is hard to train transformers on longer sequences. This two-step summarization technique uses a state-of-the-art implementation of the PEGASUS model fine-tuned on our data.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author's discretion.



Responses From Readers


Sanjeev 04 May, 2021

Hi, Nice article. Will it be possible to share some code base related to the methodology described above? Thanks, Sanjeev
