Text Summarization Using spaCy
Text summarization is a technique for condensing a long piece of content into a shorter one without losing its essential meaning.
What is text summarization?
Text summarization is the process of identifying the most important information in a document or set of related documents and compressing it into a shorter version that preserves the overall meaning. The goal of automatic text summarization is to present the source text in a shorter version that retains its semantics.
Importance of automatic text summarization
Summaries reduce reading time. When researching documents, summaries make the selection process easier. Automatic summarization also improves the effectiveness of indexing.
Automatic summarization algorithms, such as the spaCy-based approach described here, tend to be less biased than human summarizers.
Personalized summaries are useful in question-answering systems as they provide personalized information
Using automatic or semi-automatic summarization systems enables commercial abstract services to increase the number of text documents they are able to process.
Applications of automatic text summarization
The major applications of text summarization include meetings and video conferencing, help desk and customer support, helping disabled people, programming languages, automated content creation, email overload, e-learning and class assignments, science and R&D, patent research, financial research, legal contract analysis, social media marketing, question answering and bots, video scripting, medical cases, books and literature, media monitoring, newsletters, search marketing and SEO, and internal document workflows.
Main approaches to text summarization
Extractive summarization: it works by selecting the most meaningful sentences in an article and arranging them in a coherent order. The summary sentences are extracted from the article without any modification.
This article focuses on extractive text summarization.
Abstractive summarization: it works by generating its own paraphrased version of the most important sentences in the article.
Scales of document summarization
Single-document summarization: the task of summarizing a standalone document. Note that a "document" can refer to different things depending on the use case, such as a URL, an internal PDF file, a legal contract, a financial report, or an email.
Multi-document summarization: the task of assembling a collection of documents (usually through a query against a database or search engine) and generating a summary that incorporates perspectives from across the documents.
Common metrics any summarizer attempts to optimize
Topic coverage: does the summary incorporate the main topics from the document?
Readability: do the summary sentences flow in a logical way?
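Topic coverage can be given a rough number. The sketch below is only a crude proxy, not a standard metric: it assumes that the most frequent longer words in the document stand in for its "main topics" and reports what fraction of them also appear in the summary. The function names and the word-length filter are illustrative choices, not part of any library.

```python
import re
from collections import Counter


def content_words(text):
    """Lowercase alphabetic tokens longer than 3 letters (a crude content-word filter)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 3]


def topic_coverage(document, summary, top_k=5):
    """Fraction of the document's top_k most frequent content words found in the summary."""
    top_words = [w for w, _ in Counter(content_words(document)).most_common(top_k)]
    if not top_words:
        return 0.0
    summary_words = set(content_words(summary))
    return sum(w in summary_words for w in top_words) / len(top_words)


if __name__ == "__main__":
    doc = "Solar panels convert sunlight. Solar panels need sunlight. Batteries store energy."
    print(topic_coverage(doc, "Solar panels need sunlight.", top_k=3))
```

A score near 1 suggests the summary mentions most of the document's dominant terms. It says nothing about readability, which still needs a human reader or a separate measure.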
Important Steps in Text summarization
Text cleaning
Sentence tokenization
Word tokenization
Word-frequency table
Summarization
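The five steps above can be sketched in a few lines of plain Python. This is a minimal, library-free illustration of word-frequency extractive summarization, not the spaCy implementation itself: the regex sentence splitter and the tiny stop-word set are simplifying assumptions, and the names `summarize_text`, `split_sentences`, `tokenize` and `STOP_WORDS` are hypothetical. With spaCy, the sentences would come from `doc.sents` and stop words could be filtered with `token.is_stop`.

```python
import re
from collections import Counter
from heapq import nlargest

# Tiny illustrative stop-word set (an assumption; spaCy ships a much larger list).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and",
              "it", "that", "this", "for", "on", "with", "as", "by"}


def clean(text):
    """Step 1: text cleaning, here just collapsing runs of whitespace."""
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text):
    """Step 2: naive sentence tokenization on ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def tokenize(sentence):
    """Step 3: lowercase word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def summarize_text(text, n_sentences=2):
    sentences = split_sentences(clean(text))
    # Step 4: word-frequency table over the whole document.
    freq = Counter(w for s in sentences for w in tokenize(s))
    if not freq:
        return ""
    top = max(freq.values())
    # Step 5: score each sentence by the sum of its normalized word frequencies.
    scores = {i: sum(freq[w] / top for w in tokenize(s)) for i, s in enumerate(sentences)}
    best = nlargest(n_sentences, scores, key=scores.get)
    # Emit the selected sentences in their original order.
    return " ".join(sentences[i] for i in sorted(best))


if __name__ == "__main__":
    sample = "Cats sleep a lot. Cats chase mice and cats climb trees. The weather is nice today."
    print(summarize_text(sample, n_sentences=1))
```

Because a sentence's score is a plain sum, longer sentences tend to win. Dividing each score by the sentence length is a common adjustment if that bias matters for your text.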
Complete code snippet for extractive text summarization
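“gensim” is also one of the extractive summarizers. Note that the gensim.summarization module was removed in gensim 4.0, so the snippet below needs an older gensim release.
# Requires gensim < 4.0; `text` is the document to summarize (a string of several sentences)
from gensim.summarization import summarize
print("Text summarization using gensim")
print(summarize(text))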
References
My GitHub link for details:
My other blogs regarding text summarization using transfer learning (T5):
Thanks for reading! Leave a few claps if you found this article helpful.