Welcome to text-summarization’s documentation!
Introduction
This project is an automatic text summarizer built in Python with spaCy and the Universal Sentence Encoder on the Flask framework.
Automatic summarization is the process of shortening a text document with software, in order to create a summary with the major points of the original document. -Wikipedia
This project was inspired by the work of Praveen Dubey in "Understand Text Summarization and create your own summarizer in python".
The summarization uses the Universal Sentence Encoder [1] with the TensorFlow deep learning framework, spaCy for linguistic processing, and NetworkX for ranking. The project itself is a REST API script built with the Flask framework.
The algorithm steps are as follows:
Read the given text and split it into sentences using spaCy.
Compute a cosine similarity score for each pair of sentences from their embedding vectors, using a TensorFlow graph with the Universal Sentence Encoder as the embedding layer (see the network architecture picture below).
Build a similarity matrix across the sentences and rank them using the NetworkX implementation of PageRank.
Sort the ranked sentences and pick the top ones according to the user’s input.
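The ranking steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the project's actual code: the sentences are assumed to be already split, the vectors are small toy lists standing in for Universal Sentence Encoder embeddings, and a hand-written power iteration stands in for the NetworkX PageRank implementation. The function names (`cosine_similarity`, `similarity_matrix`, `pagerank`, `summarize`) and the clamping of negative similarities to zero are hypothetical choices made for this sketch.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors):
    """Pairwise cosine similarities; the diagonal stays 0 so no sentence votes for itself."""
    n = len(vectors)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Assumption of this sketch: clamp negatives so edge weights are non-negative.
            score = max(0.0, cosine_similarity(vectors[i], vectors[j]))
            matrix[i][j] = matrix[j][i] = score
    return matrix


def pagerank(matrix, damping=0.85, iterations=100, tol=1e-9):
    """Weighted PageRank by power iteration (a stand-in for the NetworkX implementation)."""
    n = len(matrix)
    if n == 0:
        return []
    out_weight = [sum(row) for row in matrix]
    ranks = [1.0 / n] * n
    for _ in range(iterations):
        # Rank held by sentences with no similar neighbours is spread evenly.
        dangling = sum(ranks[i] for i in range(n) if out_weight[i] == 0.0)
        new_ranks = []
        for j in range(n):
            incoming = sum(
                ranks[i] * matrix[i][j] / out_weight[i]
                for i in range(n)
                if out_weight[i] > 0.0
            )
            new_ranks.append((1.0 - damping) / n + damping * (incoming + dangling / n))
        delta = sum(abs(a - b) for a, b in zip(new_ranks, ranks))
        ranks = new_ranks
        if delta < tol:
            break
    return ranks


def summarize(sentences, vectors, top_n):
    """Return the top_n sentences, highest PageRank score first."""
    ranks = pagerank(similarity_matrix(vectors))
    order = sorted(range(len(sentences)), key=lambda i: ranks[i], reverse=True)
    return [sentences[i] for i in order[:top_n]]


# Toy input: hand-made 3-dimensional vectors, not real sentence embeddings.
sentences = [
    "Cats are popular pets.",
    "Cats and dogs are both popular pets.",
    "Dogs are popular pets.",
    "The stock market fell today.",
]
vectors = [
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
print(summarize(sentences, vectors, top_n=1))
```

In this toy input the second vector overlaps with both the first and the third, so its sentence collects the most rank, while the fourth vector is orthogonal to the rest and ends up last. In the real pipeline the vectors would come from the Universal Sentence Encoder and the number of sentences to keep from the user's request.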
TensorFlow graph with the Universal Sentence Encoder

References
[1] Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St. John, Noah Constant, Mario Guajardo-Céspedes, Steve Yuan, Chris Tar, Yun-Hsuan Sung, Brian Strope, Ray Kurzweil. Universal Sentence Encoder. arXiv:1803.11175, 2018.