DeepSegment


Designed with ASR output in mind, DeepSegment uses a BiLSTM + CRF model for automatic sentence boundary detection. It significantly outperforms the standard libraries (spaCy, NLTK, CoreNLP, and others) on imperfect text and performs comparably on perfectly punctuated text.

For the completely unpunctuated test case, the absolute accuracy is 52.637 and the F1 score is 91.33 (precision: 93.242, recall: 89.506).
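
As a quick consistency check on these figures, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the reported precision and recall; the function name `f1_score` is illustrative and is not part of DeepSegment.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        # Avoid division by zero when both inputs are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported above for the completely unpunctuated test case.
reported_precision = 93.242
reported_recall = 89.506

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.3f}")
```

The recomputed value agrees with the reported 91.33 to within rounding of the last digit.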

Installation

pip install deepsegment

Usage

from deepsegment import DeepSegment
# Supported languages so far: English ('en'), French ('fr'), Italian ('it').
# The default language is 'en'.

segmenter = DeepSegment('en')

segmenter.segment('I am Batman i live in gotham')
# ['I am Batman', 'i live in gotham']

# Using with the TensorFlow Serving Docker image (run these two commands in a shell)
docker pull bedapudi6788/deepsegment_en:v2
docker run -d -p 8500:8500 bedapudi6788/deepsegment_en:v2

from deepsegment import DeepSegment

# The default language is 'en'

segmenter = DeepSegment('en', tf_serving=True)

segmenter.segment('I am Batman i live in gotham')
# ['I am Batman', 'i live in gotham']
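
For ASR pipelines that produce many transcripts, a small helper can apply the same segmenter to each one. This is a minimal sketch, not part of the DeepSegment API: `segment_transcripts` is a hypothetical name, and the only assumption about the segmenter is the `segment(text)` method shown above, taken to return a list of sentence strings.

```python
from typing import Dict, Iterable, List


def segment_transcripts(segmenter, transcripts: Iterable[str]) -> Dict[str, List[str]]:
    """Map each non-empty transcript to the sentences from segmenter.segment()."""
    results: Dict[str, List[str]] = {}
    for text in transcripts:
        cleaned = text.strip()
        if not cleaned:
            # Skip blank transcripts rather than sending them to the model.
            continue
        results[cleaned] = list(segmenter.segment(cleaned))
    return results
```

With the package installed, this could be called as segment_transcripts(DeepSegment('en'), ['I am Batman i live in gotham']). Because the helper only relies on the segment method, it should work the same way with a segmenter created with tf_serving=True.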
