How do I split text into sentences for an NLP problem when there are no delimiters?

Asked: 2019-04-21 20:51:31

Tags: python nlp stanford-nlp tokenize

I want to run sentiment analysis on text whose sentences have no delimiters.

The input text looks like this:

"it 's   been   a   little   while   Kirk   tells   me it 's   actually   been   three   weeks   now   that I 've   been   using   this   device   right   here that   is   of   course   the   Galaxy   S   ten   I mean   I 've   just   been   living   with   this phone   this   has   been   my   phone   has   the   SIM card   in   it   I   took   photos I   lived   live   I   sent   tweets   whatsapp slack   email   whatever   other   app   this   was my   smart   phone   of   choice   for   the   last three   weeks   
I   have   some   feelings   about it   that   I   think   you   need   to   know   about there 's   some   things   I   like   there 's   some things   I   don 't   like   any   smartphone   out there   I   chose   to   use   the   standard   Galaxy S   10   not   the   S   10   plus   I   just   feel   like this   is   a   nice   form   factor   I   kind   of like   the   circular   cutout   as   opposed   to the   larger   one   I   mean   look   it 's   your choice   you   want   a   bigger   display   you   go for   the   plus   otherwise   they 're   basically the   same   first   things   first   what   are   you looking   at   what   greets   you   when   you unlock   this   phone   it 's   a   display   I   mean that 's   gonna   satisfy   anyone   in   a smartphone   universe   anyone   in   the segment   any   fan   that 's   out   there   you your   nephew   your   aunt   your   uncle   if   you want   maybe   the   best   display   in   the smartphone   game   then   you   go   with   this phone   
I   mean   that 's   pretty   standard stuff   you   already   knew   it   I   have   a   case on   this   phone   so   it   kind   of   diminishes the   edge   a   little   bit   after   all   samsung has   been   curving   these   edges   for   a   while now   some   people   love   it   some   people   less so   actually   I   really   like   this   case   I forget   the   name   of   it   got   enough   Amazon genuine   leather   yeehaw   ladies   and gentlemen   that 's   rawhide   will   he   do another   big   change   for   this   particular model   year   we   now   have   more   cameras   than ever   that 's   correct   that 's   three   lenses on   the   back   of   course   you 're   getting   a wider   angle   view   with   these   
I   used   it   I used   that   feature   I   love   that   feature   in fact   the   front-facing   camera   on   this device   is   wider   than   I   expected   as   well so   it 's   versatile   you   can   get   a   lot   of shots   of   course   the   camera   itself incredible   in   a   number   of   different circumstances   with   or   without   the   wide it 's   one   of   the   best   performers   out there   that   I 've   used   recently   I   want   to put   white   at   pixel   level   just the   software   the   the   isolation   the portrait   effect   and   so   on   not   that   I   use that   very   much   I   mean   for   me   this   camera it 's   an   easy   pick   kind   of   like   the display   again   not   much   of   a   surprise"

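I want to split the text into sentences and analyze the sentiment of each one. I already have a pretrained model that can analyze the sentiment of sentences separated by ".".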

Is there any way to split this run-together text into sentences?

1 answer:

Answer 0 (score: 1)

Predicting punctuation for text, especially speech transcripts, is a well-known problem.

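You can try Punctuator2, either with the models it provides or by training a new model on text from your domain. See the bottom of its README for pointers to some related projects.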

Grammarly developed a simpler approach that only inserts periods between run-on sentences, as described here:

https://www.grammarly.com/blog/nlp-run-on-sentences/

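They ran some nice experiments with both real and artificially generated training data. That is useful because training data is easy to generate from any text you know has punctuation at sentence boundaries (newspaper text, for example).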
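As a rough illustration of generating such training data, here is a minimal Python sketch. It is not Grammarly's or Punctuator2's code; the function names `make_boundary_examples` and `insert_periods` are made up for this example, and the regex-based sentence split and the lowercasing are simplifying assumptions. It takes punctuated text, strips the punctuation, and labels each token with whether a sentence ends after it. A sequence model trained on such pairs could then predict the labels for an unpunctuated transcript, and the second function turns those labels back into "."-delimited text for the sentiment model.

```python
import re


def make_boundary_examples(text):
    """Turn punctuated text into (tokens, labels) training data.

    labels[i] is 1 if a sentence ends after tokens[i], otherwise 0.
    """
    # Naive split after ., ! or ? followed by whitespace. This is an
    # assumption for the sketch; a real pipeline would use a proper
    # sentence tokenizer.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    tokens, labels = [], []
    for sentence in sentences:
        # Lowercase and keep only letters, digits and apostrophes, so the
        # result resembles an unpunctuated transcript.
        words = re.findall(r"[a-z0-9']+", sentence.lower())
        if not words:
            continue
        tokens.extend(words)
        # Only the last word of each sentence is marked as a boundary.
        labels.extend([0] * (len(words) - 1) + [1])
    return tokens, labels


def insert_periods(tokens, labels):
    """Rebuild period-delimited text from tokens and boundary labels."""
    return " ".join(tok + "." if lab else tok for tok, lab in zip(tokens, labels))


if __name__ == "__main__":
    sample = "I like the display. The camera is great! It's an easy pick."
    toks, labs = make_boundary_examples(sample)
    print(list(zip(toks, labs)))
    print(insert_periods(toks, labs))
```

At prediction time, the labels would come from your trained model rather than from the original punctuation; the output of `insert_periods` can then be split on "." and passed sentence by sentence to the sentiment model.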