Question

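I'm working on a sentiment analysis project for a book. I'm using nltk.sentiment.vader.SentimentIntensityAnalyzer to record the sentiment polarity of each paragraph in the Harry Potter series.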

To create the paragraphs and remove the newline characters, I did:

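# Read every line of the book, then flatten the list into one string
with open('HP1 Sorcerer of Stone.txt', 'r') as text_file:
    text = str(text_file.readlines())
# str.replace returns a new string, so the result has to be assigned back
text = text.replace('\\n"', "").replace("\'", "").replace(" , ", "")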
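A less fragile way to get one string per paragraph might look like the sketch below, which avoids calling str() on a list and then stripping the list punctuation back out. It assumes the file stores one paragraph per line, and `split_paragraphs` is a hypothetical helper name, not part of NLTK.

```python
def split_paragraphs(raw_text):
    """Return the non-empty lines of raw_text with surrounding whitespace removed.

    Assumption: each paragraph of the book sits on its own line.
    """
    paragraphs = []
    for line in raw_text.splitlines():
        line = line.strip()
        if line:  # skip blank separator lines
            paragraphs.append(line)
    return paragraphs


# Small in-memory sample standing in for the contents of the book file
sample = '"So?" snapped Mrs. Dursley.\n\nMrs. Dursley sipped her tea.\n'
paragraphs = split_paragraphs(sample)
```

For the real book, the string passed in would be the result of reading the whole file, for example with `open(path, encoding='utf-8')` and `.read()`.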

This splits the book into paragraphs. The problem arises with dialogue.

In dialogue, each character's lines are separated by the same paragraph break as any other paragraph:

' "So?" snapped Mrs. Dursley. ',
' "Well, I just thought... maybe... it was something to do with... you 
know... her crowd." ',
' Mrs. Dursley sipped her tea through pursed lips. Mr. Dursley wondered 
whether he dared tell her he\\d heard the name "Potter." He decided he 
didn\\t dare. Instead he said, as casually as he could, "Their son -- 
he\\d be about Dudley\\s age now, wouldn\\t he?" ',
' "I suppose so," said Mrs. Dursley stiffly. ',
' "What\\s his name again? Howard, isn\\t it?" ',
' "Harry. Nasty, common name, if you ask me." ',

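How can I change my segmentation method so that a conversation is kept together as one element? The whole conversation would then be used as a single input to the intensity analyzer.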
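One possible direction, sketched under a simple assumption: treat any paragraph that contains a double quotation mark as part of a conversation, and merge each run of consecutive such paragraphs into one element. The names `is_dialogue` and `merge_dialogue` are hypothetical, and the quote test is only a heuristic; it would misfire on narration that happens to quote a single word.

```python
from itertools import groupby

# Straight and curly double quotes; an assumption about how the text is typeset
QUOTE_CHARS = ('"', '\u201c', '\u201d')


def is_dialogue(paragraph):
    """Heuristic: a paragraph counts as dialogue if it contains a double quote."""
    return any(q in paragraph for q in QUOTE_CHARS)


def merge_dialogue(paragraphs):
    """Join each run of consecutive dialogue paragraphs into a single string."""
    merged = []
    for spoken, group in groupby(paragraphs, key=is_dialogue):
        group = list(group)
        if spoken:
            merged.append(" ".join(group))  # one element per conversation
        else:
            merged.extend(group)  # narration paragraphs stay separate
    return merged


# The first and last paragraphs are made-up narration for illustration only
paragraphs = [
    'The kettle whistled in the kitchen.',
    '"So?" snapped Mrs. Dursley.',
    '"I suppose so," said Mrs. Dursley stiffly.',
    'Nobody spoke for a while.',
]
merged = merge_dialogue(paragraphs)
```

Each string in `merged` could then be passed to the analyzer's `polarity_scores` method, so that a whole conversation is scored as one input.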

How to join lines of dialogue when doing natural language processing on a book

0 answers: