I'm using the NLTK package, which has a function that tells me whether a given sentence is positive, negative, or neutral:
from nltk.sentiment.util import demo_liu_hu_lexicon
demo_liu_hu_lexicon('Today is a an awesome, happy day')
>>> Positive
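The problem is that the function has no return statement: it just prints "Positive", "Negative" or "Neutral" to stdout. All it returns, implicitly, is a NoneType object. (Here is the function's source code.)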
Is there any way to capture this output (other than messing with the NLTK source code on my machine)?
Answer 0 (score: 3)
import sys
from io import StringIO

class capt_stdout:
    def __init__(self):
        self._stdout = None
        self._string_io = None

    def __enter__(self):
        # Remember the real stdout and swap in an in-memory buffer.
        self._stdout = sys.stdout
        sys.stdout = self._string_io = StringIO()
        return self

    def __exit__(self, type, value, traceback):
        # Restore the real stdout, even if the block raised.
        sys.stdout = self._stdout

    @property
    def string(self):
        return self._string_io.getvalue()
Use it like this:
with capt_stdout() as out:
    demo_liu_hu_lexicon('Today is a an awesome, happy day')
demo_liu_hu_lexicon_output = out.string
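On Python 3.4 and later, the standard library's contextlib.redirect_stdout does the same job as the class above. A minimal sketch, assuming a hypothetical stand-in function print_label in place of demo_liu_hu_lexicon so that the snippet runs without NLTK installed:

```python
import io
from contextlib import redirect_stdout


def print_label(sentence):
    """Hypothetical stand-in for demo_liu_hu_lexicon: prints a label, returns None."""
    print('Positive')


buffer = io.StringIO()
with redirect_stdout(buffer):
    # Everything printed inside this block goes to `buffer` instead of the terminal.
    result = print_label('Today is an awesome, happy day')

# Strip the trailing newline that print() appends.
label = buffer.getvalue().strip()
```

Swapping print_label for demo_liu_hu_lexicon should leave the printed label in label in the same way.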
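Answer 1 (score: 1)
The demo_liu_hu_lexicon function is a demo function showing how to use opinion_lexicon. It is meant for testing and should not be used directly.
Let's look at the function and see how we can recreate a similar one: https://github.com/nltk/nltk/blob/develop/nltk/sentiment/util.py#L616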
def demo_liu_hu_lexicon(sentence, plot=False):
    """
    Basic example of sentiment classification using Liu and Hu opinion lexicon.
    This function simply counts the number of positive, negative and neutral words
    in the sentence and classifies it depending on which polarity is more represented.
    Words that do not appear in the lexicon are considered as neutral.
    :param sentence: a sentence whose polarity has to be classified.
    :param plot: if True, plot a visual representation of the sentence polarity.
    """
    from nltk.corpus import opinion_lexicon
    from nltk.tokenize import treebank
    tokenizer = treebank.TreebankWordTokenizer()
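OK, having the imports inside the function is an odd practice, but that's because this is a demo function meant for simple testing or documentation.
Also, the use of treebank.TreebankWordTokenizer() is rather odd; we could simply use nltk.word_tokenize.
Let's move the imports out and rewrite demo_liu_hu_lexicon as a simple_sentiment function.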
from nltk.corpus import opinion_lexicon
from nltk import word_tokenize

def simple_sentiment(text):
    pass
Next, we see:
def demo_liu_hu_lexicon(sentence, plot=False):
    """
    Basic example of sentiment classification using Liu and Hu opinion lexicon.
    This function simply counts the number of positive, negative and neutral words
    in the sentence and classifies it depending on which polarity is more represented.
    Words that do not appear in the lexicon are considered as neutral.
    :param sentence: a sentence whose polarity has to be classified.
    :param plot: if True, plot a visual representation of the sentence polarity.
    """
    from nltk.corpus import opinion_lexicon
    from nltk.tokenize import treebank
    tokenizer = treebank.TreebankWordTokenizer()
    pos_words = 0
    neg_words = 0
    tokenized_sent = [word.lower() for word in tokenizer.tokenize(sentence)]
    x = list(range(len(tokenized_sent)))  # x axis for the plot
    y = []
The variables x and y are initialized here for plotting later on, so let's ignore them. If we read further into the function:
def demo_liu_hu_lexicon(sentence, plot=False):
    from nltk.corpus import opinion_lexicon
    from nltk.tokenize import treebank
    tokenizer = treebank.TreebankWordTokenizer()
    pos_words = 0
    neg_words = 0
    tokenized_sent = [word.lower() for word in tokenizer.tokenize(sentence)]
    x = list(range(len(tokenized_sent)))  # x axis for the plot
    y = []
    for word in tokenized_sent:
        if word in opinion_lexicon.positive():
            pos_words += 1
            y.append(1)  # positive
        elif word in opinion_lexicon.negative():
            neg_words += 1
            y.append(-1)  # negative
        else:
            y.append(0)  # neutral
    if pos_words > neg_words:
        print('Positive')
    elif pos_words < neg_words:
        print('Negative')
    elif pos_words == neg_words:
        print('Neutral')
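The loop simply goes through each token and checks whether the word is in the positive or the negative lexicon.
At the end, it compares the number of positive and negative words and prints the label.
Now that we know what demo_liu_hu_lexicon does, let's see whether we can write a better simple_sentiment function.
There's no way to avoid the tokenization (step 1), so we have: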
from nltk.corpus import opinion_lexicon
from nltk import word_tokenize

def simple_sentiment(text):
    tokens = [word.lower() for word in word_tokenize(text)]
A lazy way to do steps 2-5 is to copy + paste and change print() -> return:
from nltk.corpus import opinion_lexicon
from nltk import word_tokenize

def simple_sentiment(text):
    tokens = [word.lower() for word in word_tokenize(text)]
    pos_words = 0
    neg_words = 0
    # The y list from the demo is only needed for plotting, so it is dropped here.
    for word in tokens:
        if word in opinion_lexicon.positive():
            pos_words += 1
        elif word in opinion_lexicon.negative():
            neg_words += 1
    if pos_words > neg_words:
        return 'Positive'
    elif pos_words < neg_words:
        return 'Negative'
    else:
        return 'Neutral'
Now you have a function whose result you can do whatever you like with.
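One possible refinement, not part of the original demo: opinion_lexicon.positive() and opinion_lexicon.negative() are called again for every token in the version above, so building the word sets once should be noticeably faster. The sketch below takes the two sets as parameters and splits on whitespace so that it runs without NLTK data; the name count_based_sentiment and the toy word sets are assumptions made for illustration.

```python
def count_based_sentiment(tokens, positive_words, negative_words):
    """Label a token list by comparing counts of positive and negative words."""
    pos_count = 0
    neg_count = 0
    for token in tokens:
        # Same if/elif order as the demo: a token is counted at most once.
        if token in positive_words:
            pos_count += 1
        elif token in negative_words:
            neg_count += 1
    if pos_count > neg_count:
        return 'Positive'
    if pos_count < neg_count:
        return 'Negative'
    return 'Neutral'


# Toy lexicons for illustration only; they are NOT the Liu and Hu word lists.
toy_positive = {'awesome', 'happy', 'good'}
toy_negative = {'awful', 'sad', 'bad'}

tokens = 'today is an awesome , happy day'.lower().split()
label = count_based_sentiment(tokens, toy_positive, toy_negative)
```

With NLTK available, you would pass set(opinion_lexicon.positive()), set(opinion_lexicon.negative()) and the lowercased output of word_tokenize instead of the toy values.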
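By the way, this demo is a bit odd. It appends 1 to y when it sees a positive word and -1 when it sees a negative one, yet the label is decided by pos_words > neg_words.
Note that pos_words and neg_words are plain integer counters, so that check is an ordinary integer comparison; the list y is only built for the plot and is never compared. (On how Python compares lists, see What happens when we compare list of integers?)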
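A short sketch contrasting the two kinds of comparison, with made-up values: integer counters such as pos_words and neg_words compare numerically, whereas Python compares lists such as y element by element from the left.

```python
# Integer counters, as pos_words and neg_words are in the demo.
pos_words, neg_words = 2, 3
counters_say_positive = pos_words > neg_words   # plain numeric comparison

# Lists are compared lexicographically: the first differing element decides.
first = [1, -1, -1]
second = [0, 1, 1]
lists_say_greater = first > second              # decided by 1 > 0 at index 0

# Summing a polarity list gives the same verdict as comparing the counters.
y = [1, 0, -1, -1, 1, -1]
net_polarity = sum(y)                           # 2 positives minus 3 negatives
```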
Answer 2 (score: 0)
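import sys
from io import StringIO
from nltk.sentiment.util import demo_liu_hu_lexicon

stdout_ = sys.stdout          # remember the real stdout
stream = StringIO()
sys.stdout = stream           # redirect prints into the in-memory buffer
demo_liu_hu_lexicon('PLACE YOUR TEXT HERE')
sys.stdout = stdout_          # restore the real stdout
sentiment = stream.getvalue()
sentiment = sentiment[:-1]    # drop the trailing newline added by print()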
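One caveat with swapping sys.stdout by hand: if the called function raises, sys.stdout stays redirected. A minimal sketch of a helper that always restores it with try/finally; capture_printed and noisy_failure are hypothetical names introduced here for illustration, not NLTK functions.

```python
import sys
from io import StringIO


def capture_printed(func, *args, **kwargs):
    """Call func and return what it printed to stdout, minus trailing newlines."""
    original_stdout = sys.stdout
    stream = StringIO()
    sys.stdout = stream
    try:
        func(*args, **kwargs)
    finally:
        # Runs even if func raises, so stdout is never left redirected.
        sys.stdout = original_stdout
    return stream.getvalue().rstrip('\n')


def noisy_failure():
    """Hypothetical function that prints and then fails."""
    print('about to fail')
    raise ValueError('boom')


stdout_before = sys.stdout
captured = capture_printed(print, 'Positive')
try:
    capture_printed(noisy_failure)
except ValueError:
    pass
stdout_restored = sys.stdout is stdout_before
```

With NLTK installed, capture_printed(demo_liu_hu_lexicon, 'PLACE YOUR TEXT HERE') should return the label as a string.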