Question

I am using the NLTK package, which has a function that tells me whether a given sentence is positive, negative, or neutral:

from nltk.sentiment.util import demo_liu_hu_lexicon

demo_liu_hu_lexicon('Today is a an awesome, happy day')
>>> Positive

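The problem is that the function has no return statement: it only prints "Positive", "Negative", or "Neutral" to stdout. All it returns, implicitly, is a NoneType object. (The function's source code is available in the NLTK repository.)

Is there any way to capture this output (other than messing with the NLTK source code on my machine)?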

Answer 1

import sys
from io import StringIO

class capt_stdout:
    def __init__(self):
        self._stdout = None
        self._string_io = None

    def __enter__(self):
        self._stdout = sys.stdout
        sys.stdout = self._string_io = StringIO()
        return self

    def __exit__(self, type, value, traceback):
        sys.stdout = self._stdout

    @property
    def string(self):
        return self._string_io.getvalue()

Use it like this:

with capt_stdout() as out:
    demo_liu_hu_lexicon('Today is a an awesome, happy day')
    demo_liu_hu_lexicon_output = out.string
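
As an alternative to a hand-written context manager, the standard library's contextlib.redirect_stdout can do the same redirection. The sketch below is only an illustration: noisy_label is a hypothetical stand-in for a print-only function such as demo_liu_hu_lexicon (so the example runs without NLTK data), and capture_printed is a helper name invented here.

```python
import contextlib
import io


def noisy_label(sentence):
    """Hypothetical stand-in for a print-only function such as demo_liu_hu_lexicon."""
    print("Positive" if "happy" in sentence else "Neutral")


def capture_printed(func, *args, **kwargs):
    """Call func and return whatever it printed to stdout, minus the trailing newline."""
    buffer = io.StringIO()
    # redirect_stdout restores the real sys.stdout when the block exits,
    # even if func raises an exception.
    with contextlib.redirect_stdout(buffer):
        func(*args, **kwargs)
    return buffer.getvalue().rstrip("\n")


label = capture_printed(noisy_label, "Today is an awesome, happy day")
```

With NLTK installed, the same helper could presumably be called as capture_printed(demo_liu_hu_lexicon, 'some sentence').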

Answer 2

TL; DR

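The demo_liu_hu_lexicon function is a demo function showing how to use opinion_lexicon. It is meant for testing and should not be used directly.

In Long

Let's look at the function and see how we can recreate a similar one: https://github.com/nltk/nltk/blob/develop/nltk/sentiment/util.py#L616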

def demo_liu_hu_lexicon(sentence, plot=False):
    """
    Basic example of sentiment classification using Liu and Hu opinion lexicon.
    This function simply counts the number of positive, negative and neutral words
    in the sentence and classifies it depending on which polarity is more represented.
    Words that do not appear in the lexicon are considered as neutral.
    :param sentence: a sentence whose polarity has to be classified.
    :param plot: if True, plot a visual representation of the sentence polarity.
    """
    from nltk.corpus import opinion_lexicon
    from nltk.tokenize import treebank

    tokenizer = treebank.TreebankWordTokenizer()

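OK, putting imports inside the function is an odd practice, but that's because this is a demo function meant for simple testing or documentation.

Also, the use of treebank.TreebankWordTokenizer() is rather odd; we could simply use nltk.word_tokenize.

Let's move the imports out and rewrite demo_liu_hu_lexicon as a simple_sentiment function.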

from nltk.corpus import opinion_lexicon
from nltk import word_tokenize

def simple_sentiment(text):
    pass

Next, we see:

def demo_liu_hu_lexicon(sentence, plot=False):
    """
    Basic example of sentiment classification using Liu and Hu opinion lexicon.
    This function simply counts the number of positive, negative and neutral words
    in the sentence and classifies it depending on which polarity is more represented.
    Words that do not appear in the lexicon are considered as neutral.
    :param sentence: a sentence whose polarity has to be classified.
    :param plot: if True, plot a visual representation of the sentence polarity.
    """
    from nltk.corpus import opinion_lexicon
    from nltk.tokenize import treebank

    tokenizer = treebank.TreebankWordTokenizer()
    pos_words = 0
    neg_words = 0
    tokenized_sent = [word.lower() for word in tokenizer.tokenize(sentence)]

    x = list(range(len(tokenized_sent))) # x axis for the plot
    y = []

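The function:

1. First tokenizes and lowercases the sentence.
2. Initializes the counts of positive and negative words.
3. Initializes x and y for plotting later, so let's ignore them.

If we look further into the function: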

def demo_liu_hu_lexicon(sentence, plot=False):
    from nltk.corpus import opinion_lexicon
    from nltk.tokenize import treebank

    tokenizer = treebank.TreebankWordTokenizer()
    pos_words = 0
    neg_words = 0
    tokenized_sent = [word.lower() for word in tokenizer.tokenize(sentence)]

    x = list(range(len(tokenized_sent))) # x axis for the plot
    y = []

    for word in tokenized_sent:
        if word in opinion_lexicon.positive():
            pos_words += 1
            y.append(1) # positive
        elif word in opinion_lexicon.negative():
            neg_words += 1
            y.append(-1) # negative
        else:
            y.append(0) # neutral

    if pos_words > neg_words:
        print('Positive')
    elif pos_words < neg_words:
        print('Negative')
    elif pos_words == neg_words:
        print('Neutral')

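4. The loop simply goes through each token and checks whether the word is in the positive or negative lexicon.
5. At the end, it compares the number of positive and negative words and prints the label.

Now that we know what demo_liu_hu_lexicon does, let's see whether we can write a better simple_sentiment function.

The tokenization in step 1 can't be avoided, so we have: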

from nltk.corpus import opinion_lexicon
from nltk import word_tokenize

def simple_sentiment(text):
    tokens = [word.lower() for word in word_tokenize(text)]

A lazy way to do steps 2-5 is to copy + paste and change print() -> return

from nltk.corpus import opinion_lexicon
from nltk import word_tokenize

def simple_sentiment(text):
    # Step 1: tokenize and lowercase the input.
    tokens = [word.lower() for word in word_tokenize(text)]

    # Step 2: initialize the counters (the plotting lists x and y are dropped).
    pos_words = 0
    neg_words = 0

    # Step 4: count tokens found in the positive / negative lexicon.
    for word in tokens:
        if word in opinion_lexicon.positive():
            pos_words += 1
        elif word in opinion_lexicon.negative():
            neg_words += 1

    # Step 5: return the label instead of printing it.
    if pos_words > neg_words:
        return 'Positive'
    elif pos_words < neg_words:
        return 'Negative'
    else:
        return 'Neutral'

Now you have a function that you can do whatever you like with.
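
The function above calls opinion_lexicon.positive() and opinion_lexicon.negative() once per token, which repeats the lexicon lookup for every word. A possible refinement, sketched below under assumptions, is to separate the counting logic from the lexicon so the word collections can be built once and passed in as sets. The name classify_tokens and the tiny POSITIVE and NEGATIVE sets are hypothetical and exist only so the example runs without NLTK data.

```python
def classify_tokens(tokens, positive_words, negative_words):
    """Label a list of lowercased tokens by counting lexicon hits."""
    pos_count = 0
    neg_count = 0
    for token in tokens:
        # Same precedence as the demo function: positive is checked first.
        if token in positive_words:
            pos_count += 1
        elif token in negative_words:
            neg_count += 1

    if pos_count > neg_count:
        return "Positive"
    if pos_count < neg_count:
        return "Negative"
    return "Neutral"


# Hypothetical miniature lexicons, for illustration only.
POSITIVE = {"awesome", "happy"}
NEGATIVE = {"awful", "sad"}

tokens = "today is an awesome , happy day".split()
result = classify_tokens(tokens, POSITIVE, NEGATIVE)
```

With NLTK, the sets could presumably be built once at module level with set(opinion_lexicon.positive()) and set(opinion_lexicon.negative()), making each membership test a constant-time set lookup.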

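One note on the logic: when the function sees a positive word it increments pos_words and appends 1 to y, and when it sees a negative word it increments neg_words and appends -1. It labels the sentence positive when pos_words > neg_words.

Since pos_words and neg_words are plain integer counters, this is an ordinary numeric comparison. The list y is only used for the plot, so the classification does not depend on Python's sequence comparison of integer lists, which would have no linguistic or mathematical meaning here (see What happens when we compare list of integers?).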
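
To make the difference between comparing counts and comparing lists concrete, here is a small sketch with made-up values. It only contrasts Python's integer comparison with its lexicographic list comparison; none of the numbers come from NLTK.

```python
# Integer counters compare numerically.
pos_words, neg_words = 2, 1
int_result = pos_words > neg_words

# Lists compare element by element (lexicographically): the first
# differing pair decides, and the remaining elements are ignored.
y_a = [1, 0, -1]
y_b = [0, 1, 1]
list_result = y_a > y_b

# Comparing sums gives a different answer for the same lists.
sum_result = sum(y_a) > sum(y_b)
```

Here int_result and list_result are both True, while sum_result is False, which shows why comparing per-token score lists directly would not be a meaningful polarity test.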

Answer 3

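import sys
from io import StringIO
from nltk.sentiment.util import demo_liu_hu_lexicon

stdout_ = sys.stdout          # remember the real stdout
stream = StringIO()
sys.stdout = stream           # send print() output into the buffer
try:
    demo_liu_hu_lexicon('PLACE YOUR TEXT HERE')
finally:
    sys.stdout = stdout_      # always restore stdout, even on error
sentiment = stream.getvalue()
sentiment = sentiment[:-1]    # drop the trailing newline added by print()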

Capturing the output of a function that has no return statement
