Question

I have been reading the NLTK documentation recently, and I don't understand the following code:

import nltk

def dialogue_act_features(post):
    features = {}
    for word in nltk.word_tokenize(post):
        features['contains(%s)' % word.lower()] = True
    return features

This is a feature extractor for NaiveBayesClassifier, but what does

features['contains(%s)' % word.lower()] = True

mean?

I think this line is a way of building a dictionary, but I don't understand how it works.


Answer 1

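Say word = 'ABCxyz'.

word.lower() -> converts it to lowercase, so it returns 'abcxyz'

'contains(%s)' % word.lower() -> formats the string, replacing %s with the value of word.lower(), and returns 'contains(abcxyz)'

features['contains(%s)' % word.lower()] = True -> creates a key-value pair in the features dictionary, with key 'contains(abcxyz)' and value True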

So,

features = {}
features['contains(%s)' % word.lower()] = True

results in

features = {'contains(abcxyz)':True}
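To see each intermediate value, the one-liner can be unpacked into separate steps. A minimal sketch using the same example word 'ABCxyz' (the names lowered and key are just illustrative):

```python
features = {}
word = 'ABCxyz'

lowered = word.lower()            # step 1: lowercase the word -> 'abcxyz'
key = 'contains(%s)' % lowered    # step 2: build the key string -> 'contains(abcxyz)'
features[key] = True              # step 3: store the key with the value True

print(lowered)    # abcxyz
print(key)        # contains(abcxyz)
print(features)   # {'contains(abcxyz)': True}
```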

Answer 2

In this code:

>>> import nltk
>>> def word_features(sentence):
...     features = {}
...     for word in nltk.word_tokenize(sentence):
...         features['contains(%s)' % word.lower()] = True
...     return features
...
>>> sent = 'This a foobar word extractor function'
>>> word_features(sent)
{'contains(a)': True, 'contains(word)': True, 'contains(this)': True, 'contains(function)': True, 'contains(extractor)': True, 'contains(foobar)': True}
>>>
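In the session above every word appears only once. Because dictionary keys are unique and each word is lowercased before it becomes part of a key, repeated words, including ones that differ only in case, collapse into a single feature. The sketch below assumes a plain str.split() as a stand-in for nltk.word_tokenize so it runs without NLTK installed (the real tokenizer also splits off punctuation); the function name word_features_split is hypothetical.

```python
def word_features_split(sentence):
    # Same logic as word_features, but str.split() stands in for nltk.word_tokenize
    features = {}
    for word in sentence.split():
        features['contains(%s)' % word.lower()] = True
    return features

# 5 tokens, but only 3 distinct lowercase words
feats = word_features_split('The cat saw the CAT')
print(feats)       # {'contains(the)': True, 'contains(cat)': True, 'contains(saw)': True}
print(len(feats))  # 3
```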

This line populates the features dictionary:

features['contains(%s)' % word.lower()] = True

Here is a simple example of a dictionary in Python (for details, see https://docs.python.org/2/tutorial/datastructures.html#dictionaries):

>>> adict = {}
>>> adict['key'] = 'value'
>>> adict['key']
'value'
>>> adict['apple'] = 'red'
>>> adict['apple']
'red'
>>> adict
{'apple': 'red', 'key': 'value'}
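Two more dictionary behaviours are relevant here: assigning to a key that already exists overwrites its value instead of adding a second entry, and a key that was never assigned is simply absent (looking it up with [] raises KeyError). That is why the feature dictionary only holds entries for words that actually occur in the post. A short sketch that rebuilds adict so it runs on its own:

```python
adict = {'apple': 'red', 'key': 'value'}

adict['apple'] = 'green'     # overwrites the existing value; still one 'apple' key
print(adict['apple'])        # green
print(len(adict))            # 2

# A missing key is not in the dict; test membership or use .get() instead of []
print('banana' in adict)     # False
print(adict.get('banana'))   # None
```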

word.lower() returns a lowercased copy of a string, e.g.

>>> s = 'Apple'
>>> s.lower()
'apple'
>>> s = 'APPLE'
>>> s.lower()
'apple'
>>> s = 'AppLe'
>>> s.lower()
'apple'
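Note that str.lower() does not change the string it is called on: Python strings are immutable, so it returns a new string and the original variable keeps its value. A quick check (the name s is used instead of str to avoid shadowing the built-in str type):

```python
s = 'AppLe'
lowered = s.lower()
print(lowered)   # apple
print(s)         # AppLe (the original string is unchanged)
```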

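When you write 'contains(%s)' % word, Python builds a string made of the literal text contains(, then the %s placeholder, then ). The %s placeholder is filled with the value that follows the % operator outside the string, e.g.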

>>> a = 'apple'
>>> o = 'orange'
>>> '%s' % a
'apple'
>>> '%s and' % a
'apple and'
>>> '%s and %s' % (a,o)
'apple and orange'
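%s converts its argument with str(), so the same format string also works for values that are not strings. With more than one placeholder the values must be passed as a tuple; a single value can be passed bare, as in the feature-extractor line, or wrapped in a 1-tuple with the same result. A small sketch:

```python
num_key = 'contains(%s)' % 42                  # non-string converted with str()
bool_key = 'contains(%s)' % True
pair = '%s and %s' % ('apple', 'orange')       # several placeholders need a tuple
tuple_key = 'contains(%s)' % ('apple',)        # 1-tuple, same as passing 'apple' bare

for result in (num_key, bool_key, pair, tuple_key):
    print(result)
# contains(42)
# contains(True)
# apple and orange
# contains(apple)
```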

The % operator works much like the str.format() method, e.g.

>>> a = 'apple'
>>> o = 'orange'
>>> '%s and %s' % (a,o)
'apple and orange'
>>> '{} and {}'.format(a,o)
'apple and orange'
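On Python 3.6 and later, an f-string is a third way to build the same key. The sketch below constructs the feature key all three ways and confirms they agree:

```python
word = 'ABCxyz'

key_percent = 'contains(%s)' % word.lower()        # % operator
key_format = 'contains({})'.format(word.lower())   # str.format()
key_fstring = f'contains({word.lower()})'          # f-string, Python 3.6+

print(key_percent, key_format, key_fstring)
# contains(abcxyz) contains(abcxyz) contains(abcxyz)
```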

So when the code does 'contains(%s)' % word, it produces a string like this:

>>> 'contains(%s)' % a
'contains(apple)'

When you use that string as a key in a dictionary, the keys look like this:

>>> adict = {}
>>> key1 = 'contains(%s)' % a
>>> value1 = True
>>> adict[key1] = value1
>>> adict
{'contains(apple)': True}
>>> key2 = 'contains(%s)' % o
>>> value2 = False
>>> adict[key2] = value2
>>> adict
{'contains(orange)': False, 'contains(apple)': True}
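Putting the pieces together, the whole loop can also be written as a dictionary comprehension. NLTK's NaiveBayesClassifier is trained on a list of (featureset, label) pairs, so each dictionary produced this way gets paired with a label. A minimal sketch: the helper name features_for, the example posts, and the labels 'Greet' and 'Statement' are hypothetical, and str.split() stands in for nltk.word_tokenize so the code runs without NLTK.

```python
def features_for(post):
    # Dict comprehension equivalent of the for-loop feature extractor
    return {'contains(%s)' % word.lower(): True for word in post.split()}

# Hypothetical labelled posts
posts = [('Hello there', 'Greet'), ('I like apples', 'Statement')]

# One (featureset, label) pair per post
train_set = [(features_for(text), label) for text, label in posts]

for featureset, label in train_set:
    print(label, featureset)
# Greet {'contains(hello)': True, 'contains(there)': True}
# Statement {'contains(i)': True, 'contains(like)': True, 'contains(apples)': True}
```

With NLTK installed, a list shaped like train_set is what you would pass to nltk.NaiveBayesClassifier.train(train_set).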

