Question

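I'm having trouble producing a normalized dictionary. My dictionary holds a set of words whose occurrences we count in a text file. For each of these words/characters, "normalizing" in my project means dividing its frequency/value by the total number of sentences in the given text. I then have to replace the old values in the dictionary with these new values.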

That is, my dictionary is named count and has keys and values like this:

{'and': 5, ';' : 3, '-' : 0...}

def main(textfile, normalize=True):
    .
    .
    .
    .
    if normalize == True:
        for x in count:
            new_count[x] = count[x]/numSentence
            print(x,count[x])

Here is a sample file you can try with any of the code below: https://www.dropbox.com/s/7xph5pb9bdf551h/sample2.txt?dl=0 Also note that normalize == True is present in the code above because in the top-level function

Answer 1

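The code below shows an example of searching for a word in a string. For example, "remember me" contains two matches for "me": one inside the word "remember" and one as the standalone word "me", but only one of them is a whole word. Example: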

"remember me".count('me') # output: 2
'me' in 'remember me'  # True (substring match)

Matching whole words only:

'remember me'.split().count('me')  # output: 1

So, if I have understood your question correctly, you need to match whole words:

mydict = {'and': 5, ';' : 3, '-' : 0} 
text = 'hello and me; in mem;ory ; me-ome _ -'

# find a word frequency in a text
def count(word, text):
    return len([w for w in text.split() if w == word])

# update dictionary with new count
mydict = {key:count(key, text) for key in mydict}
print(mydict)

Output:

{'and': 1, ';': 1, '-': 1}
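
Once the counts are in place, the normalization described in the question (each value divided by the number of sentences) could be sketched roughly as follows. The helper names count_sentences and normalize_counts, the sample string, and the rule that a sentence ends at ".", "!" or "?" are all assumptions made for illustration; they are not taken from the asker's code, and real text may need a more careful sentence splitter.

```python
import re

def count_sentences(text):
    # Assumption: a sentence ends with one or more of . ! ?
    # followed by whitespace or the end of the text.
    parts = re.split(r'[.!?]+(?:\s+|$)', text)
    return len([p for p in parts if p.strip()])

def normalize_counts(counts, num_sentences):
    # Build a new dict instead of mutating while iterating.
    # With zero sentences there is nothing to divide by, so the
    # counts are returned unchanged (a hypothetical choice).
    if num_sentences == 0:
        return dict(counts)
    return {key: value / num_sentences for key, value in counts.items()}

# Hypothetical sample data in the shape shown in the question.
count = {'and': 5, ';': 3, '-': 0}
sample = "Cats and dogs. Birds and bees; fish and frogs! Is that all? Yes."

num_sentences = count_sentences(sample)          # 4 for this sample
count = normalize_counts(count, num_sentences)   # replace the old values
print(count)  # {'and': 1.25, ';': 0.75, '-': 0.0}
```

Rebinding count to the returned dictionary replaces the old values in one step, which avoids the situation in the question where new_count is filled but count is what gets printed.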
