Question

I am trying to use the Loughran/McDonald dictionary to classify the tone of financial texts.

This is the code I found online:

# Get tone dictionary

import re

with open('lmdict.txt') as list:
    lines = list.readlines()
dict = {}
for l in lines:
    if l[0:2] == '>>':
        cat = l[2:].strip()
        dict[cat] = []
    else:
        l = l.strip()
        if l:
            dict[cat].append(l)

# Set up regular expressions
regex = {}
for cat in dict.keys():
    pattern = '\\b(?:' + '|'.join(dict[cat]) + ')\\b'
    regex[cat] = re.compile(pattern, re.IGNORECASE)

# Get tone count
text = "Bsp.text"

wordcount = len(text.split())
for cat in count.keys():
    count[cat] = len(regex[cat].findall(text))
print(count)

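I was getting a few errors before, so I added import re and text = "Bsp.text" to assign the document I want to classify to the variable text (I hope I did that right?). Unfortunately, there is now another error: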

Traceback (most recent call last):
  File "C:\Users\M\Desktop\Python34\xWordlist.py", line 25, in <module>
    for cat in count.keys():
NameError: name 'count' is not defined

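How can I fix this? I am new to Python, so if there are any other mistakes in the code, please let me know. I would really appreciate it!

Update: I changed the last part of the code and it now runs: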

# Get tone count

with open('Bsp.txt', 'r') as content_file:
    content = content_file.read()


count = {}
wordcount = len(content.split())
for cat in dict.keys():
    count[cat] = len(regex[cat].findall(content))

print(count)

Answer 1

Your count variable is never assigned, so the loop fails as soon as it calls count.keys(). Maybe you meant:

count = {}
for cat in dict.keys():
  ...

Also, I don't see your count variable being incremented anywhere. Maybe:

count[cat] = len(regex[cat].findall(text))

should be:

if cat not in count:
  count[cat] = 0
count[cat] += len(regex[cat].findall(text))

I added a '+' sign before the '='...
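The two fixes above can be put together in a minimal sketch. It assumes a tiny made-up word list in place of the parsed lmdict.txt (the name tone_words, the two categories, and the sample sentence are illustrative, not the actual Loughran/McDonald lists), and it adds re.escape as a precaution in case a word contains regex metacharacters:

```python
import re

# Hypothetical stand-in for the contents parsed from lmdict.txt.
tone_words = {
    "Negative": ["loss", "decline"],
    "Positive": ["gain", "improve"],
}

# One case-insensitive, whole-word pattern per category.
regex = {
    cat: re.compile(
        r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b",
        re.IGNORECASE,
    )
    for cat, words in tone_words.items()
}

# Illustrative sample text, not taken from any real filing.
text = "The loss widened after a decline in sales, despite a small gain."

count = {}  # the dictionary must exist before it is indexed
for cat in tone_words:
    if cat not in count:
        count[cat] = 0
    count[cat] += len(regex[cat].findall(text))

print(count)  # {'Negative': 2, 'Positive': 1}
```

Since each category is visited only once in this loop, a plain assignment would give the same totals here; the += form matters if the same counter is reused across several documents.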

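Note: using dict as a variable name is not a good idea. It can lead to unexpected consequences and, at best, confuses readers. dict is the built-in class used to represent dictionaries.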
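To illustrate what can go wrong, here is a small sketch. The function names and the sample words are hypothetical and only serve the demonstration. The same concern applies to the name list used in the with open(...) as list line of the question.

```python
def parse_with_shadowing():
    # Rebinding the name hides the built-in class inside this scope.
    dict = {"Negative": ["loss"]}
    try:
        # This now calls the dictionary object, not the built-in type.
        return dict(Positive=["gain"])
    except TypeError as exc:
        return str(exc)


def parse_with_clear_name():
    # A descriptive name leaves the built-in untouched and usable.
    tone_words = {"Negative": ["loss"]}
    tone_words.update(dict(Positive=["gain"]))
    return tone_words


print(parse_with_shadowing())   # 'dict' object is not callable
print(parse_with_clear_name())  # {'Negative': ['loss'], 'Positive': ['gain']}
```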
