I obtained a string of adjectives from nltk, using pos_tag and word_tokenize. There are 7 lists:
positiverange4 = ['legendary', 'legend', 'finest', 'insane', 'best']
positiverange3 = ['favorite', 'favourite', 'fav', 'delicious', 'awesome', 'perfect', 'perfection', 'perfectly', 'scrumptous']
positiverange2 = ['love', 'courteous', 'great', 'generous', 'tasty', 'pleasent', 'polite']
positiverange1 = ['like', 'enjoyable', 'enjoy', 'reasonable', 'huge', 'plentiful', 'plenty', 'quick', 'enjoyed', 'fast', 'swift']
neutralrange = ['ok', 'fine', 'good', 'nice', 'gud', 'friendly', 'fresh', 'cheap']
negativerange1 = ['crowded', 'lousy', 'slow', 'bad']
I run a for loop that checks whether each word in that string is in any of these lists, and if it is, I increment a counter like this:
count = 0
val = 0
for w in adjectives:
    if w in positiverange4:
        val += 4
        count = count + 1
    elif w in positiverange3:
        val += 3
        count = count + 1
    elif w in positiverange2:
        val += 2
        count = count + 1
    elif w in positiverange1:
        val += 1
        count = count + 1
    elif w in neutralrange:
        val += 0
        count = count + 1
    elif w in negativerange1:
        val -= 1
        count = count + 1
    elif w in negativerange2:
        val -= 2
        count = count + 1
    elif w in negativerange3:
        val -= 3
        count = count + 1
    elif w in negativerange4:
        val -= 4
        count = count + 1
print(count)
The value of count often comes out wrong.
Answer 0 (score: 2)
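I agree with BATH IRSHAD: normalize your input, and your reference data as well (see below). In addition, a dict of sets is definitely a better data structure for your use case.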
known_adj = {+4: {'legendary', 'legend', 'finest', 'insane', 'best'},
             +3: {'favorite', 'favourite', 'fav', 'delicious', 'awesome',
                  'perfect', 'perfection', 'perfectly', 'scrumptous'},
             ... }
total_val = sum(val for val in known_adj for adj in adjectives
                if adj.strip().lower() in known_adj[val])
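A for loop can be more efficient if you skip further comparisons after a match, and it also gives you an easy way to count the total number of matches (Edit: the OP's program accumulates that count in the loop, a detail that had escaped me...)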
total_val = 0
# added in edit
total_matches = 0
for adj in adjectives:
    adj = adj.strip().lower()
    for val in known_adj:
        if adj in known_adj[val]:
            total_val += val
            # added in edit
            total_matches += 1
            # stop comparing this adjective once it has matched
            break
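To make the loop above easy to try out, here is a minimal self-contained sketch. The shortened `known_adj` table and the sample `adjectives` list are made up for illustration (the full table is elided in this answer), and `score_adjectives` is a hypothetical helper name, not something from the question.

```python
# Illustrative data only: a shortened known_adj and a made-up input list.
known_adj = {
    4: {'legendary', 'legend', 'finest', 'insane', 'best'},
    3: {'favorite', 'favourite', 'fav', 'delicious', 'awesome'},
    -1: {'crowded', 'lousy', 'slow', 'bad'},
}


def score_adjectives(adjectives, known_adj):
    """Return (total_val, total_matches) for the given adjectives."""
    total_val = 0
    total_matches = 0
    for adj in adjectives:
        adj = adj.strip().lower()  # normalize the input
        for val, words in known_adj.items():
            if adj in words:
                total_val += val
                total_matches += 1
                break  # stop comparing once this adjective has matched
    return total_val, total_matches


# 'Best' and ' delicious' only match thanks to the normalization step.
adjectives = ['Best', ' delicious', 'slow', 'purple']
total_val, total_matches = score_adjectives(adjectives, known_adj)
print(total_val, total_matches)
```

Words that appear in none of the sets, such as 'purple' here, contribute to neither the value nor the match count.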
Another thing you may want to do is sanity-check known_adj, to make sure no adjective is listed under more than one score:
from itertools import combinations
...
known_adj = update_ka()
for i, j in combinations(known_adj.keys(), 2):
    if known_adj[i].intersection(known_adj[j]):
        # not an empty set, there is a repetition!
        # print/log a warning, stop the machines, etc, you decide
        print('warning: scores %s and %s share adjectives' % (i, j))
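The check above can be sketched as a runnable example. The small `known_adj` below is invented for the demonstration, with 'awesome' deliberately repeated under two scores, and `find_repetitions` is a hypothetical helper name; how you react to a repetition is still up to you.

```python
from itertools import combinations

# Illustrative data only; 'awesome' is deliberately repeated under two scores.
known_adj = {
    4: {'legendary', 'best', 'awesome'},
    3: {'favorite', 'delicious', 'awesome'},
    2: {'love', 'great', 'tasty'},
}


def find_repetitions(known_adj):
    """Return a list of (score_i, score_j, shared_words) for overlapping sets."""
    repetitions = []
    for i, j in combinations(sorted(known_adj), 2):
        shared = known_adj[i] & known_adj[j]
        if shared:  # a non-empty intersection means a repeated adjective
            repetitions.append((i, j, sorted(shared)))
    return repetitions


for i, j, shared in find_repetitions(known_adj):
    print('warning: %s listed under both %+d and %+d' % (shared, i, j))
```

Returning the overlaps instead of printing inside the loop leaves the caller free to log, raise, or repair the table.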
Answer 1 (score: 0)
You can use Counter from the collections module:
>>> from collections import Counter
>>> # Tally occurrences of words in a list
>>> cnt = Counter()
>>> for word in ['red', 'blue', 'red', 'green', 'blue', 'blue']:
...     cnt[word] += 1
>>> cnt
Counter({'blue': 3, 'red': 2, 'green': 1})
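Applied to the question, a Counter could tally how many adjectives matched each score. This is only a sketch under assumptions: the `known_adj` table and `adjectives` list are made-up sample data, the `score_of` lookup is a name introduced here, and inverting the table this way assumes each word is listed under exactly one score.

```python
from collections import Counter

# Illustrative data only.
known_adj = {
    4: {'legendary', 'best'},
    3: {'delicious', 'awesome'},
    -1: {'slow', 'bad'},
}
adjectives = ['best', 'slow', 'Awesome', 'delicious', 'blue']

# Invert the table once so each word maps directly to its score.
# Assumption: no word appears under more than one score.
score_of = {word: val for val, words in known_adj.items() for word in words}

# Tally how many adjectives matched each score; unknown words are skipped.
matches = Counter()
for adj in adjectives:
    adj = adj.strip().lower()
    if adj in score_of:
        matches[score_of[adj]] += 1

total_matches = sum(matches.values())
total_val = sum(val * n for val, n in matches.items())
print(matches, total_matches, total_val)
```

Because each word is looked up once in a dict, an adjective can never be counted twice, which keeps the match count and the total value consistent with each other.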
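References:
8.3. collections - High-performance container datatypes - http://goo.gl/GGWYrW
9.7. itertools - Functions creating iterators for efficient looping - http://goo.gl/GKfVXQ
Python lists - http://goo.gl/HZ9Hm
Online demo - http://repl.it/4NP
Execute Python scripts online - http://goo.gl/4sxrD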