Use the `collections` module

Question

I obtained a string of adjectives using pos_tag and word_tokenize from nltk. I have 7 lists:

positiverange4 = ['legendary', 'legend', 'finest', 'insane', 'best']    
positiverange3 = ['favorite', 'favourite', 'fav', 'delicious', 'awesome', 'perfect', 'perfection', 'perfectly', 'scrumptous']    
positiverange2 = ['love', 'courteous', 'great', 'generous', 'tasty', 'pleasent', 'polite']    
positiverange1 = ['like', 'enjoyable', 'enjoy', 'reasonable', 'huge', 'plentiful', 'plenty', 'quick', 'enjoyed', 'fast', 'swift']
neutralrange   = ['ok', 'fine', 'good', 'nice', 'gud', 'friendly', 'fresh', 'cheap']
negativerange1 = ['crowded', 'lousy', 'slow', 'bad']

I run a for loop that checks whether each word in that string is in any of these lists, and if it exists, I increment a counter like this:

count = 0
for w in adjectives:
    if w in positiverange4:
        val += 4 
        count = count + 1
    elif w in positiverange3:
        val += 3
        count = count + 1
    elif w in positiverange2:
        val += 2
        count = count + 1
    elif w in positiverange1:
        val += 1
        count = count + 1
    elif w in neutralrange:
        val += 0
        count = count + 1
    elif w in negativerange1:
        val -= 1
        count = count + 1
    elif w in negativerange2:
        val -= 2
        count = count + 1
    elif w in negativerange3:
        val -= 3
        count = count + 1   
    elif w in negativerange4:
        val -= 4
        count = count + 1                               
print count

The value of count often comes out wrong.

Answer 1

I agree with BATH IRSHAD: normalize your input, and your reference data as well (see below). Also, a dict of sets is definitely a better data structure for your use case:

known_adj = {+4: {'legendary', 'legend', 'finest', 'insane', 'best'},
             +3: {'favorite', 'favourite', 'fav', 'delicious', 'awesome',
                  'perfect', 'perfection', 'perfectly', 'scrumptous'},
             ... }

total_val = sum(val for val in known_adj for adj in adjectives
                             if adj.strip().lower() in known_adj[val])
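For a version that runs end to end, here is a minimal sketch that fills in the dict with the lists shown in the question and applies the same sum. The negativerange2 to negativerange4 lists are not shown in the question, so they are left out, and the sample `adjectives` list is made up for illustration. It is written for Python 3.

```python
# Scores mapped to sets of known adjectives, copied from the question's lists.
known_adj = {
    4: {'legendary', 'legend', 'finest', 'insane', 'best'},
    3: {'favorite', 'favourite', 'fav', 'delicious', 'awesome',
        'perfect', 'perfection', 'perfectly', 'scrumptous'},
    2: {'love', 'courteous', 'great', 'generous', 'tasty', 'pleasent', 'polite'},
    1: {'like', 'enjoyable', 'enjoy', 'reasonable', 'huge', 'plentiful',
        'plenty', 'quick', 'enjoyed', 'fast', 'swift'},
    0: {'ok', 'fine', 'good', 'nice', 'gud', 'friendly', 'fresh', 'cheap'},
    -1: {'crowded', 'lousy', 'slow', 'bad'},
}

# Hypothetical input, with mixed case and stray whitespace on purpose.
adjectives = ['Best', ' delicious', 'slow ', 'purple', 'GOOD']

# Each adjective is normalized before the membership test.
total_val = sum(val for val in known_adj for adj in adjectives
                if adj.strip().lower() in known_adj[val])
print(total_val)
```

Without the `strip().lower()` normalization, 'Best', ' delicious', 'slow ' and 'GOOD' would all fail to match, which is one plausible way for the totals in the question to come out wrong.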

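A for loop can be more efficient if you skip further comparisons after a match, and (Edit:) it also gives an easy way to count the total number of matches, which the OP's program accumulates in its loop (a detail I had missed...)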

total_val = 0
# added in edit
total_matches = 0
for adj in adjectives:
    adj = adj.strip().lower()
    for val in known_adj:
        if adj in known_adj[val]:
            total_val += val
            # added in edit
            total_matches += 1
            # stop checking the remaining sets once a match is found
            break
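If the word sets do not overlap, the inner loop can be avoided altogether by inverting the dict once into a flat word-to-score mapping. This is a variation on the loop above rather than something taken from the question; `score_of`, `score` and the sample data are names invented for this sketch.

```python
# Small hypothetical dict of sets, in the same shape as known_adj above.
known_adj = {
    4: {'legendary', 'best'},
    2: {'great', 'tasty'},
    0: {'ok', 'good'},
    -1: {'slow', 'bad'},
}

# Invert it once so that each adjective maps directly to its score.
score_of = {adj: val for val, words in known_adj.items() for adj in words}


def score(adjectives):
    """Return (total_val, total_matches) for an iterable of words."""
    total_val = 0
    total_matches = 0
    for adj in adjectives:
        val = score_of.get(adj.strip().lower())
        if val is not None:  # 0 is a valid score, so compare against None
            total_val += val
            total_matches += 1
    return total_val, total_matches


print(score(['Best', 'tasty ', 'good', 'purple', 'slow']))
```

Testing `val is not None` instead of `if val:` matters here, because neutral words carry a score of 0 and should still be counted as matches.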

Another thing you may want to do is sanitize known_adj, checking that no word appears under two different scores:

from itertools import combinations
...
known_adj = update_ka()
for i, j in combinations(known_adj.keys(), 2):
    if known_adj[i].intersection(known_adj[j]):
        # not an empty set, there is a repetition!
        # print/log a warning, stop the machines, etc, you decide
        pass
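As a concrete illustration of that check, the sketch below collects every pair of scores whose sets share a word. The helper name `find_overlaps` and the sample data are hypothetical, chosen only to make the example self-contained.

```python
from itertools import combinations


def find_overlaps(known_adj):
    """Return a list of (score_a, score_b, shared_words) for overlapping sets."""
    overlaps = []
    for i, j in combinations(known_adj, 2):
        shared = known_adj[i] & known_adj[j]
        if shared:
            overlaps.append((i, j, shared))
    return overlaps


# Hypothetical data: 'great' is listed under two scores by mistake.
known_adj = {
    4: {'legendary', 'best'},
    2: {'great', 'tasty'},
    1: {'great', 'quick'},
}

for i, j, shared in find_overlaps(known_adj):
    print('warning: %s listed under both %d and %d' % (sorted(shared), i, j))
```

Returning the overlaps rather than printing inside the helper leaves the caller free to log, raise, or ignore them.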

Answer 2

Use the `collections` module

>>> from collections import Counter
>>> # Tally occurrences of words in a list
>>> cnt = Counter()
>>> for word in ['red', 'blue', 'red', 'green', 'blue', 'blue']:
...     cnt[word] += 1
>>> cnt
Counter({'blue': 3, 'red': 2, 'green': 1})
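Applied to the question, a Counter could tally the matched adjectives, from which both the match count and the weighted total follow. This is only a sketch under assumptions: `score_of` is a hypothetical word-to-score mapping standing in for the question's lists, and the input list is invented.

```python
from collections import Counter

# Hypothetical word-to-score mapping; the real one would be built from
# the lists in the question.
score_of = {'best': 4, 'great': 2, 'good': 0, 'slow': -1}

adjectives = ['Best', 'good', 'slow', 'best', 'purple', 'Slow ']

# Normalize each word, then tally only those that have a known score.
normalized = (adj.strip().lower() for adj in adjectives)
cnt = Counter(adj for adj in normalized if adj in score_of)

count = sum(cnt.values())                               # number of matched words
val = sum(score_of[adj] * n for adj, n in cnt.items())  # weighted total
print(cnt, count, val)
```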

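References:
8.3. collections - High-performance container datatypes - http://goo.gl/GGWYrW
9.7. itertools - Functions creating iterators for efficient looping - http://goo.gl/GKfVXQ
Python lists - http://goo.gl/HZ9Hm
Online demo - http://repl.it/4NP
Execute Python scripts online - http://goo.gl/4sxrD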
