Error message 'TypeError: not all arguments converted during string formatting'

Time: 2015-02-09 06:41:27

Tags: python

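I am going through a set of documents, determining the 20 most common words, and then returning whether each of those words is in the document. The problem is that I get the error "TypeError: not all arguments converted during string formatting" for the line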

features['contains(%s)' %word] = (word in document_words)

I must be missing something, but I can't figure out what it is. Here is the function I am working on:

import nltk, random, re
from nltk.corpus import movie_reviews

def top20_document_features(document):
    document_words = set(document)
    all_words = {}
    sorted_words = {}
    top_20_words = []
    all_words = nltk.FreqDist(word.lower() for word in movie_reviews.words() if re.match('^[a-z]+$',word))
    sorted_words = (sorted(all_words.items(),key=lambda x: x[1], reverse=True))
    top_20_words = all_words.most_common(20)
    features = {}
    for word in top_20_words:
        features['contains(%s)' %word] = (word in document_words)
    return features

When I print top_20_words, I get

[('the', 76529), ('a', 38106), ('and', 35576), ('of', 34123), ('to', 31937), ('is', 25195), ('in', 21822), ('s', 18513), ('it', 16107), ('that', 15924), ('as', 11378), ('with', 10792), ('for', 9961), ('his', 9587), ('this', 9578), ('film', 9517), ('i', 8889), ('he', 8864), ('but', 8634), ('on', 7385)]
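Each entry in the list above is a (word, count) tuple rather than a plain string. A minimal sketch, using one entry copied from that output and a small made-up document_words set, of why the formatting line in the loop fails and why the membership test cannot succeed either:

```python
# One entry copied from the printed top_20_words output.
word = ('the', 76529)

# With a tuple on the right-hand side, % treats each tuple element as a
# separate argument. The format string has only one %s, so the second
# element (76529) is left over and Python raises TypeError.
error_message = None
try:
    'contains(%s)' % word
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # not all arguments converted during string formatting

# Made-up stand-in for the real document_words set (strings only).
document_words = {'the', 'film', 'tv'}
# The tuple itself is never a member of a set of plain strings.
print(word in document_words)  # False
```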

When I print document_words, I get

{'rushed', 'tv', 'including', 'too', 'them', 'big', 'every', 'only', 'collision', 'sarcasm', 'few', 'would', 'dispensed', 'while', 'his', 'exciting', 'work', 'hollywood', 'any', 'more', 'impressive', 'mast', 'they', 'she', 'theft', 'nuclear', 'team', 'revenge', 'doubt', 'sequences', 'he', 'greene', 'peacemaker', 'madman', 'lost', 'helicopter', 'better', 'tried', 'most', 'walking', 'there', 'for', 'sarcastic', 'clooney', 'true', 'good', 'intelligent', 'in', 'brilliant', 'through', 'best', 'you', 'amazing', 'this', 'expect', 'little', 'very', 'new', 'off', 'head', 'as', ',', 'final', 'obviously', 'with', 'ingratiating', 'despite', 'nuke', 'along', 'back', 'is', 'colonel', 'when', 'about', 'labour', 'nice', 'moved', 'll', 'probably', 'flirtatious', 'around', 'adrenaline', 'images', 'come', 'mark', 'love', 'story', 'drawn', 'force', 'way', 'world', 'shaky', 'me', 'smuggling', 'may', 'appetizer', 'take', 'tension', 'throughout', 're', 'have', 'petersen', 'him', 'over', 'wit', 'between', 'be', 'show', 'figure', 'streets', 'none', 'george', 'once', 'coherent', 'stunning', 'director', "'", 'creating', 'presence', 'after', '.', 'real', 'reminds', 'her', 'satisfying', 'was', 'movies', 'can', 'not', 'stuff', 'rough', 'it', 'nothing', 'half', '"', 'nicole', 'created', 'connections', 'first', 'child', 'leads', 'then', 'suspense', 'falling', 'down', 'say', 'on', 'ordeal', 'and', 'get', 'by', 'house', 'doctor', 'kidman', 'edges', 'nifty', 'doing', 'action', 'find', 'many', 'blockbuster', 'who', 'where', 'leder', 'into', 'across', 'york', 'guy', 'of', 'are', 'frustration', 'starts', 'natural', 'two', 'beginning', 'does', 'banter', 'car', 'star', 'female', 'which', 'city', 'make', 'implausible', 'going', 'trying', 'at', 'determination', 'camera', 'quickly', 'gonna', 'terrorists', 'globe', 'pregnant', 'usual', 'air', 'us', 'start', 'heroics', 'i', 'direct', 'characters', 'routine', 'chase', 'movie', 'an', 'save', 'known', 'just', 'a', 'much', 'settled', 'takes', 'against', 'screen', 
'mother', 'but', 'crowded', 'hero', 'out', 'mimi', '-', 'explosion', 'weapons', 'that', 'bouts', 'among', 's', 'breathtaking', 'personality', 'from', 'wolfgang', 'life', 'hit', 'batman', 'own', 'famous', 'train', 'ability', 'to', 'disappoint', 'enter', 'goes', 'detail', 'excitement', 'begins', 'the', 'we', 'than', 'works', 'hated', 'how', 'dealt', 'self', 'long', 'has', 'insecurity', 'stolen', 'episode', 'scene', 'er', 'white', 'deals', 'all', 'dreamworks', 'so', 'hour', 'showdown', 'picture'}

1 answer:

Answer 0 (score: 0)

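word is a tuple of the word and its count, for example ('the', 76529). When the right-hand side of % is a tuple, Python treats each element as a separate formatting argument. 'contains(%s)' has only one %s, so the count is left over and you get "not all arguments converted". The same tuple also makes word in document_words always False, since the set contains only strings. Use word[0] in the for loop (in both places).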
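A minimal sketch of the corrected loop, pulled out into a hypothetical helper contains_features so it can run without the NLTK movie_reviews corpus. It assumes top_words has the (word, count) shape shown in the printed output; collections.Counter stands in for nltk.FreqDist here because its most_common method returns the same kind of list. Unpacking the pair in the for statement is equivalent to using word[0] in both places.

```python
from collections import Counter


def contains_features(document_words, top_words):
    """Hypothetical helper: build the same features dict as the question's loop.

    top_words is assumed to be a list of (word, count) pairs, as returned
    by most_common().
    """
    features = {}
    for word, _count in top_words:  # unpack the pair; keep only the word
        features['contains(%s)' % word] = (word in document_words)
    return features


# Stand-in data: Counter.most_common returns (word, count) pairs.
freq = Counter(['the', 'the', 'a', 'film', 'the', 'a'])
top_words = freq.most_common(2)  # [('the', 3), ('a', 2)]
document_words = {'the', 'tv', 'film'}
print(contains_features(document_words, top_words))
# {'contains(the)': True, 'contains(a)': False}
```

Wrapping the value in a one-element tuple, 'contains(%s)' % (word,), would also silence the TypeError, but the keys would become contains(('the', 76529)) and the membership test would still compare a tuple, so the word itself has to be extracted.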