Question

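[Using Python 3.3.3]

I'm trying to analyse text files, clean them up, print the number of unique words, and then save the list of unique words to a text file, one word per line, along with the number of times each unique word appears in the cleaned-up list of words. What I did was take the text file (a speech from Prime Minister Harper), clean it up by keeping only valid alphabetical characters and single spaces, and count the number of unique words. Now I need to write a text file with each unique word on its own line and, beside the word, the number of occurrences of that word in the cleaned-up list. Here's what I have.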

def uniqueFrequency(newWords):
    '''Function returns a list of unique words with amount of occurances of that
word in the text file.'''
    unique = sorted(set(newWords.split()))
    for i in unique:
        unique = str(unique) + i + " " + str(newWords.count(i)) + "\n"
    return unique

def saveUniqueList(uniqueLines, filename):
    '''Function saves result of uniqueFrequency into a text file.'''
    outFile = open(filename, "w")
    outFile.write(uniqueLines)
    outFile.close

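newWords is the cleaned-up version of the text file: just words and spaces, nothing else. So I want each unique word in newWords saved to a text file, one word per line, and beside each word the number of occurrences of that word in newWords (not in the unique-words list, since every word occurs exactly once there). What is wrong with my functions? Thanks!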

Answer 1

unique = str(unique) + i + " " + str(newWords.count(i)) + "\n"

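The line above appends onto the existing collection, "unique", itself, so the result begins with the string form of the whole word list. If you accumulate into a different variable, such as "var", the function should return the expected text.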

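def uniqueFrequency(newWords):
    '''Function returns a string with one line per unique word: the word
    followed by the number of its occurrences in the text.'''
    var = ""  # accumulate the output in its own variable, not in unique
    unique = sorted(set(newWords.split()))
    for i in unique:
        # Note: newWords.count(i) counts substring matches in the string.
        var = var + i + " " + str(newWords.count(i)) + "\n"
    return var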
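
To make the symptom concrete, here is a minimal sketch of the original loop run on a tiny input. The function name `unique_frequency_buggy` and the string `sample` are made up for illustration; the loop body is the one from the question.

```python
def unique_frequency_buggy(new_words):
    """Reproduces the loop from the question, reusing the name `unique`."""
    unique = sorted(set(new_words.split()))
    for i in unique:
        # The for loop keeps iterating over the original list, but the name
        # `unique` now refers to a string that starts with the list's repr.
        unique = str(unique) + i + " " + str(new_words.count(i)) + "\n"
    return unique


sample = "b a b"  # hypothetical input, chosen to keep the output short
result = unique_frequency_buggy(sample)
print(repr(result))
```

The returned text starts with the printed form of the list (`['a', 'b']`) before the first word line, which is the stray prefix that ends up in the output file.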

Answer 2

Based on

unique = sorted(set(newWords.split()))
for i in unique:
    unique = str(unique) + i + " " + str(newWords.count(i)) + "\n"

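my guess is that newWords is not a list of strings but one long string. If that is the case, newWords.count(i) counts every occurrence of i as a substring rather than as a whole word, so a short word such as "a" is also counted inside longer words such as "and".

Try: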

def uniqueFrequency(newWords):
    '''Function returns a list of unique words with amount of occurances of that
word in the text file.'''
    wordList = newWords.split()
    unique = sorted(set(wordList))
    ret = ""
    for i in unique:
        ret = ret + i + " " + str(wordList.count(i)) + "\n"
    return ret
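
The difference between counting on the string and counting on the split list can be checked with a small sketch. The sentence in `text` is an invented example, not taken from the speech.

```python
text = "a cat and a hat"   # hypothetical cleaned-up text
words = text.split()       # ['a', 'cat', 'and', 'a', 'hat']

# str.count counts non-overlapping substring matches anywhere in the string,
# so the letter "a" inside "cat", "and" and "hat" is included.
substring_hits = text.count("a")

# list.count compares whole list elements, so only the word "a" matches.
whole_word_hits = words.count("a")

print(substring_hits, whole_word_hits)
```

Counting on `wordList`, as in the function above, gives the whole-word figure.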

Answer 3

Try using collections.Counter. It is well suited to this kind of task.

Here is an IPython demonstration:

In [1]: from collections import Counter

In [2]: txt = """I'm trying to analyse text files, clean them up, print the amount of unique words, then try to save the unique words list to a text file, one word per line with the amount of times each unique word appears in the cleaned up list of words. SO what i did was i took the text file (a speech from prime minister harper), cleaned it up by only counting valid alphabetical characters and single spaces, then i counted the amount of unique words, then i needed to make a saved text file of the unique words, with each unique word being on its own line and beside the word, the number of occurances of that word in the cleaned up list. Here's what i have."""

In [3]: Counter(txt.split())
Out[3]: Counter({'the': 10, 'of': 7, 'unique': 6, 'i': 5, 'to': 4, 'text': 4, 'word': 4, 'then': 3, 'cleaned': 3, 'up': 3, 'amount': 3, 'words,': 3, 'a': 2, 'with': 2, 'file': 2, 'in': 2, 'line': 2, 'list': 2, 'and': 2, 'each': 2, 'what': 2, 'did': 1, 'took': 1, 'from': 1, 'words.': 1, '(a': 1, 'only': 1, 'harper),': 1, 'was': 1, 'analyse': 1, 'one': 1, 'number': 1, 'them': 1, 'appears': 1, 'it': 1, 'have.': 1, 'characters': 1, 'counted': 1, 'list.': 1, 'its': 1, "I'm": 1, 'own': 1, 'by': 1, 'save': 1, 'spaces,': 1, 'being': 1, 'clean': 1, 'occurances': 1, 'alphabetical': 1, 'files,': 1, 'counting': 1, 'needed': 1, 'that': 1, 'make': 1, "Here's": 1, 'times': 1, 'print': 1, 'up,': 1, 'beside': 1, 'trying': 1, 'on': 1, 'try': 1, 'valid': 1, 'per': 1, 'minister': 1, 'file,': 1, 'saved': 1, 'single': 1, 'words': 1, 'SO': 1, 'prime': 1, 'speech': 1, 'word,': 1})

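(Note that this solution is not yet perfect; it does not strip the commas from the words. Hint: use str.replace.)

Counter is a specialized dict with the words as keys and the counts as values. So you can use it like this: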
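
One way to follow the str.replace hint is sketched below. The helper name `count_words` and the set of characters in `unwanted` are assumptions for illustration; adjust the characters to whatever punctuation the text actually contains.

```python
from collections import Counter


def count_words(text, unwanted=",.()"):
    """Remove each unwanted character with str.replace, then count words."""
    for ch in unwanted:
        text = text.replace(ch, "")
    return Counter(text.split())


counts = count_words("words, words. (a word), words")
print(counts)
```

With the punctuation removed, "words," and "words." are counted together with "words" instead of appearing as separate keys.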

 cnts = Counter(txt.split())  # count words, not single characters
 with open('counts.txt', 'w') as outfile:
     for c in cnts:
         outfile.write("{} {}\n".format(c, cnts[c]))

Note that this solution uses a few nice Python concepts:

a context manager
iterating over a dict (iterator)
str.format
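
Putting the pieces together, the following is a minimal sketch of the whole task, assuming the input is already a cleaned-up string of words and spaces. The names `format_counts` and `save_counts` are hypothetical. Note also that `outFile.close` in the question lacks parentheses, so the file is never explicitly closed; a `with` block avoids that problem.

```python
from collections import Counter


def format_counts(new_words):
    """Return one 'word count' line per unique word, sorted alphabetically."""
    counts = Counter(new_words.split())
    return "".join("{} {}\n".format(word, counts[word]) for word in sorted(counts))


def save_counts(new_words, filename):
    """Write the formatted counts to filename; the with block closes the file."""
    with open(filename, "w") as out_file:
        out_file.write(format_counts(new_words))


print(format_counts("b a b"), end="")
```

Calling `save_counts(newWords, "counts.txt")` would then write the same lines to a file; the file name here is only an example.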
