Sorting and counting words in a text file

Time: 2016-11-12 02:33:55

Tags: python python-3.x sorting count

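I'm new to programming and stuck on my current program. I have to read a story from a file, sort the words, and count the number of occurrences of each word. The program counts the words, but it does not sort them, remove the punctuation, or remove duplicate words. I'm lost as to why it isn't working. Any advice would be helpful.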

ifile = open("Story.txt",'r')
fileout = open("WordsKAI.txt",'w')
lines = ifile.readlines()

wordlist = []
countlist = []

for line in lines:
    wordlist.append(line)
    line = line.split()
    # line.lower()

    for word in line:
        word = word.strip(". ,  ! ? :  ")
        # word = list(word)
        wordlist.sort()
        sorted(wordlist)
        countlist.append(word)

        print(word,countlist.count(word))

3 Answers:

Answer 0: (score: 1)

The main problem in your code is at this line (line 9):

    wordlist.append(line)

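You are appending the entire line to wordlist, which I doubt is what you want. As a result, the text that ends up in wordlist has never been passed through .strip(), and it is a whole line rather than a single word.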
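To make the point above concrete, here is a minimal sketch that uses a made-up two-line sample in place of Story.txt. The sample text and the names `sample_lines`, `lines_appended`, and `words_appended` are assumptions for illustration only. It contrasts appending `line` with appending each stripped word.

```python
# Hypothetical sample standing in for the lines read from Story.txt
sample_lines = ["The cat sat.\n", "The dog ran!\n"]

lines_appended = []   # what the question's code ends up building
words_appended = []   # what was probably intended

for line in sample_lines:
    # Appends the entire line, trailing newline and punctuation included
    lines_appended.append(line)
    for word in line.split():
        # Appends one word with surrounding punctuation removed
        words_appended.append(word.strip(".,!?:"))

print(lines_appended)  # ['The cat sat.\n', 'The dog ran!\n']
print(words_appended)  # ['The', 'cat', 'sat', 'The', 'dog', 'ran']
```

Sorting `lines_appended` would order whole lines, which is why the output of the original program never looks like a sorted word list.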

What you need to do is add the word to wordlist only after it has been strip()ped, and only after checking that the same word is not already there (no duplicates):

ifile = open("Story.txt",'r')
lines = ifile.readlines()

wordlist = []
countlist = []

for line in lines:
    # Get all the words in the current line
    words = line.split()
    for word in words:
        # Perform whatever manipulation to the word here
        # Remove any punctuation from the word
        word = word.strip(".,!?:;'\"")
        # Make the word lowercase
        word = word.lower()

        # Add the word into wordlist only if it is not in wordlist
        if word not in wordlist:
            wordlist.append(word)

        # Add the word to countlist so that it can be counted later
        countlist.append(word)

# Sort the wordlist
wordlist.sort()

# Print the wordlist
for word in wordlist:
    print(word, countlist.count(word))

Another way you could do this is to use a dictionary, storing each word as a key and its number of occurrences as the value:

ifile = open("Story.txt", "r")
lines = ifile.readlines()

word_dict = {}

for line in lines:
    # Get all the words in the current line
    words = line.split()
    for word in words:
        # Perform whatever manipulation to the word here
        # Remove any punctuation from the word
        word = word.strip(".,!?:;'\"")
        # Make the word lowercase
        word = word.lower()

        # Add the word to word_dict
        word_dict[word] = word_dict.get(word, 0) + 1

# Create a wordlist to display the words sorted
word_list = list(word_dict.keys())
word_list.sort()

for word in word_list:
    print(word, word_dict[word])
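The dictionary-based counting in this answer can also be sketched with `collections.Counter` from the standard library. This is a swapped-in alternative, not part of the original answer, and the sample string below is a hypothetical stand-in for the contents of Story.txt.

```python
from collections import Counter

# Hypothetical sample text standing in for the contents of Story.txt
text = "The cat saw the dog. The dog ran!"

# Normalise each word as the answer does: strip punctuation, then lowercase
words = [w.strip(".,!?:;'\"").lower() for w in text.split()]

# Counter is a dict subclass that maps each word to its number of occurrences
word_counts = Counter(words)

# Print the words in alphabetical order with their counts
for word in sorted(word_counts):
    print(word, word_counts[word])
```

Because `Counter` is a dictionary subclass, the rest of the code (sorting the keys, looking up counts) works the same way as with the plain `word_dict`.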

Answer 1: (score: 0)

You have to supply a key function to the sort. Try this:

    r = sorted(wordlist, key=str.lower)
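As a small illustration of what the key function changes, the sketch below sorts a made-up list (the words are assumptions, not taken from the question's file) with and without `key=str.lower`.

```python
# Hypothetical word list with mixed capitalisation
wordlist = ["banana", "Cherry", "apple"]

# Default ordering compares code points, so uppercase letters come first
default_order = sorted(wordlist)

# With key=str.lower each word is compared by its lowercase form,
# while the original spelling is kept in the result
r = sorted(wordlist, key=str.lower)

print(default_order)  # ['Cherry', 'apple', 'banana']
print(r)              # ['apple', 'banana', 'Cherry']
```

Note that `sorted()` returns a new list and leaves `wordlist` unchanged, so the result has to be assigned to a name, as `r` is here.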

Answer 2: (score: 0)

punctuation = ".,!?: "
counts = {}
with open("Story.txt",'r') as infile:
    for line in infile:
        for word in line.split():
            word = word.strip(punctuation)  # strip every listed character from both ends in one call
            if word not in counts:
                counts[word] = 0
            counts[word] += 1

with open("WordsKAI.txt",'w') as outfile:
    for word in sorted(counts):  # if you want to sort by counts instead, use sorted(counts, key=counts.get)
        outfile.write("{}: {}\n".format(word, counts[word]))
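The comment in the last loop mentions sorting by counts instead. A minimal sketch of that variation follows, using a hypothetical `counts` dictionary rather than one built from Story.txt. Note that `sorted(counts, key=counts.get)` orders from least to most frequent, so the sketch negates the count to put the most frequent words first and breaks ties alphabetically.

```python
# Hypothetical counts, standing in for the dictionary built from the file
counts = {"the": 3, "cat": 1, "dog": 2, "ran": 1}

# Alphabetical order, which is what sorted(counts) produces
alphabetical = sorted(counts)

# Most frequent first; words with equal counts fall back to alphabetical order
by_count = sorted(counts, key=lambda w: (-counts[w], w))

for word in by_count:
    print("{}: {}".format(word, counts[word]))
```

The tie-break on the word itself is a design choice of this sketch; it keeps the output stable between runs when several words share the same count.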