How to sort by frequency from highest to lowest

Asked: 2016-04-11 17:13:32

Tags: python python-3.x

So my goal is to open this Twitter file full of tweets and order the hashtags by frequency, to tell what the trending topics are. I have asked about this before, but I have since changed my code and got as far as printing each hashtag with its count. How do I order them and send the result to another file called trending.txt?

counts ={}
with open("/Users/Adnan/Desktop/twitter_data.txt") as data:
    for tag in data:
        for line in data:
            for part in line.capitalize().split():
                if "#" in part:
                    counts[part] = counts.get(part,0) + 1

for w in counts:
    print((w+','+str(counts[w])+'\n'))
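
For reference, a minimal sketch of the ordering step, assuming counts has been filled as above: sorted() with a key function puts the (hashtag, count) pairs in descending order of count, and the result can then be written to trending.txt.

# Minimal sketch (not part of the original question): order the plain dict
# by its values, highest count first, then write one "hashtag,count" line each.
ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
with open("trending.txt", "w") as trending:
    for hashtag, count in ordered:
        trending.write(hashtag + ',' + str(count) + '\n')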

2 Answers:

Answer 0 (score: 0):

Use a collections.Counter() object instead of a plain dictionary; it is a specialised dictionary that provides the functionality you want out of the box:

from collections import Counter

counts = Counter()
with open("/Users/Adnan/Desktop/twitter_data.txt") as data:
    for tag in data:
        for line in data:
            for part in line.capitalize().split():
                if "#" in part:
                    counts[part] += 1

with open('trending.txt', 'w') as trending:
    for hashtag, count in counts.most_common():
        print(hashtag, count, sep=',', file=trending)

The Counter.most_common() method produces the (key, count) pairs in sorted order, from most common to least. You can limit the number of entries returned by passing in an integer:

with open('trending.txt', 'w') as trending:
    # The 10 most popular hashtags
    for hashtag, count in counts.most_common(10):
        print(hashtag, count, sep=',', file=trending)

Note that your for tag in data loop only iterates once; it reads the first line, after which for line in data: processes the rest of the file. You could use next(data, None) in place of that outer loop:

with open("/Users/Adnan/Desktop/twitter_data.txt") as data:
    tag = next(data, None)  # read the first line
    for line in data:
        for part in line.capitalize().split():
            if "#" in part:
                counts[part] += 1

Last but not least, if you are trying to produce a CSV file (comma-separated data), use the csv module:

import csv

with open('trending.txt', 'w', newline='') as trending:
    writer = csv.writer(trending)
    writer.writerows(counts.most_common())

The above writes all the counts to the CSV file in sorted order.
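
If you want to check the result, a small sketch (not from the answer) is to read trending.txt back with csv.reader and confirm the counts really are in descending order:

import csv

# Read the rows back; each row is [hashtag, count] as strings.
with open('trending.txt', newline='') as trending:
    rows = list(csv.reader(trending))

counts_only = [int(count) for _hashtag, count in rows]
# most_common() guarantees a non-increasing sequence of counts.
assert counts_only == sorted(counts_only, reverse=True)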

Answer 1 (score: 0):

Use a Counter dict with most_common, and write the data to your outfile with the csv lib:

from collections import Counter
import csv


with open("/Users/Adnan/Desktop/twitter_data.txt") as data, open("trending.txt") as out:
    wr = csv.writer(out)
    counts = Counter(part for tag in map(str.capitalize, data)
                     for part in tag.split()
                         if "#" in part)
    wr.writerows(counts.most_common())

Using map(str.capitalize, data) maps str.capitalize over all the lines, which is more efficient than calling it repeatedly in a loop; writerows iterates over the iterable, so each (tag, count) tuple returned by most_common is written back as a row of your outfile.
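
If the one-line generator expression looks dense, here is a rough expansion into explicit loops (an illustration, not part of the answer) that builds the same Counter from the same file:

from collections import Counter

counts = Counter()
with open("/Users/Adnan/Desktop/twitter_data.txt") as data:
    for tag in map(str.capitalize, data):   # each line, capitalized once
        for part in tag.split():            # whitespace-separated tokens
            if "#" in part:                 # keep only hashtag-like tokens
                counts[part] += 1           # the same tally Counter(...) builds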