Question

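I'm new to Python. I'm trying to write a script that reads through a file and counts the occurrences of each unique string that starts with www.

For example, suppose my file contains: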

www_1.youtube.com      
www_1.youtube.com      
www_3.google.com    
www_1.youtube.com

Expected output:

www_1.youtube.com - 3
www_3.google.com - 1

Answer 1

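Suppose the name of your file is stored in file1. You can use a dictionary with the strings as keys and their counts as values. If you encounter the same string again, increment its value. If you encounter a new string, add it to the dictionary as a new key and set its value to 1. Here is one way to do it. It may not be the best.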

import re
file1 = "abc.txt"

with open(file1) as f:
    content = f.read()

content = content.split('\n')  # split content into lines

count = dict()
for c in content:
    c = c.strip()  # drop trailing whitespace so identical strings share one key
    if re.match('^www', c):  # check if string starts with 'www'
        if c in count:
            count[c] += 1  # update existing string key
        else:
            count[c] = 1   # add new string key

print(count)

Output:

{'www_1.youtube.com': 3, 'www_3.google.com': 1}
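For what it's worth, the same dictionary idea can be written a little more compactly with dict.get, which removes the if/else. This is only a sketch: the function name count_www and the in-memory sample list are my own assumptions, standing in for the lines of the real file.

```python
def count_www(lines):
    """Count how often each string starting with 'www' appears in lines."""
    counts = {}
    for line in lines:
        s = line.strip()  # ignore surrounding whitespace and the newline
        if s.startswith("www"):
            # get() returns 0 for a string not seen yet, so no if/else is needed
            counts[s] = counts.get(s, 0) + 1
    return counts


# Sample data from the question, including its trailing spaces
sample = [
    "www_1.youtube.com      \n",
    "www_1.youtube.com      \n",
    "www_3.google.com    \n",
    "www_1.youtube.com\n",
]
result = count_www(sample)
print(result)
```

With a real file you could pass the open file object directly, e.g. count_www(f) inside a with open(...) block, since iterating over a file yields its lines.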

Answer 2

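You can read the file's contents into a list, with each line as one element. You can then filter the lines with startswith, and collections.Counter makes it easy to count the elements. The result is a Counter, which is a dict subclass.

Try this: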

import collections
with open("file.txt", 'r') as f:
    lines = f.readlines()
    print(collections.Counter([i.strip() for i in lines if i.startswith("www")]))

The output will be:

Counter({'www_1.youtube.com': 3, 'www_3.google.com': 1})
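If you want the exact "string - count" layout from the question rather than the Counter repr, something like the following might work. It is a sketch under assumptions: format_counts is a hypothetical helper name, and the sample list stands in for the lines read from the file.

```python
import collections


def format_counts(lines):
    """Return 'string - count' rows for lines that start with 'www'."""
    counter = collections.Counter(
        line.strip() for line in lines if line.startswith("www")
    )
    # most_common() yields (string, count) pairs, highest count first
    return ["{} - {}".format(s, n) for s, n in counter.most_common()]


# Sample data from the question
sample = [
    "www_1.youtube.com      \n",
    "www_1.youtube.com      \n",
    "www_3.google.com    \n",
    "www_1.youtube.com\n",
]
report = format_counts(sample)
for row in report:
    print(row)
```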

Answer 3

Very simply, feed a generator expression to collections.Counter, counting the first word (split on the dot):

import collections
with open("file.txt") as f:
    c = collections.Counter(l.split(".")[0] for l in f if l.startswith("www"))

print(c)

Result:

Counter({'www_1': 3, 'www_3': 1})
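One thing to be aware of: counting only the part before the first dot groups together different hosts that share a prefix. The small comparison below illustrates this with made-up lines (not from the question's file), so treat it as a sketch of the trade-off rather than a recommendation either way.

```python
import collections

# Hypothetical lines: two different hosts that share the prefix "www_1"
lines = ["www_1.youtube.com\n", "www_1.google.com\n", "www_3.google.com\n"]

# Keyed on the text before the first dot: both "www_1" hosts are merged
by_prefix = collections.Counter(
    l.split(".")[0] for l in lines if l.startswith("www")
)

# Keyed on the whole stripped line: every host is counted separately
by_line = collections.Counter(
    l.strip() for l in lines if l.startswith("www")
)

print(by_prefix)
print(by_line)
```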

Count all unique instances of strings starting with www
