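If I have a file in which every line starts with a number followed by some text, how can I check how many different texts follow each number? For example: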
0 Brucella abortus Brucellaceae
0 Brucella ceti Brucellaceae
0 Brucella canis Brucellaceae
0 Brucella ceti Brucellaceae
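So here I would want to know that 0 is followed by 3 different "types" of text.
Ideally, I could read the file into a Python script that produces output like this: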
1:250
2:98
3:78
4:65
etc.
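The first number is the count of distinct "texts", and the number after the : is how many numbers have that count.
I have the following script, which counts how many different numbers each "text" is found in, so I would like to know how to invert it so that I know how many different texts each number has, and how many numbers have each count of different texts. The script reads the file of numbers and "text" into a dictionary, but I'm not sure how to manipulate this dictionary to get what I want.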
#!/usr/bin/env python
# Dictionary to break down species, genus, family
fileIn = 'usearchclusternumgenus.txt'
d = {}
with open(fileIn, "r") as f:
    for line in f:
        clu, gen, spec, fam = line.split()
        d.setdefault(clu, []).append(spec)
# Iterate through and find out how many times each key occurs
vals = {}  # A dictionary to store how often each value occurs.
for i in d.values():
    for j in set(i):  # Convert to a set to remove duplicates
        # If we've seen this value, increment the count.
        # Otherwise we get the default of 0 and increment it.
        vals[j] = 1 + vals.get(j, 0)
# print(vals)
# Iterate through each possible frequency and find how many values have that count.
counts = {}  # A dictionary to store the final frequencies.
# We will iterate from 0 (which is a valid count) to the maximum count
for i in range(0, max(vals.values()) + 1):
    # Find all values that have the current frequency, count them
    # and add them to the frequency dictionary
    counts[i] = len([x for x in vals.values() if x == i])
for key in sorted(counts.keys()):
    if counts[key] > 0:
        print("{} : {}".format(key, counts[key]))
Answer 0 (score: 2)
Use a collections.defaultdict() object with set as its factory to track the distinct texts seen for each number, then print the size of each collected set:
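from collections import defaultdict

fileIn = 'usearchclusternumgenus.txt'  # same input file as in the question
unique_clu = defaultdict(set)
with open(fileIn) as infh:
    for line in infh:
        # Split on whitespace into at most 4 fields; the third is the species
        clu, gen, spec, rest = line.split(None, 3)
        unique_clu[clu].add(spec)
for key in sorted(unique_clu):
    count = len(unique_clu[key])
    if count:
        print('{}:{}'.format(key, count))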
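The loop above prints one distinct-text count per number. To get the summary the question asks for (how many numbers have 1 distinct text, how many have 2, and so on), the set sizes could be passed to collections.Counter. The following is only a sketch under some assumptions: the function name distinct_text_histogram and the sample lines are made up for illustration, and lines with fewer than three fields are simply skipped.

```python
from collections import Counter, defaultdict


def distinct_text_histogram(lines):
    """Map 'number of distinct texts' to 'how many leading numbers have that many'."""
    unique_clu = defaultdict(set)
    for line in lines:
        parts = line.split(None, 3)
        if len(parts) < 3:
            continue  # assumption: silently skip blank or malformed lines
        clu, gen, spec = parts[:3]
        unique_clu[clu].add(spec)
    # Count how often each set size occurs across all leading numbers
    return Counter(len(species) for species in unique_clu.values())


# Hypothetical sample input; the first four lines come from the question
sample = [
    "0 Brucella abortus Brucellaceae",
    "0 Brucella ceti Brucellaceae",
    "0 Brucella canis Brucellaceae",
    "0 Brucella ceti Brucellaceae",
    "1 Brucella suis Brucellaceae",  # extra line added for illustration
]

histogram = distinct_text_histogram(sample)
for n_distinct in sorted(histogram):
    print('{}:{}'.format(n_distinct, histogram[n_distinct]))
```

With a real file, the open file handle can be passed in place of the sample list, since the function only iterates over lines.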