I have some text files with a variable number of columns, separated by \t (tabs). Like this:
value1x1 . . . . . . value1xn
. . . . . . . value2xn
. . . . . . . .
valuemx1 . . . . . . valuemxn
I can scan a file and determine the frequency of each value with the following code:
f2 = open("out_freq.txt", 'w')
f = open("input_raw", 'r')
whole_content = f.read()
f.close()
list_content = whole_content.split()
counts = {}  # renamed from dict so the built-in type is not shadowed
for one_word in list_content:
    counts[one_word] = counts.get(one_word, 0) + 1
# func is my sort key function; its definition is not shown here
a = str(sorted(counts.items(), key=func))
f2.write(a)
f2.close()
and it produces output like this:
('26047', 13), ('42810', 13), ('61080', 13), ('106395', 13), ('102395', 13)...
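This is in the form ('value', occurrence_number), and it works as expected. What I want to achieve is:
Convert the output to the following form: ('value', occurrence_number, column_number)
where the column number is the column of input_raw.txt in which the value appears
Group the values that have the same occurrence count into separate columns and write those values to another file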
Answer 0: (score: 0)
If I understand correctly, you want the following:
import itertools as it
from collections import Counter

with open("input_raw", 'r') as fin, open("out_freq.txt", 'w') as fout:
    # Count (column_index, value) pairs across every line of the input
    counts = Counter(it.chain.from_iterable(enumerate(line.split())
                                            for line in fin))
    # Most frequent pairs first
    sorted_items = sorted(counts.items(), key=lambda x: x[1], reverse=True)
    # Each key is (column_index, value); emit (value, count, column_index)
    a = ', '.join(str((int(key[1]), val, key[0])) for key, val in sorted_items)
    fout.write(a)
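Note that this code uses tuples as keys, so that equal values are kept distinct when they appear in different columns. It is not clear from your question whether that can happen, or what should be done in that case.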
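To make the tuple-key idea concrete, here is a minimal sketch that counts (column, value) pairs from lines held in memory. The function name count_by_column and the sample rows are made up for illustration. It also splits on "\t" explicitly, which is an assumption taken from the question: str.split() with no argument collapses runs of whitespace, so an empty cell would shift the column numbers of everything after it.

```python
from collections import Counter

def count_by_column(lines, sep="\t"):
    # Count (column_index, value) pairs.
    # sep="\t" is an assumption based on the tab-separated files in the question.
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        for column, value in enumerate(line.split(sep)):
            counts[(column, value)] += 1
    return counts

# Hypothetical sample: the value "7" appears in three different columns
sample = ["7\t9\t7\n", "7\t7\t5\n"]
counts = count_by_column(sample)
print(counts[(0, "7")])  # occurrences of "7" in column 0
print(counts[(1, "7")])  # occurrences of "7" in column 1
print(counts[(2, "7")])  # occurrences of "7" in column 2
```

Because the column index is part of the key, the same value in two columns gives two separate entries rather than one merged count.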
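Usage example (a Python 2 session; entries with equal counts may come out in a different order on other Python versions, because the sort key is the count alone):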
>>> import itertools as it
>>> from collections import Counter
>>> def get_sorted_items(fileobj):
...     counts = Counter(it.chain.from_iterable(enumerate(line.split()) for line in fileobj))
...     return sorted(counts.items(), key=lambda x:x[1], reverse=True)
...
>>> data = """
... 10 11 12 13 14
... 10 9 7 6 4
... 9 8 12 13 0
... 10 21 33 6 1
... 9 9 7 13 14
... 1 21 7 13 0
... """
>>> with open('input.txt', 'wt') as fin: #write data to the input file
...     fin.write(data)
...
>>> with open('input.txt', 'rt') as fin:
...     print(', '.join(str((int(key[1]), val, key[0])) for key, val in get_sorted_items(fin)))
...
(13, 4, 3), (10, 3, 0), (7, 3, 2), (14, 2, 4), (6, 2, 3), (9, 2, 0), (0, 2, 4), (9, 2, 1), (21, 2, 1), (12, 2, 2), (8, 1, 1), (1, 1, 4), (1, 1, 0), (33, 1, 2), (4, 1, 4), (11, 1, 1)
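The question also asks for values with the same occurrence count to be grouped and written to another file, which the code above does not cover. The sketch below is one possible reading of that request, not a definitive answer: it assumes one output line per count, with the members written as tab-separated value@column cells. The helper names group_by_count and write_groups and that output layout are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def group_by_count(counts):
    # Map each occurrence count to the sorted list of (value, column) pairs that have it.
    groups = defaultdict(list)
    for (column, value), n in counts.items():
        groups[n].append((value, column))
    return {n: sorted(members) for n, members in groups.items()}

def write_groups(groups, path):
    # Assumed layout: one line per count, highest count first;
    # the count comes first, then value@column cells, all tab-separated.
    with open(path, "w") as fout:
        for n in sorted(groups, reverse=True):
            cells = ["%s@%d" % (value, column) for value, column in groups[n]]
            fout.write("\t".join([str(n)] + cells) + "\n")

# A few (column, value) counts taken from the usage example above
counts = Counter({(3, "13"): 4, (0, "10"): 3, (2, "7"): 3, (1, "8"): 1})
groups = group_by_count(counts)
print(groups[3])  # the pairs that occur three times
```

Calling write_groups(groups, "out_groups.txt"), where the filename is hypothetical, would then write one line per count. Values are still strings here, so members within a group sort lexicographically ("10" before "7"); convert them with int() first if numeric order matters.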