I have a tab-separated file with two columns. I need a way to print all the values that "hit" each other on one line.
For example, my input looks like this:
A B
A C
A D
B C
B D
C D
B E
D E
B F
C F
F G
F H
H I
K L
The output I want should look like this:
A B C D
B D E
B C F
F G H
H I
K L
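My real data file is much larger than this, in case that makes any difference. I'd like to do this in Unix or Python if possible.
Can anyone help?
Thanks in advance!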
Answer (score: 1)
Can't you make the input file a .csv? That would make the delimiter easier to parse.
If that's not possible, try the following example:
```python
from itertools import groupby
from operator import itemgetter

with open('example.txt') as txtfile:
    # store each non-empty line as a [left, right] pair (a list of lists)
    cleaned = [line.split() for line in txtfile if line.strip()]

# groupby only merges consecutive rows, so sort by the first column first
cleaned.sort(key=itemgetter(0))

# group by the first element of each nested list and print it with its partners
for elt, items in groupby(cleaned, itemgetter(0)):
    row = [elt]
    for item in items:
        row.append(item[1])
    print(' '.join(row))
```
Hope it helps.
A solution using a .csv file:
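The same grouping can be wrapped in a function and run against the sample from the question. `group_by_first` is a hypothetical helper name used only for this sketch; it assumes whitespace-separated pairs, one per line. Note what it produces: every partner listed after a value in the first column, e.g. `B C D E F` for `B`, which is not the same as the rows in the desired output.

```python
from itertools import groupby
from operator import itemgetter


def group_by_first(lines):
    """Return one list per distinct first-column value: [key, partner, partner, ...]."""
    # split each non-empty line into a [left, right] pair
    pairs = [line.split() for line in lines if line.strip()]
    # groupby only merges *consecutive* items, so sort by the key first
    # (the sort is stable, so partners keep their original file order)
    pairs.sort(key=itemgetter(0))
    return [[key] + [pair[1] for pair in group]
            for key, group in groupby(pairs, itemgetter(0))]
```

For example, `group_by_first(open('example.txt'))` returns `['A', 'B', 'C', 'D']` as its first row and `['B', 'C', 'D', 'E', 'F']` as its second for the sample input.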
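```python
import csv
from itertools import groupby
from operator import itemgetter

cleaned = []  # was never defined in the original snippet
with open('example.csv', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter='\t')
    for row in reader:
        if row:  # skip blank lines
            cleaned.append(row)

# groupby only merges consecutive rows, so sort by the first column first
cleaned.sort(key=itemgetter(0))

# group by the first element of each nested list
for elt, items in groupby(cleaned, itemgetter(0)):
    row = [elt]
    for item in items:
        row.append(item[1])
    print(' '.join(row))
```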
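Grouping by the first column is one reading of the question. Another reading of "values that hit each other" is that every pair of values in a printed row appears as a line in the file. In graph terms, each line is an undirected edge and each output row is a maximal clique. The sketch below is a swapped-in technique, not the answerer's method: it uses Bron–Kerbosch with pivoting. `read_edges` and `maximal_cliques` are hypothetical helper names. Under this reading the sample gives `F G` and `F H` as separate rows rather than `F G H`, because `G H` is not one of the input lines. Worst-case running time is exponential, although sparse real-world graphs are usually manageable.

```python
from collections import defaultdict


def read_edges(lines):
    """Parse 'left<TAB>right' lines into an undirected adjacency map."""
    graph = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        a, b = parts
        graph[a].add(b)
        graph[b].add(a)
    return graph


def maximal_cliques(graph):
    """Bron-Kerbosch with pivoting; yields each maximal clique as a set."""
    if not graph:
        return  # no vertices, no cliques

    def expand(r, p, x):
        # r: current clique, p: candidates that extend it, x: already-explored vertices
        if not p and not x:
            yield r  # nothing can extend r, so it is maximal
            return
        # pivot on the vertex with the most neighbours in p to prune branches
        pivot = max(p | x, key=lambda v: len(graph[v] & p))
        for v in list(p - graph[pivot]):
            yield from expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(graph), set())
```

To print one row per clique from a file: `for c in sorted(sorted(c) for c in maximal_cliques(read_edges(open('example.txt')))): print(' '.join(c))`. If installing a library is acceptable, NetworkX's `find_cliques` also enumerates maximal cliques.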