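I have an input file with two columns of data. I need to merge the two columns and remove duplicates. Any suggestions on how to start? Thanks!
Input file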
5045 2317
5045 1670
5045 2156
5045 1509
5045 3833
5045 1013
5045 3491
5045 32
5045 1482
5045 2495
5045 4280
5045 1380
5045 3998
Expected output
5045
2317
1670
2156
1509
3833
1013
3491
32
1482
2495
4280
1380
3998
Answer 0 (score: 1)
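set1 = set()
set2 = set()
# "file.txt" is a placeholder for the input file's path
with open("file.txt") as myfile:
    for line in myfile:
        a, b = line.strip().split()
        set1.add(int(a))
        set2.add(int(b))
set1.update(set2)  # merge the second column's values into set1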
Then write the contents of set1 to a file. Note that a set does not keep the values in their original order.
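One way to do that final "write set1 to a file" step is sketched below. Because a set has no defined order, this sketch assumes ascending numeric order is acceptable and sorts before writing; the helper name `write_numbers`, the sample values, and the temporary output path are hypothetical stand-ins.

```python
import os
import tempfile


def write_numbers(numbers, path):
    """Write each number on its own line, in ascending order."""
    with open(path, "w") as out:
        for n in sorted(numbers):
            out.write(f"{n}\n")


# Hypothetical sample values standing in for set1 from the answer above.
set1 = {5045, 2317, 1670, 32}
# Hypothetical output location; in practice use your own file name.
path = os.path.join(tempfile.mkdtemp(), "output.txt")
write_numbers(set1, path)

with open(path) as f:
    contents = f.read()
print(contents)
```

Sorting makes the output deterministic, but it will not reproduce the order shown in the expected output.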
Answer 1 (score: 0)
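I assume the order of the lines in the output matters. The output of the code below matches your desired output exactly (unlike the answers that use sets):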
In [1]: with open("file.txt") as f, open("output.txt", "w") as out:
   ...:     arrs = [l.rstrip().split() for l in f]
   ...:     vals = [a for arr in arrs for a in arr]  # merge columns
   ...:     # keep only the first occurrence of each value (i.e. remove duplicates)
   ...:     uniqueVals = [v for i, v in enumerate(vals) if vals.index(v) == i]
   ...:     out.write("\n".join(uniqueVals))
This reads the input from "file.txt" and writes the result to "output.txt", whose contents match the expected output above.
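The `vals.index(v) == i` test rescans the list for every element, so it takes quadratic time on large files. A minimal linear-time sketch of the same first-occurrence deduplication uses `dict.fromkeys`, relying on dicts preserving insertion order (guaranteed since Python 3.7). The helper name `merge_columns_unique` and the sample lines are hypothetical.

```python
def merge_columns_unique(lines):
    """Flatten whitespace-separated columns and drop duplicates, keeping first occurrences."""
    # Row by row, left column before right column, like the list comprehension above
    vals = (v for line in lines for v in line.split())
    # dict keys are unique and keep insertion order, so the first occurrence wins
    return list(dict.fromkeys(vals))


# Hypothetical sample standing in for the lines of "file.txt"
sample = ["5045 2317\n", "5045 1670\n", "5045 2156\n"]
result = merge_columns_unique(sample)
print(result)  # ['5045', '2317', '1670', '2156']
```

With a real file you would pass the open file object instead, e.g. `merge_columns_unique(f)` inside `with open("file.txt") as f:`, and then write `"\n".join(...)` of the result as before.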
Answer 2 (score: 0)
>>> import numpy as np
>>> a=np.loadtxt('file_name',delimiter=' ')
>>> a=a.flatten()
>>> a=list(set(a))
>>> a
[32.0, 3491.0, 1380.0, 1509.0, 1670.0, 1482.0, 2156.0, 2317.0, 5045.0, 4280.0, 3833.0, 2495.0, 3998.0, 1013.0]
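The session above yields floats in arbitrary set order. A sketch that stays in NumPy, assuming whitespace-separated integers, reads with `dtype=int` and uses `np.unique(..., return_index=True)` to recover first-occurrence order. Here `io.StringIO` stands in for the real file; the sample rows are illustrative.

```python
import io

import numpy as np

# Stand-in for the two-column input file (assumption: whitespace-separated integers)
data = io.StringIO("5045 2317\n5045 1670\n5045 2156\n")

a = np.loadtxt(data, dtype=int)  # integer dtype avoids values like 5045.0
flat = a.ravel()                 # row-major: 5045, 2317, 5045, 1670, ...
# return_index gives the index of each unique value's first occurrence
_, first_idx = np.unique(flat, return_index=True)
unique_in_order = flat[np.sort(first_idx)]
print(unique_in_order.tolist())  # [5045, 2317, 1670, 2156]
```

Sorting the first-occurrence indices restores the original order; dropping `np.sort` and using `np.unique(flat)` alone would give the values in ascending order instead.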
Answer 3 (score: 0)
Preserving order:
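from itertools import chain

with open("in.txt") as f:
    # flatten both columns into one list of strings
    lines = list(chain.from_iterable(x.split() for x in f))

# note: this overwrites the input file
with open("in.txt", "w") as f1:
    for ind, line in enumerate(lines, 1):
        if line not in lines[:ind - 1]:  # keep only the first occurrence
            f1.write(line + "\n")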
Output:
5045
2317
1670
2156
1509
3833
1013
3491
32
1482
2495
4280
1380
3998
If order doesn't matter:
from itertools import chain

with open("in.txt") as f:
    lines = set(chain.from_iterable(x.split() for x in f))

with open("in.txt", "w") as f1:
    f1.write("\n".join(lines))
If the first column always contains the same single number:
with open("in.txt") as f:
    col_1, first_col_2 = next(f).split()  # first line: the repeated number and its pair
    lines = {first_col_2}
    lines.update(x.split()[1] for x in f)  # second-column numbers of the remaining lines
    lines.add(col_1)  # add the first-column number

with open("in.txt", "w") as f1:
    f1.write("\n".join(lines))