I have a text file whose lines look like this:
"aa aa bb aa"
"cc cc dd bb bb"
I want to remove the duplicate tokens on each line to get a file like this:
"aa bb"
"cc dd bb"
Answer 0 (score: 2)
# Python 2
with open("datafile") as fin, open("outfile", "w") as fout:
    for line in fin:
        print >> fout, ' '.join(set(line.split()))

# Python 3
with open("datafile") as fin, open("outfile", "w") as fout:
    for line in fin:
        print(*set(line.split()), file=fout)
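One caveat with the set-based approach: a set has no defined order, so the tokens on each output line may come out in a different order than in the input. If any reproducible order is acceptable, sorting the unique tokens is a minimal variation. The sketch below assumes an in-memory list of lines instead of the files, and the helper name `unique_sorted` is made up for illustration.

```python
def unique_sorted(line):
    """Return the distinct whitespace-separated tokens of `line`, sorted."""
    return ' '.join(sorted(set(line.split())))

for line in ["aa aa bb aa", "cc cc dd bb bb"]:
    print(unique_sorted(line))
# prints:
# aa bb
# bb cc dd
```

Note that the second line comes out as "bb cc dd" rather than the "cc dd bb" asked for in the question, so this only helps when the original token order does not matter.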
Answer 1 (score: 1)
In Python:
s = "aa aa bb aa"
' '.join(set(s.split()))
Output:
'aa bb'
If order matters, try:
lst = []
for i in s.split():
    if i not in lst:  # keep only the first occurrence of each token
        lst.append(i)
' '.join(lst)
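The `i not in lst` test scans the list for every token, which can get slow on very long lines. A possible alternative, assuming Python 3.7 or later (where dicts keep insertion order), is `dict.fromkeys`. The helper name `dedupe_tokens` is hypothetical, used here only for illustration.

```python
def dedupe_tokens(line):
    """Drop repeated tokens, keeping the first occurrence of each."""
    # dict keys are unique and (in Python 3.7+) keep insertion order
    return ' '.join(dict.fromkeys(line.split()))

print(dedupe_tokens("aa aa bb aa"))     # aa bb
print(dedupe_tokens("cc cc dd bb bb"))  # cc dd bb
```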
Answer 2 (score: 0)
See below. It is a little more complicated, but it preserves the order.
>>> s = "aa aa bb aa"
>>> seen = set()
>>> for e in s.split():
...     if e not in seen:  # first occurrence of this token
...         seen.add(e)
...         print(e)
...
aa
bb
Putting it in the context of your files:
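with open('datafile') as fin, open('outfile', 'w') as fout:
    for line in fin:
        seen = set()  # tokens already written for this line
        for e in line.split():
            if e not in seen:
                seen.add(e)
                print(e, end=' ', file=fout)
                # print >> fout, e,  # Python 2.x
        print(file=fout)  # end the output line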
Answer 3 (score: -1)
Something like this:
lines = ['aa aa bb aa', 'cc cc dd bb bb']
for l in lines:
    s = set()  # built-in set; the old `sets` module was removed in Python 3
    for word in l.split():
        s.add(word)
    print(' '.join(s))  # token order is not guaranteed