Replace repeated strings in a file with a single string

Asked: 2013-12-02 05:52:10

Tags: python

I have a text file with lines like these:

"aa aa bb aa"
"cc cc dd bb bb"

I want to remove the duplicate tokens on each line to get a file like this:

"aa bb"
"cc dd bb"

4 answers:

Answer 0 (score: 2)

In Python 2.7:

with open("datafile") as fin, open("outfile","w") as fout:
    for line in fin:
        print >> fout, ' '.join(set(line.split()))  # set() does not keep token order

In Python 3.x:

with open("datafile") as fin, open("outfile","w") as fout:
    for line in fin:
        print(*set(line.split()), file=fout)

Answer 1 (score: 1)

In Python:

s = "aa aa bb aa"
' '.join(set(s.split()))

Output (the token order may vary, because sets are unordered):

'aa bb'

If order matters, try:

lst = []
for i in s.split():
    if i not in lst:  # keep only the first occurrence of each token
        lst.append(i)
' '.join(lst)
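
The order-preserving loop above can also be written with a dict, whose keys are unique and keep insertion order. This is a minimal sketch that assumes Python 3.7 or later (where that ordering is guaranteed by the language); `dedupe_tokens` is a hypothetical helper name, not something from the answer.

```python
def dedupe_tokens(line):
    """Return the whitespace-separated tokens of `line` with repeats removed,
    keeping the first occurrence of each token in its original position."""
    # dict keys are unique and (Python 3.7+) remember insertion order
    return ' '.join(dict.fromkeys(line.split()))


print(dedupe_tokens("aa aa bb aa"))     # aa bb
print(dedupe_tokens("cc cc dd bb bb"))  # cc dd bb
```

Unlike the `i not in lst` test, which scans the list for every token, the dict lookup takes constant time on average, which may matter for very long lines.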

Answer 2 (score: 0)

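See below. It is a bit more involved, but it preserves order. Note that set(e) is built from the characters of each token, so this collapses repeated characters inside a token rather than removing repeated tokens.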

>>> s = "aa aa bb aa"
>>> for e in s.split():
...     c = set(e)
...     for i in c:
...         print(i)
...
a
a
b
a

Putting this into the context of your files:

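with open('datafile') as fin, open('outfile', 'w') as fout:  # 'w': open for writing
    for line in fin:  # read the tokens from the input file, not from s
        for e in line.split():
            for i in set(e):
                print(i, end=' ', file=fout)  # Python 2.x: print >> fout, i,
        print(file=fout)  # end each output line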
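
Because the loops above build a set from each token's characters, they do not drop repeated tokens. A minimal sketch of a token-level, order-preserving version for files follows, assuming the goal is the output shown in the question; `dedupe_file`, `src_path`, and `dst_path` are hypothetical names chosen for illustration.

```python
def dedupe_file(src_path, dst_path):
    """Copy src_path to dst_path, dropping repeated tokens on each line."""
    with open(src_path) as fin, open(dst_path, "w") as fout:
        for line in fin:
            seen = set()  # tokens already kept for this line
            kept = []     # first occurrences, in their original order
            for token in line.split():
                if token not in seen:
                    seen.add(token)
                    kept.append(token)
            fout.write(" ".join(kept) + "\n")
```

Calling `dedupe_file("datafile", "outfile")` would write one deduplicated line per input line. The `seen` set is reset for every line, so a token such as bb can still appear on more than one line of the output.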

Answer 3 (score: -1)

Something like this:

lines = ['aa aa bb aa', 'cc cc dd bb bb']
for l in lines:
    s = set()  # built-in set; the old sets module was removed in Python 3
    for word in l.split():
        s.add(word)
    print(' '.join(s))  # token order is not guaranteed