How do I remove duplicate links from a text file?

Time: 2016-08-18 11:21:18

Tags: python parsing hyperlink duplicates text-parsing

I have a text file whose contents look like this:

http://example.pl/folder/this_same1.avi
http://example.pl/folder/this_same1.avi
http://example.pl/folder/this_same2.avi
http://example.pl/folder/this_same2.avi
http://example.pl/folder/this_same3.avi
http://example.pl/folder/this_same3.avi

I want to remove all the duplicate links. The output file should look like this:

http://example.pl/folder/this_same1.avi
http://example.pl/folder/this_same2.avi
http://example.pl/folder/this_same3.avi

2 answers:

Answer 0 (score: 1)

Oh, I have improved my answer:

links = set()  # a set keeps only one copy of each line
with open('file.txt', 'r') as fp:
    for line in fp:  # iterate the file directly; readlines() is not needed
        links.add(line)

Then you can write the result back to the file:

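# Text mode ('w'), since the lines are str; 'wb' raises TypeError on Python 3.
with open('file.txt', 'w') as fp:
    for line in links:  # a set has no defined order, so the original order is lost
        fp.write(line)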

Test it yourself.
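One caveat with a set is that the links come back in arbitrary order. If the original order matters, a minimal sketch along these lines might help. The helper name `unique_lines` is made up for illustration, and stripping the trailing newline is an assumption so that a last line without `\n` still matches its duplicates:

```python
def unique_lines(lines):
    """Return the lines with duplicates removed, keeping first-seen order.

    `unique_lines` is a hypothetical helper, not part of the answer above.
    """
    # Assumption: strip the line terminator so that a final line
    # without '\n' still compares equal to its earlier duplicates.
    stripped = (line.rstrip('\n') for line in lines)
    # dict keys are unique and keep insertion order (guaranteed since Python 3.7).
    return list(dict.fromkeys(stripped))
```

It accepts any iterable of lines, so an open file object can be passed directly; the result can then be written back with `'\n'.join(...)`.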

Answer 1 (score: 0)

If the structure is consistent (every link appears exactly twice in a row) and order matters:

with open('file.txt') as fp:
    links = fp.readlines()[::2]  # keep every other line, starting with the first

If the structure is not consistent and order matters:

links = []
with open('file.txt') as fp:
    for line in fp:
        if line not in links:  # keep only the first occurrence
            links.append(line)

Then write the links to the file.
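For larger files, checking `line not in links` against a list gets slow, because each check scans the whole list. A rough sketch of the full read, deduplicate, and write cycle that uses a set only for the membership test could look like this. The function name `dedupe_file` and the parameters `src_path` and `dst_path` are hypothetical, and the sketch assumes the output goes to a different file than the input:

```python
def dedupe_file(src_path, dst_path):
    """Copy src_path to dst_path, skipping links that were already written.

    Hypothetical helper; assumes src_path and dst_path are different files,
    because opening the destination in 'w' mode truncates it immediately.
    Returns the number of unique links written.
    """
    seen = set()  # used only for fast membership tests
    with open(src_path, 'r') as src, open(dst_path, 'w') as dst:
        for line in src:
            link = line.rstrip('\n')  # normalize a last line without '\n'
            if link in seen:
                continue
            seen.add(link)
            dst.write(link + '\n')  # written in first-seen order
    return len(seen)
```

A call such as `dedupe_file('file.txt', 'unique.txt')` would leave the input untouched; the file names here are only placeholders.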