How do I copy all duplicate lines of a file to a new file in Python?

Posted: 2018-06-10 05:57:37

Tags: python duplicates lines

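I'm trying to write code that copies all the duplicate lines of a file into a new file. The program I wrote checks the first 3 characters of each line and compares them with the next line.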

f=open(r'C:\Users\xamer\Desktop\file.txt','r')
data=f.readlines()
f.close()
lines=data.copy()
dup=open(r'C:\Users\xamer\Desktop\duplicate.txt','a')
for x in data:
    for y in data:
        if (y[0]==x[0]) and (y[1]==x[1]) and (y[2]==x[2]):
            lines.append(y)
        else:
            lines.remove(y)
dup.write(lines)
dup.close()

I get the following error:

Traceback (most recent call last):
  File "C:\Users\xamer\Desktop\file.py", line 80, in <module>
    lines.remove(y)
ValueError: list.remove(x): x not in list

Any suggestions?
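A likely cause of the ValueError, offered as an editor's sketch rather than part of the original thread: lines starts as a copy of data, so each line is present once, but lines.remove(y) runs for every x whose first three characters differ from y's. A later call can therefore try to remove a line that is no longer in the list. The final dup.write(lines) would also fail, because write() expects a string, not a list, and y[0], y[1], y[2] raise IndexError on lines shorter than three characters. The sketch below keeps the asker's "first 3 characters" criterion but counts prefixes with collections.Counter instead of mutating a list while looping over it; the names duplicate_lines_by_prefix, copy_duplicates and prefix_len are hypothetical.

```python
from collections import Counter


def duplicate_lines_by_prefix(lines, prefix_len=3):
    """Return the lines whose first `prefix_len` characters also start another line.

    Hypothetical helper: slicing (line[:prefix_len]) is used instead of
    indexing (line[0], line[1], line[2]) so short lines do not raise IndexError.
    """
    # Count how many lines share each prefix, in a single pass
    counts = Counter(line[:prefix_len] for line in lines)
    # Keep original order; a line qualifies if its prefix occurs more than once
    return [line for line in lines if counts[line[:prefix_len]] > 1]


def copy_duplicates(src_path, dst_path, prefix_len=3):
    # Read every line of the source file, keeping the trailing newlines
    with open(src_path, 'r') as src:
        data = src.readlines()
    # write() needs a single string; writelines() accepts the list of lines
    with open(dst_path, 'w') as dst:
        dst.writelines(duplicate_lines_by_prefix(data, prefix_len))


print(duplicate_lines_by_prefix(["abc1\n", "xyz2\n", "abc3\n", "q\n"]))  # ['abc1\n', 'abc3\n']
```

With the paths from the question, the call would be copy_duplicates(r'C:\Users\xamer\Desktop\file.txt', r'C:\Users\xamer\Desktop\duplicate.txt').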

1 answer:

Answer 0 (score: 0)

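These snippets should do what you asked. At first I wanted to build a duplicated_lines list and write everything out at the end, but then I realized I could make the code faster by writing each duplicated item as soon as it is found, which avoids an extra final loop.

As another user pointed out, it isn't clear whether you only want to find adjacent duplicates or duplicates anywhere in the file, regardless of position.

For the first case (consecutive duplicates), here is the code: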

# open the source file and read all of its lines
with open('hello.txt', 'r') as f:
    # readlines() returns a list of the original lines, each keeping its trailing '\n'
    data = f.readlines()

# create (or overwrite) the file that will hold the repeated lines
with open('duplicated.txt', 'w') as f:
    for i in range(len(data) - 1):
        # strip '\n' before comparing: the last line may lack a trailing newline
        current = data[i].rstrip('\n')
        if current == data[i + 1].rstrip('\n'):
            print("Line {}: {}".format(i, current))
            print("Line {}: {}".format(i + 1, current))
            print("Line repeated: " + current)
            # write exactly one newline per repeated line (data[i] already ended in '\n')
            f.write(current + '\n')
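The same consecutive-duplicate check can be written without index arithmetic by pairing each line with its successor through zip(data, data[1:]). This is a minimal sketch, not the answerer's code; the helper names adjacent_duplicates and write_adjacent_duplicates are hypothetical. Like the loop above, a run of three identical lines reports that line twice.

```python
def adjacent_duplicates(lines):
    """Return each line that is identical to the line right after it.

    Trailing newlines are ignored in the comparison, because the last
    line of a file may not end with '\n'.
    """
    stripped = [line.rstrip('\n') for line in lines]
    # zip pairs element k with element k+1 and stops at the shorter sequence
    return [a for a, b in zip(stripped, stripped[1:]) if a == b]


def write_adjacent_duplicates(src_path, dst_path):
    # Read the source lines, then write each adjacent duplicate on its own line
    with open(src_path, 'r') as src:
        data = src.readlines()
    with open(dst_path, 'w') as dst:
        for line in adjacent_duplicates(data):
            dst.write(line + '\n')


print(adjacent_duplicates(["a\n", "b\n", "b\n", "c\n", "c\n", "c"]))  # ['b', 'c', 'c']
```

Calling write_adjacent_duplicates('hello.txt', 'duplicated.txt') should produce the same file as the loop above.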

If you want to find duplicate lines anywhere in the file, here is the code:

# open the source file and read all of its lines
with open('hello.txt', 'r') as f:
    # readlines() returns a list of the original lines, each keeping its trailing '\n'
    data = f.readlines()

# create (or overwrite) the file that will hold the repeated lines
with open('duplicated.txt', 'w') as f:
    for i in range(len(data) - 1):
        current = data[i].rstrip('\n')
        for j in range(i + 1, len(data)):
            # strip '\n' before comparing: the last line may lack a trailing newline
            if current == data[j].rstrip('\n'):
                print("Line {}: {}".format(i, current))
                print("Line {}: {}".format(j, current))
                print("Line repeated: " + current)
                f.write(current + '\n')
                # stop at the first later match so line i is written only once
                break
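The nested loop compares every pair of lines, which is O(n²) in the number of lines. A linear-time alternative, sketched here with collections.Counter rather than taken from the answer, counts each newline-stripped line once. The names repeated_lines and write_repeated_lines are hypothetical, and this version writes each repeated line exactly once, in order of first appearance, which is only one possible reading of "duplicate lines".

```python
from collections import Counter


def repeated_lines(lines):
    """Return each distinct line that occurs more than once, in first-seen order."""
    # Ignore trailing newlines so a final line without '\n' still matches
    stripped = [line.rstrip('\n') for line in lines]
    counts = Counter(stripped)
    # dict.fromkeys keeps first-seen order and drops repeats (Python 3.7+)
    return [line for line in dict.fromkeys(stripped) if counts[line] > 1]


def write_repeated_lines(src_path, dst_path):
    # Read the source lines, then write each repeated line once
    with open(src_path, 'r') as src:
        data = src.readlines()
    with open(dst_path, 'w') as dst:
        for line in repeated_lines(data):
            dst.write(line + '\n')


print(repeated_lines(["a\n", "b\n", "a\n", "c\n", "b\n", "a"]))  # ['a', 'b']
```

If every occurrence should be copied instead, keep the lines for which counts[line] > 1 without passing them through dict.fromkeys.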