How can I use Python to delete specific lines from a CSV file based on duplicates?

Time: 2017-08-02 09:20:45

Tags: python csv

I have a CSV file with many lines like the ones shown below.

20170718 014418.475476 [UE:142 CRNTI : 446] 

20170718 094937.865362 [UE:142 CRNTI : 546] 

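The above are two sample lines from the CSV file.

Looking at these lines, the string [UE:142 ...] appears in more than one line of the CSV file.

Problem statement:

I want to delete every line whose [UE:< > string has already appeared earlier in the CSV file. In the lines above, [UE:142 occurs twice, so the second line must be deleted. The file contains many such strings with arbitrary values, like [UE:142.

Can someone help me with a Python script that solves the problem statement above?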

import csv
reader = open("test.csv", "r")
lines = reader.read().split(" ")
reader.close()

writer = open("test_1.csv", "w")
for line in set(lines):
    writer.write(line)
writer.close()

1 Answer:

Answer 0 (score: 0)

from csv import reader, writer as csv_writer

csv_path = '<your csv file path here>'

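def remove_duplicate_ue (csv_path):
    seen = set ()  # UE fields already encountered, e.g. '[UE:142'
    with open (csv_path, 'r', newline = '') as csv_file:
        for line in reader (csv_file, delimiter = ' '):
            # Locate the field carrying the UE id; it is not the last field
            ue = next ((field for field in line if 'UE:' in field), None)
            if ue is None:
                yield line  # no UE id on this line: keep it unchanged
            elif ue not in seen:
                seen.add (ue)
                yield line  # first line for this UE id: keep it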

def write_csv (csv_path, rows, delimiter = ' '):
    # newline = '' lets the csv module control line endings (Python 3)
    with open (csv_path, 'w', newline = '') as csv_file:
        writer = csv_writer (csv_file, delimiter = delimiter)
        for row in rows:
            writer.writerow (row)

# tuple () reads every kept row into memory before the same file is reopened for writing
write_csv (csv_path, tuple (remove_duplicate_ue (csv_path)))
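
If you would rather not overwrite the input file, a regular expression over the raw lines is another option. The following is only a minimal sketch, assuming the UE id is always numeric and written as `[UE:<digits>` like in the two sample lines. The names `UE_PATTERN`, `dedupe_by_ue` and `dedupe_file` are made up for illustration, and the third sample line is invented to show that a different UE id is kept.

```python
import re

# Assumed pattern, based on the sample lines in the question: "[UE:" then digits.
UE_PATTERN = re.compile(r"\[UE:\s*(\d+)")


def dedupe_by_ue(lines):
    """Yield lines in order, skipping any line whose UE id was already seen."""
    seen = set()
    for line in lines:
        match = UE_PATTERN.search(line)
        if match is None:
            yield line  # no UE id: pass the line through unchanged
            continue
        ue_id = match.group(1)
        if ue_id not in seen:
            seen.add(ue_id)
            yield line  # first occurrence of this UE id


def dedupe_file(src_path, dst_path):
    """Copy src_path to dst_path, keeping only the first line per UE id."""
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        dst.writelines(dedupe_by_ue(src))


if __name__ == "__main__":
    sample = [
        "20170718 014418.475476 [UE:142 CRNTI : 446] \n",
        "20170718 094937.865362 [UE:142 CRNTI : 546] \n",
        "20170718 101500.000000 [UE:77 CRNTI : 12] \n",  # invented line
    ]
    for kept in dedupe_by_ue(sample):
        print(kept, end="")
```

Because `dedupe_by_ue` is a generator, `dedupe_file` streams the input line by line and only holds the set of UE ids in memory. Calling something like `dedupe_file("test.csv", "test_1.csv")` would write the result to a separate file, matching the file names used in the question.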