I have a CSV file containing many lines like the following:
20170718 014418.475476 [UE:142 CRNTI : 446]
20170718 094937.865362 [UE:142 CRNTI : 546]
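The above are two sample lines from the CSV file.
Looking at these lines, each contains a string of the form [UE:142 ...], and that string is repeated across the CSV file.
Problem statement:
I want to remove the lines in which a [UE:< > string occurs more than once in the CSV file. In the lines above, the string [UE:142 appears twice, so the second line must be deleted. The file contains many such strings with varying ids, like [UE:142.
Can someone help me with a Python script for this problem statement?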
import csv
reader = open("test.csv", "r")
lines = reader.read().split(" ")
reader.close()
writer = open("test_1.csv", "w")
for line in set(lines):
    writer.write(line)
writer.close()
Answer 0 (score: 0)
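import re
from csv import reader, writer as csv_writer

csv_path = '<your csv file path here>'
# Captures the id that follows "[UE:" (up to the next space or "]")
UE_PATTERN = re.compile(r'\[UE:\s*([^\s\]]+)')

def remove_duplicate_ue(csv_path):
    """Yield rows, keeping only the first row seen for each UE id."""
    seen = set()
    with open(csv_path, 'r', newline='') as csv_file:
        for row in reader(csv_file, delimiter=' '):
            match = UE_PATTERN.search(' '.join(row))
            if match is None:
                yield row  # no UE field: keep the row unchanged
            elif match.group(1) not in seen:
                seen.add(match.group(1))
                yield row

def write_csv(csv_path, rows, delimiter=' '):
    with open(csv_path, 'w', newline='') as csv_file:
        writer = csv_writer(csv_file, delimiter=delimiter)
        for row in rows:
            writer.writerow(row)

# tuple() reads the whole input before the same file is reopened for writing
write_csv(csv_path, tuple(remove_duplicate_ue(csv_path)))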
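
Since the sample lines are really space-separated log text rather than true CSV, the same idea can also be sketched without the csv module. The following is only a minimal sketch under a few assumptions: the UE id is whatever follows "[UE:" up to the next space or "]", lines without a UE field are kept, and the first line seen for each id wins. The names dedupe_by_ue and UE_PATTERN, and the third sample line, are made up for illustration.

```python
import re

# Assumption: the UE id is the token right after "[UE:" and ends at whitespace or "]".
UE_PATTERN = re.compile(r"\[UE:\s*([^\s\]]+)")


def dedupe_by_ue(lines):
    """Return lines in their original order, keeping the first line per UE id."""
    seen = set()
    kept = []
    for line in lines:
        match = UE_PATTERN.search(line)
        if match is None:
            # No UE field on this line: keep it untouched.
            kept.append(line)
            continue
        ue_id = match.group(1)
        if ue_id not in seen:
            seen.add(ue_id)
            kept.append(line)
    return kept


sample = [
    "20170718 014418.475476 [UE:142 CRNTI : 446]",
    "20170718 094937.865362 [UE:142 CRNTI : 546]",
    "20170718 101500.000001 [UE:7 CRNTI : 12]",  # hypothetical extra line
]
result = dedupe_by_ue(sample)
```

To apply it to a file, one option is to read the lines with open(...).read().splitlines(), pass them to dedupe_by_ue, and write the result to a new file. Unlike set(lines), this keeps the original line order and compares only the UE id, not the whole line.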