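I am trying to write a Python script that keeps only the duplicate rows, based on a condition on the columns. For example, my input
CSV file looks like this: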
Name, Apt_Number, Block_Number, .... , Other_Columns
John, apt1, ABC, .............., dummyVal
Marie, apt2, ABC, .............., dummyVal
John, apt3, XYZ, .............., dummyVal
Sam, apt4, ABC, .............., dummyVal
Sam, apt5, LMO, .............., dummyVal
I want my output
CSV file to look like this:
Name, Apt_Number, Block_Number, .... , Other_Columns
John, apt1, ABC, .............., dummyVal
John, apt3, XYZ, .............., dummyVal
Sam, apt4, ABC, .............., dummyVal
Sam, apt5, LMO, .............., dummyVal
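That is, I want to keep the rows whose Name occurs more than once
while the Block_Number
differs. Can anyone suggest how to do this in Python? Which data structure should I look into?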
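One data structure that fits this condition is a dictionary keyed by Name that maps each name to the set of Block_Number values it appears with. The following is a minimal sketch, assuming the header row shown above and a comma followed by a space as the separator; the function name keep_names_in_several_blocks is hypothetical, and the sample uses only three of the columns:

```python
import csv
import io
from collections import defaultdict


def keep_names_in_several_blocks(infile, outfile):
    """Copy only the rows whose Name occurs with more than one Block_Number.

    Hypothetical helper: infile and outfile are any open text streams.
    """
    # skipinitialspace=True drops the space after each comma, so the keys
    # are 'Name', 'Apt_Number', ... instead of ' Apt_Number', ...
    reader = csv.DictReader(infile, skipinitialspace=True)
    rows = list(reader)

    # First pass: Name -> set of distinct Block_Number values
    blocks_by_name = defaultdict(set)
    for row in rows:
        blocks_by_name[row['Name']].add(row['Block_Number'])

    # Second pass: keep rows, in input order, whose name spans 2+ blocks
    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in rows:
        if len(blocks_by_name[row['Name']]) > 1:
            writer.writerow(row)


sample = io.StringIO(
    "Name, Apt_Number, Block_Number\n"
    "John, apt1, ABC\n"
    "Marie, apt2, ABC\n"
    "John, apt3, XYZ\n"
    "Sam, apt4, ABC\n"
    "Sam, apt5, LMO\n"
)
result = io.StringIO()
keep_names_in_several_blocks(sample, result)
print(result.getvalue())
```

The set makes the "Block_Number differs" part explicit: a name that appears twice in the same block has only one entry in its set and is dropped. With real files, open both with newline='' as the csv module documentation recommends and pass the file objects in.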
Answer 0 (score: 0)
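I don't think my answer is very elegant, but once you have read the data from the CSV file you can process it with a loop. Each iteration uses pop() to take one row off the list and also removes every remaining row with the same name, so each row is handled only once (the inner comparison still makes the whole thing quadratic in the number of rows).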
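import csv

# skipinitialspace=True strips the space after each comma, so the header
# keys are 'Name', 'Apt_Number', ... rather than ' Apt_Number', ...
with open('input.csv', 'r', newline='') as csvfile:
    reader = csv.DictReader(csvfile, skipinitialspace=True)
    fieldnames = reader.fieldnames
    data = list(reader)

# Note: this compares Name only; it does not check that Block_Number differs.
duplicate_names = []
while data:
    row = data.pop()
    # Collect the matching rows first and remove them afterwards;
    # calling data.remove() inside "for other_row in data" skips rows.
    matches = [other_row for other_row in data if other_row['Name'] == row['Name']]
    for other_row in matches:
        data.remove(other_row)
    if matches:
        duplicate_names.extend(matches)
        duplicate_names.append(row)

with open('output.csv', 'w', newline='') as csvfile:
    # Reuse the input header, so this also works when no duplicates are found
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for row in duplicate_names:
        writer.writerow(row)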
Someone with a better command of Python may know a shorter way to do this.
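One shorter way, sketched here with only the standard library: count the names with collections.Counter in a single pass, then keep the rows whose name occurs more than once. Like the loop above, this checks Name only, not Block_Number; unlike it, it runs in linear time and keeps the rows in their input order. The helper name keep_duplicate_names is hypothetical:

```python
import csv
import io
from collections import Counter


def keep_duplicate_names(infile, outfile):
    """Write only the rows whose Name appears more than once (hypothetical helper)."""
    reader = csv.DictReader(infile, skipinitialspace=True)
    rows = list(reader)
    # Count how often each name occurs: one pass, no nested loop
    counts = Counter(row['Name'] for row in rows)

    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
    writer.writeheader()
    # writerows accepts any iterable of dicts; input order is preserved
    writer.writerows(row for row in rows if counts[row['Name']] > 1)


sample = io.StringIO(
    "Name, Apt_Number, Block_Number\n"
    "John, apt1, ABC\n"
    "Marie, apt2, ABC\n"
    "John, apt3, XYZ\n"
    "Sam, apt4, ABC\n"
    "Sam, apt5, LMO\n"
)
result = io.StringIO()
keep_duplicate_names(sample, result)
print(result.getvalue())
```

If pandas is already available, DataFrame.duplicated(subset='Name', keep=False) gives the same name-only mask in one call.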