Question

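I'm trying to read a CSV file and write its rows to another CSV file. My input file has duplicate rows, and in the output I want each row only once. In my sample script you can see that I create a list called readers, which holds all the rows of the input CSV. Then, in the for loop, I call writer.writerow(readers[1] + ....), which writes the first row after the header. The problem is that this first row ends up repeated in the output. How can I adjust my script so that the row is written only once?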

import csv
import glob

# `writer` is a csv.writer for the output file, created earlier in the script (not shown)
for path in glob.glob("out.csv"):
    if path == "out1.csv": continue
    with open(path, newline="") as fh:
        readers = list(csv.reader(fh))

        for row in readers:
            if row[8] == 'READ' and row[10] == '1110':
                writer.writerow(readers[1] + [row[2]])
            elif row[8] == 'READ' and row[10] == '1011':
                writer.writerow(readers[1] + [" "] + [" "] + [" "] + [row[2]])
            # `!=` against a tuple is always True for a string; `not in` tests membership
            elif row[8] == 'READ' and row[10] not in ('1101', '0111'):
                writer.writerow(readers[1] + [" "] + [row[2]])

Sample input

    ID No.  Name    Value   RESULTS
      28    Jason   56789   Fail
      28    Jason   56789   Fail
      28    Jason   56789   Fail
      28    Jason   56789   Fail
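For reference, here is a minimal standard-library sketch of the behavior being asked for, assuming "duplicate" means every column matches: each row is copied to the output only the first time it appears. The function name `dedupe_csv` and its parameters are hypothetical, not part of the original script.

```python
import csv


def dedupe_csv(src_path, dst_path):
    """Copy src_path to dst_path, keeping only the first copy of each row.

    Rows are compared as whole rows (every column must match).
    Returns the number of rows written, header included.
    """
    seen = set()
    written = 0
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            key = tuple(row)  # lists are unhashable; tuples can go in a set
            if key in seen:
                continue  # already written once, skip the duplicate
            seen.add(key)
            writer.writerow(row)
            written += 1
    return written
```

Unlike the script above, this writes `row` itself rather than `readers[1]`, which is what made the same row reappear on every matching iteration.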

Answer 1

You can use the pandas package. It would look something like this:

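import pandas as pd
# Read the input file (the first line is used as the header by default):
table = pd.read_csv("out.csv")
# Drop rows that are exact duplicates, keeping the first occurrence:
clean_table = table.drop_duplicates()
# Save the clean data (index=False avoids writing the DataFrame index as an extra column):
clean_table.to_csv("data_without_duplicates.csv", index=False)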

You can check the references here and here.

Answer 2

You can remove duplicates with the set type:

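# Rows from csv.reader are lists (unhashable), so convert them to tuples; a set does not keep row order.
readers_unique = [list(row) for row in set(map(tuple, readers))]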
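If row order matters (for example, to keep the header first), a common order-preserving variant of the same idea uses `dict.fromkeys`, whose keys are unique and keep insertion order in Python 3.7+. This is a sketch: `unique_rows` is a hypothetical helper name, and the sample rows are taken from the question.

```python
def unique_rows(rows):
    """Drop exact duplicate rows, keeping the first occurrence and the original order."""
    # Rows are lists (unhashable), so each is converted to a tuple to serve as a dict key;
    # dict.fromkeys then acts like an ordered set.
    return [list(row) for row in dict.fromkeys(tuple(row) for row in rows)]


readers = [
    ["ID No.", "Name", "Value", "RESULTS"],
    ["28", "Jason", "56789", "Fail"],
    ["28", "Jason", "56789", "Fail"],
]
readers_unique = unique_rows(readers)
print(readers_unique)
# [['ID No.', 'Name', 'Value', 'RESULTS'], ['28', 'Jason', '56789', 'Fail']]
```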

Answer 3

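While the answers above are basically correct, using Pandas seems like overkill to me. Just keep a list of the ID-column values you have already seen during processing (assuming the ID column lives up to its name, i.e. each ID identifies a single record; otherwise you have to use a composite key). Then check whether you have already seen the current value and "presto":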

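import csv
import glob

ID_COL = 1      # index of the column that identifies a record
id_seen = []    # ID values already written (a set would make lookups faster)
# `writer` is the csv.writer for the output file, as in the question
for path in glob.glob("out.csv"):
    if path == "out1.csv": continue
    with open(path, newline="") as fh:
        for row in csv.reader(fh):
            if row[ID_COL] not in id_seen:
                id_seen.append(row[ID_COL])
                # write out whatever columns you need
                writer.writerow(row)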
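The same approach generalizes to the composite key mentioned above. Below is a minimal sketch, assuming the key columns are given by index; `unique_by_key` and the third sample row ("29") are hypothetical, and a set replaces the list of seen IDs for faster lookups.

```python
import csv
import io


def unique_by_key(rows, key_cols):
    """Yield each row whose key (the values in key_cols) has not been seen yet."""
    seen = set()  # set membership is O(1) on average, unlike a list
    for row in rows:
        key = tuple(row[i] for i in key_cols)  # composite key when several columns are given
        if key not in seen:
            seen.add(key)
            yield row


sample = io.StringIO(
    "ID No.,Name,Value,RESULTS\n"
    "28,Jason,56789,Fail\n"
    "28,Jason,56789,Fail\n"
    "29,Jason,56789,Pass\n"
)
out = io.StringIO()
writer = csv.writer(out)
# Key on ID No. and Name together; use key_cols=(0,) to key on the ID alone
writer.writerows(unique_by_key(csv.reader(sample), key_cols=(0, 1)))
print(out.getvalue())
```

The header row passes through once because its key is unique; the second "28, Jason" row is skipped.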

Ignore duplicate rows in a CSV
