Selecting CSV rows with unique values in a specific column in Python

Posted: 2018-09-25 13:06:16

Tags: python python-3.x csv

I have a CSV file that contains rows like these:
A,apple,102
A,orange,103
B,banana,101
C,peach,102
B,orange,104

and so on...

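I want to remove rows that have a duplicate value in the first column, keeping the first occurrence. For the data above, the output should be: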

A,apple,102
B,banana,101
C,peach,102

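3 answers:

Answer 0 (score: 0)

You can create an empty set and add each row's first-column value to it. If the value is already in the set, skip to the next row. For example: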

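import csv

column_values = set()  # first-column values seen so far
new_rows = []

# newline='' is what the csv docs recommend; it avoids blank lines between rows on Windows
with open('example.csv', 'r', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        if row[0] in column_values:
            continue  # duplicate first-column value: skip this row
        column_values.add(row[0])
        new_rows.append(row)

with open('updated.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerows(new_rows)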
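A variation on the approach above, sketched under the assumption that you don't need the deduplicated rows in memory: write each row as soon as it is accepted, so only the set of seen keys is kept. The function name write_unique_by_column and its column parameter are illustrative, not part of the answer, and io.StringIO stands in for real files.

```python
import csv
import io


def write_unique_by_column(src, dst, column=0):
    """Hypothetical helper: copy CSV rows from src to dst, keeping only the
    first row seen for each value in `column`."""
    seen = set()  # key values already written
    reader = csv.reader(src)
    writer = csv.writer(dst)
    for row in reader:
        if not row:
            continue  # csv.reader yields [] for blank lines; skip them
        key = row[column]
        if key in seen:
            continue  # duplicate key: drop this row
        seen.add(key)
        writer.writerow(row)  # write immediately instead of buffering all rows


# Usage with in-memory files; with real files, open both with newline=''
src = io.StringIO("A,apple,102\nA,orange,103\nB,banana,101\nC,peach,102\nB,orange,104\n")
dst = io.StringIO()
write_unique_by_column(src, dst)
print(dst.getvalue(), end="")
```

Note that csv.writer ends each row with "\r\n" by default (the excel dialect), which is why real output files should be opened with newline=''.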

Answer 1 (score: 0)

The itertools recipes include a unique_everseen recipe (slightly modified here). It may be a bit of overkill here, but it works:

from io import StringIO
from csv import reader
from operator import itemgetter


def unique_everseen(iterable, key):
    "List unique elements, preserving order. Remember all elements ever seen."
    seen = set()
    seen_add = seen.add
    for element in iterable:
        k = key(element)
        if k not in seen:
            seen_add(k)
            yield element

txt = '''A,apple,102
A,orange,103
B,banana,101
C,peach,102
B,orange,104'''

with StringIO(txt) as file:
    rows = reader(file)
    unique_rows = unique_everseen(rows, key=itemgetter(0))
    for row in unique_rows:
        print(row)

I used operator.itemgetter(0) as the key so that it selects the first column of each row.
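To show what the key function returns, here is a quick illustration of operator.itemgetter on one parsed row. Passing several indices (the (0, 2) combination is just an example) returns a tuple, which is hashable, so the same recipe could deduplicate on several columns at once.

```python
from operator import itemgetter

row = ['A', 'apple', '102']  # one row as produced by csv.reader

first_col = itemgetter(0)
print(first_col(row))  # prints: A

# Several indices -> a tuple, which can be stored in a set like any other key
first_and_third = itemgetter(0, 2)
print(first_and_third(row))  # prints: ('A', '102')
```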

You can then write the rows to a new file with csv.writer.

Of course, you have to replace StringIO(txt) with open('file.csv', 'r') (the csv docs also recommend passing newline='').
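Combining those two remarks, here is a sketch that reads the input file, filters it with unique_everseen and writes the result with csv.writer. The helper name dedupe_csv, the output name unique.csv and the temporary-directory setup are assumptions made so the example runs on its own.

```python
import csv
import os
import tempfile
from operator import itemgetter


def unique_everseen(iterable, key):
    "List unique elements, preserving order. Remember all elements ever seen."
    seen = set()
    for element in iterable:
        k = key(element)
        if k not in seen:
            seen.add(k)
            yield element


def dedupe_csv(in_path, out_path, column=0):
    # Hypothetical helper; newline='' on both files, as the csv docs recommend
    with open(in_path, 'r', newline='') as src, open(out_path, 'w', newline='') as dst:
        rows = csv.reader(src)
        # writerows consumes the generator while src is still open
        csv.writer(dst).writerows(unique_everseen(rows, key=itemgetter(column)))


# Demo: create a sample file.csv in a temporary directory, then dedupe it
with tempfile.TemporaryDirectory() as tmp:
    in_path = os.path.join(tmp, 'file.csv')
    out_path = os.path.join(tmp, 'unique.csv')
    with open(in_path, 'w', newline='') as f:
        f.write("A,apple,102\nA,orange,103\nB,banana,101\nC,peach,102\nB,orange,104\n")
    dedupe_csv(in_path, out_path)
    with open(out_path, newline='') as f:
        result = list(csv.reader(f))

print(result)
```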

Answer 2 (score: 0)

If you are willing to use a third-party library, you can use pandas:
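A minimal sketch of the pandas approach, assuming pandas.read_csv with header=None (the sample data has no header row, so the columns are labeled 0, 1, 2) and DataFrame.drop_duplicates; this is not necessarily the answerer's exact code.

```python
from io import StringIO

import pandas as pd

txt = """A,apple,102
A,orange,103
B,banana,101
C,peach,102
B,orange,104
"""

# header=None: treat every line as data; columns get integer labels 0, 1, 2
df = pd.read_csv(StringIO(txt), header=None)

# Keep the first row for each value in column 0 (keep='first' is the default)
unique_df = df.drop_duplicates(subset=[0], keep='first')

# index=False/header=False so the output has the same shape as the input file
print(unique_df.to_csv(index=False, header=False), end='')
```

For a real file, pass the path to read_csv instead of StringIO(txt), and call unique_df.to_csv('updated.csv', index=False, header=False) to write it out.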
