Removing duplicates whose differing attributes should be merged together

Time: 2012-08-28 15:14:16

Tags: python csv

I have the following csv file:

name, sector, year, region, number

bob,,1999,AS,2

bob,hi-tech,,,3

mike,,2001,NE,2

plan,pharma,,,1

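I have written a script that looks for every instance where "name" is the same for a row and the row below it (the csv file is already sorted by the "name" value). The output of my current script is as follows: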

name, sector, year, region, number

bob,tennis,1999,AS,2+3

bob,tennis,,,3

mike,,2001,NE,2

plan, baseball,,,1

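This is almost what I want. The good part of my current script is that it identifies every instance where the "name" value is the same, combines all the attributes of the two rows with that name, and updates the "number" column. The problem is that once the new row has been created, the two rows that went into the merge should be removed. In the example above, the second row: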

bob,tennis,,,3

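should not be there. I have copied the relevant part of my actual script below, and I would greatly appreciate any clarification anyone can offer.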

for next_row in reader:
        first_name = first_row['name']
        next_name = next_row['name']

        if first_name == next_name:
            if first_row['source'] == '2':
                #get relevant attributes from next_row and add them to first_row

                first_row['number'] = first_row['number'] + ' + ' + next_row['number']
            elif next_row['number'] == '2':
                #get relevant attributes from next_row and add them to first_row

                first_row['number'] = first_row['number'] + ' + ' + next_row['number']
            writer.writerow(first_row)
            first_row = next_row
        else:
            writer.writerow(first_row)

            first_row = next_row

1 answer:

Answer 0 (score: 1)

As suggested in the comments, you may want to use the reader's iterator. If reader has a next method, you are all set; otherwise, you can use reader = iter(reader).

First, define your first_row: simply first_row = reader.next() (or next(reader), which also works in Python 3).
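To make the iterator step concrete, here is a minimal sketch using an in-memory CSV. The sample data and the names `data` and `remaining` are made up for illustration; `next(reader)` is the built-in spelling of `reader.next()`.

```python
import csv
import io

# Made-up sample data, only for illustration.
data = io.StringIO("name,number\nbob,2\nbob,3\n")

# iter() is harmless if the object is already an iterator.
reader = iter(csv.DictReader(data))

# Pull the first data row off the iterator; later reads continue after it.
first_row = next(reader)
print(first_row["name"], first_row["number"])  # bob 2

# Whatever is left in the iterator starts at the second data row.
remaining = [row["number"] for row in reader]
```

Because `next()` consumes a row, the loop that follows never sees the first row again, which is why it has to be kept in first_row.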

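Then simply try the rows one after another: only when first_row["name"] no longer equals next_row["name"] do you write out your row and update first_row.

Once the iterator is completely exhausted, StopIteration is raised. You then just need to write the last first_row:

try:
    while True:
        next_row = reader.next()
        if first_row["name"] == next_row["name"]:
            ...do_something...
        else:
            # The name changed: flush the accumulated row and start a new one
            writer.writerow(first_row)
            first_row = next_row
except StopIteration:
    # No more input: write the row that is still pending
    writer.writerow(first_row)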

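Putting the pieces together, here is a rough, self-contained sketch of the whole loop. The merge rule in `merge_rows` (fill blank fields from the later row and join the number values with "+") is my assumption based on the desired output in the question. `merge_rows`, `merge_duplicates`, `run` and `SAMPLE` are hypothetical names, not part of the csv module. It uses `next(reader)` so that it also runs on Python 3.

```python
import csv
import io


def merge_rows(first, second):
    """Combine two rows that share a name (assumed merge rule, see above)."""
    merged = dict(first)
    for key, value in second.items():
        if key == "number":
            # Assumption: numbers are concatenated as text, e.g. "2+3".
            merged[key] = merged[key] + "+" + value
        elif not merged[key]:
            # Fill a blank field from the later row.
            merged[key] = value
    return merged


def merge_duplicates(reader, writer):
    """Write one row per run of consecutive rows with the same 'name'."""
    reader = iter(reader)
    try:
        first_row = next(reader)
    except StopIteration:
        return  # no data rows at all
    try:
        while True:
            next_row = next(reader)
            if first_row["name"] == next_row["name"]:
                # Keep accumulating; nothing is written yet.
                first_row = merge_rows(first_row, next_row)
            else:
                writer.writerow(first_row)
                first_row = next_row
    except StopIteration:
        # Input exhausted: flush the row that is still pending.
        writer.writerow(first_row)


SAMPLE = (
    "name,sector,year,region,number\n"
    "bob,,1999,AS,2\n"
    "bob,hi-tech,,,3\n"
    "mike,,2001,NE,2\n"
    "plan,pharma,,,1\n"
)


def run(text):
    """Merge duplicates in CSV text and return the resulting CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                            lineterminator="\n")
    writer.writeheader()
    merge_duplicates(reader, writer)
    return out.getvalue()


print(run(SAMPLE))
```

The important point is that nothing is written while the names still match. The merged row stays in first_row until a different name (or the end of the input) shows up, so the unmerged originals never reach the output. If your header has spaces after the commas, as in the question, passing skipinitialspace=True to csv.DictReader should keep the field names clean.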