I have a CSV file that looks like this:
red,75,right
green,3,center
yellow,3222,right
blue,9,center
black,123,left
white,68,right
purple,988,right
pink,2677,left
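I'm using Python and trying to remove rows that have a duplicate in cell 1. I know I could do this with something like pandas, but I'm trying to use the standard Python csv library.
The expected result is: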
{{1}}
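Does anyone have an example?
Answer 0 (score: 1)
You can simply use a dictionary with the color as the key and the row as the value. If the color is already in the dictionary, skip that row; otherwise add it to the dictionary and write the row to the new CSV file.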
import csv

file_in = 'input_file.csv'
file_out = 'output_file.csv'

# Python 2: the csv module expects files opened in binary mode.
with open(file_in, 'rb') as fin, open(file_out, 'wb') as fout:
    reader = csv.reader(fin)
    writer = csv.writer(fout)
    d = {}  # maps each color to the first row seen with that color
    for row in reader:
        color = row[0]
        if color not in d:
            d[color] = row
            writer.writerow(row)

result = d.values()
result
# Output:
# [['blue', '9', 'center'],
# ['pink', '2677', 'left'],
# ['purple', '48', 'left'],
# ['yellow', '3222', 'right'],
# ['black', '123', 'left'],
# ['green', '3', 'center'],
# ['white', '68', 'right'],
# ['red', '75', 'right']]
Contents of the output CSV file:
!cat output_file.csv
# Output:
# red,75,right
# green,3,center
# yellow,3222,right
# blue,9,center
# black,123,left
# white,68,right
# purple,48,left
# pink,2677,left
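The code above opens the files with 'rb' and 'wb', which is the Python 2 convention for the csv module. On Python 3 the files are instead opened in text mode with newline=''. The following is a minimal sketch under that assumption, not a definitive implementation; the function name dedupe_rows and the key_index parameter are hypothetical, introduced only for illustration. It uses a set rather than a dictionary because only the keys are needed to detect repeats.

```python
import csv


def dedupe_rows(path_in, path_out, key_index=0):
    """Copy path_in to path_out, keeping only the first row for each key.

    Hypothetical helper; key_index is the column checked for duplicates.
    """
    seen = set()  # keys that have already been written
    # newline='' lets the csv module handle line endings itself (Python 3).
    with open(path_in, newline='') as fin, \
            open(path_out, 'w', newline='') as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        for row in reader:
            if not row:
                continue  # skip blank lines
            key = row[key_index]
            if key not in seen:
                seen.add(key)
                writer.writerow(row)
```

It could be called as dedupe_rows('input_file.csv', 'output_file.csv'). Rows are written in the order they first appear in the input.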
Answer 1 (score: 0)
You can try this:
import fileinput

def main():
    seen = set()  # set for fast O(1) amortized lookup
    for line in fileinput.FileInput('1.csv', inplace=1):
        cell_1 = line.split(',')[0]
        if cell_1 not in seen:
            seen.add(cell_1)
            print line,  # standard output is now redirected to the file

if __name__ == '__main__':
    main()
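The print statement with a trailing comma above is Python 2 syntax. A rough Python 3 sketch of the same in-place approach might look like the following; the function name dedupe_in_place is hypothetical, and the code assumes a simple file in which no field contains a quoted comma, since it splits each line on ',' instead of parsing it with the csv module.

```python
import fileinput


def dedupe_in_place(path):
    """Rewrite path in place, keeping the first line for each first-cell value.

    Hypothetical helper; assumes no quoted commas inside fields.
    """
    seen = set()  # first-cell values already kept
    # inplace=True redirects print() output into the file being read.
    with fileinput.input(files=(path,), inplace=True) as lines:
        for line in lines:
            cell_1 = line.split(',')[0]
            if cell_1 not in seen:
                seen.add(cell_1)
                print(line, end='')  # line already carries its own newline
```

Because the file is overwritten, it may be worth keeping a copy first; fileinput.input also accepts a backup argument (for example backup='.bak') for that purpose.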