I have a CSV-like file:
col-1(ID) col-2(val-List)
1 [1]
1 [1,2,3]
2 [1,2]
2 [1]
3 [10]
3 [10]
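I want to remove the duplicates from this file so that, for each ID, only the row with the longest list remains.
Edited: if several rows with the same ID have lists of the same length, I want to keep just one of them.
Expected output: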
col-1(ID) col-2(val-List)
1 [1,2,3]
2 [1,2]
3 [10]
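I have tried a lot, but with no luck. I am trying to use the csv module, but I don't know how to keep the length of the previous val-List and compare it with the next row that has a matching ID.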
import csv
list_1 = []
with open('test123.csv', 'r', encoding='latin-1') as file:
    csvReader = csv.reader(file, delimiter=',')
    for row in csvReader:
        key = row[0]
        # but how should I use this id to get my desired results?
Answer 0 (score: 3)
Why not let pandas do the job?
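import ast
import pandas
# Read in the CSV (assumes it parses into the two columns named in the header)
df = pandas.read_csv('test123.csv', encoding='latin-1')
# Compute the list lengths. Each cell is read as text such as "[1,2,3]",
# so parse it into a real list first; len(list(x)) would count characters
df['lst_len'] = df['col-2(val-List)'].map(lambda x: len(ast.literal_eval(x)))
# Sort in reverse order by list lengths
df = df.sort_values('lst_len', ascending=False)
# Drop duplicates, preserving first (longest) list by ID
df = df.drop_duplicates(subset='col-1(ID)')
# Restore the original row order, which the sort above changed
df = df.sort_index()
# Remove extra column that we introduced, write to file
df = df.drop('lst_len', axis=1)
df.to_csv('clean_test123.csv', index=False)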
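If you would rather stay with the standard library, as in the question, one way to "remember" the best length seen so far for each ID is a dictionary keyed by ID. The following is only a sketch under assumptions that are not confirmed by the question: each data line is taken to look like `1 [1,2,3]` (ID, whitespace, then a list literal), and the names `keep_longest` and `sample` are made up for illustration.

```python
import ast


def keep_longest(lines):
    """Return {id: list_text}, keeping per ID the first row with the longest list."""
    best = {}  # id -> (list length, original list text)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Assumed layout: "<id><whitespace><list literal>", e.g. "1 [1,2,3]"
        row_id, list_text = line.split(None, 1)
        length = len(ast.literal_eval(list_text))
        # Strict ">" means the first row wins when lengths tie
        if row_id not in best or length > best[row_id][0]:
            best[row_id] = (length, list_text)
    return {row_id: text for row_id, (_, text) in best.items()}


# Data rows from the question (header line left out)
sample = ["1 [1]", "1 [1,2,3]", "2 [1,2]", "2 [1]", "3 [10]", "3 [10]"]
result = keep_longest(sample)
for row_id, text in result.items():
    print(row_id, text)
```

To run it on the real file, pass the open file object instead of `sample` and skip the header first (for example with `next(file)`), since the header line is not a list literal and `ast.literal_eval` would reject it.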