Question

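I have updated my question to provide a clearer example.

Is it possible to use the drop_duplicates method in Pandas to remove duplicate rows based on a column whose values include lists? Consider the column 'three', where some rows hold a list of two items. Is there a way to drop the duplicate rows without iterating over them (which is my current workaround)?

I have outlined my problem with the following example: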

import pandas as pd

data = [
{'one': 50, 'two': '5:00', 'three': 'february'}, 
{'one': 25, 'two': '6:00', 'three': ['february', 'january']},
{'one': 25, 'two': '6:00', 'three': ['february', 'january']},
{'one': 25, 'two': '6:00', 'three': ['february', 'january']},
{'one': 90, 'two': '9:00', 'three': 'january'}
]

df = pd.DataFrame(data)

print(df)

   one                three   two
0   50             february  5:00
1   25  [february, january]  6:00
2   25  [february, january]  6:00
3   25  [february, january]  6:00
4   90              january  9:00

df.drop_duplicates(['three'])

This results in the following error:

TypeError: type object argument after * must be a sequence, not map

Answer 1

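I think this is because the list type is not hashable, and that is messing up the duplicate-detection logic. As a workaround, you could cast the lists to tuples like so: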

# Convert list values to tuples (hashable); leave other values untouched.
df['four'] = df['three'].apply(lambda x: tuple(x) if isinstance(x, list) else x)
df.drop_duplicates('four')

   one                three   two                 four
0   50             february  5:00             february
1   25  [february, january]  6:00  (february, january)
4   90              january  9:00              january
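
If you would rather not keep a helper column in the result, the same idea can be sketched with a boolean mask instead. This is only a minimal variant of the tuple workaround above, not something from the original post: the names `key` and `deduped` are made up for illustration, and it assumes the list values in 'three' contain only hashable items such as strings.

```python
import pandas as pd

data = [
    {'one': 50, 'two': '5:00', 'three': 'february'},
    {'one': 25, 'two': '6:00', 'three': ['february', 'january']},
    {'one': 25, 'two': '6:00', 'three': ['february', 'january']},
    {'one': 25, 'two': '6:00', 'three': ['february', 'january']},
    {'one': 90, 'two': '9:00', 'three': 'january'},
]
df = pd.DataFrame(data)

# Hypothetical helper Series: lists become tuples (hashable),
# every other value is passed through unchanged.
key = df['three'].apply(lambda x: tuple(x) if isinstance(x, list) else x)

# duplicated() flags each repeat after the first occurrence;
# negating the mask keeps only the first row of each group.
deduped = df[~key.duplicated()]
print(deduped)
```

The original 'three' column still holds lists in `deduped`, and no extra column is added to `df`.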
