Pandas drop_duplicates - TypeError: type object argument after * must be a sequence, not map

Time: 2016-06-13 14:56:10

Tags: python pandas dataframe

I have updated my question to provide a clearer example.

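Is it possible to use the drop_duplicates method in Pandas to remove duplicate rows based on a column whose values include lists? Consider the column 'three', where some rows hold a list of two items. Is there a way to drop the duplicate rows without iterating over them (which is my current workaround)?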

I have outlined my problem with the following example:

import pandas as pd

data = [
{'one': 50, 'two': '5:00', 'three': 'february'}, 
{'one': 25, 'two': '6:00', 'three': ['february', 'january']},
{'one': 25, 'two': '6:00', 'three': ['february', 'january']},
{'one': 25, 'two': '6:00', 'three': ['february', 'january']},
{'one': 90, 'two': '9:00', 'three': 'january'}
]

df = pd.DataFrame(data)

print(df)

   one                three   two
0   50             february  5:00
1   25  [february, january]  6:00
2   25  [february, january]  6:00
3   25  [february, january]  6:00
4   90              january  9:00

df.drop_duplicates(['three'])

This results in the following error:

TypeError: type object argument after * must be a sequence, not map

1 answer:

Answer 0: (score: 22)

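I think this is because the list type isn't hashable, and that is breaking the duplicate-detection logic. As a workaround, you could cast the lists to tuples like so: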

df['four'] = df['three'].apply(lambda x: tuple(x) if isinstance(x, list) else x)
df.drop_duplicates('four')

   one                three   two                 four
0   50             february  5:00             february
1   25  [february, january]  6:00  (february, january)
4   90              january  9:00              january
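
If you would rather not keep the helper column 'four', the same idea can be sketched with a temporary key and a boolean mask. This is only a minimal variation on the workaround above, not a built-in pandas feature; the names `key` and `deduped` are made up for illustration, and it assumes the list-valued cells contain only hashable items.

```python
import pandas as pd

data = [
    {'one': 50, 'two': '5:00', 'three': 'february'},
    {'one': 25, 'two': '6:00', 'three': ['february', 'january']},
    {'one': 25, 'two': '6:00', 'three': ['february', 'january']},
    {'one': 25, 'two': '6:00', 'three': ['february', 'january']},
    {'one': 90, 'two': '9:00', 'three': 'january'},
]
df = pd.DataFrame(data)

# Build a hashable key: lists become tuples, other values are left as they are.
# 'key' is a throwaway Series and is never added to the DataFrame.
key = df['three'].apply(lambda x: tuple(x) if isinstance(x, list) else x)

# duplicated() flags every repeat after the first occurrence;
# negating the mask keeps only the first row of each group.
deduped = df[~key.duplicated()]
print(deduped)
```

This keeps rows 0, 1 and 4, and the 'three' column still holds the original lists rather than tuples.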