Question

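This code used to work in Python 3 to remove duplicate values across the whole DataFrame while keeping the first occurrence. Now that I have come back to my script, it no longer removes the duplicates in the pandas DataFrame.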


If I have

df = df.apply(lambda x: x.drop_duplicates(), axis=1)

I would like this as output

I don't mind if the blanks come back as 'nan'.

I have also tried the following

and

df.drop_duplicates(subset = None, keep='first')

Any suggestions or alternatives would be welcome!

Answer 1

Now that you have added the data, I think you can use duplicated:

newdf=df[~df.stack().duplicated().unstack()]
newdf
Out[131]: 
      a    b     c
0   0.0  1.0   2.0
1   3.0  4.0   NaN
2   NaN  8.0   9.0
3  10.0  NaN  11.0
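
For anyone who wants to try this on its own, here is a minimal sketch. The input frame is hypothetical (the asker's data is not shown here); its values were picked so that the result should match the output above, and it assumes the frame has no NaN values to begin with.

```python
import pandas as pd

# Hypothetical input, not the asker's data: values repeat across rows and columns.
df = pd.DataFrame({
    "a": [0, 3, 3, 10],
    "b": [1, 4, 8, 4],
    "c": [2, 0, 9, 11],
})

# stack() flattens the frame row by row into one Series, so duplicated()
# flags every value that has already appeared anywhere earlier in the frame.
is_repeat = df.stack().duplicated()

# unstack() restores the original shape; indexing with the inverted mask
# keeps first occurrences and turns later repeats into NaN.
newdf = df[~is_repeat.unstack()]
print(newdf)
```

Because the masked cells become NaN, the integer columns come back as floats, which is why the output shows 0.0, 1.0 and so on.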

Answer 2

You need to set inplace to True; by default drop_duplicates returns a new DataFrame and leaves df unchanged:

df.drop_duplicates(subset=None, keep='first', inplace=True)
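
A small sketch of why the call in the question seems to do nothing, using a hypothetical frame. Note that drop_duplicates compares whole rows, so even with inplace=True it will not blank out repeated values in individual cells.

```python
import pandas as pd

# Hypothetical data: row 2 is an exact copy of row 0.
df = pd.DataFrame({"a": [0, 3, 0], "b": [1, 0, 1]})

# Without assignment or inplace=True the result is discarded
# and the original frame is left untouched.
df.drop_duplicates(subset=None, keep="first")
rows_before = len(df)

# Assigning the result back (or passing inplace=True) removes the duplicate row.
deduped = df.drop_duplicates(subset=None, keep="first")
rows_after = len(deduped)

# The repeated 0 in row 1, column "b" survives: only whole rows are compared.
print(rows_before, rows_after)
print(deduped)
```

Assigning the result back, as in df = df.drop_duplicates(), has the same effect as inplace=True.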

drop_duplicates() stopped working in Python pandas
