Question

I want to drop rows that share both the same title and the same year.

     title      votes  ranking  year
0    Wonderland  19      7.9    1931
1    Wonderland  120     7.1    1997
2    Wonderland  3524    7.2    1999
3    Wonderland  18169   6.6    2003
4    Wonderland  17      8.7    2010
5    Wonderland  6       8.5    2012
6    Wonderland  8       7.4    2012

For example, in this case I would remove only row 5 or row 6.

Answer 1

You can use drop_duplicates() with the subset= parameter. If your DataFrame is named df, run:

In [13]: df.drop_duplicates(subset=['title', 'year'])

This returns:

Out[13]:
        title  votes  ranking  year
0  Wonderland     19      7.9  1931
1  Wonderland    120      7.1  1997
2  Wonderland   3524      7.2  1999
3  Wonderland  18169      6.6  2003
4  Wonderland     17      8.7  2010
5  Wonderland      6      8.5  2012

Note that this discards the row at index 6, so the votes and ranking values that were unique to that row are lost.
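If you care which of the duplicate rows survives, the keep= parameter of drop_duplicates() controls that (the default is keep='first'). The sketch below rebuilds the question's data by hand and shows two options: keeping the last duplicate, and keeping the duplicate with the most votes. Using votes as the tie-breaker is only an assumption for illustration; the question does not say which row is preferred.

```python
import pandas as pd

# Rebuild the DataFrame from the question.
df = pd.DataFrame({
    "title": ["Wonderland"] * 7,
    "votes": [19, 120, 3524, 18169, 17, 6, 8],
    "ranking": [7.9, 7.1, 7.2, 6.6, 8.7, 8.5, 7.4],
    "year": [1931, 1997, 1999, 2003, 2010, 2012, 2012],
})

# Default behaviour (keep='first'): index 5 survives, index 6 is dropped.
first = df.drop_duplicates(subset=["title", "year"])

# keep='last': index 6 survives, index 5 is dropped.
last = df.drop_duplicates(subset=["title", "year"], keep="last")

# Assumed preference: keep the duplicate with the most votes.
# Sort so the preferred row comes first, drop duplicates, restore the order.
most_voted = (
    df.sort_values("votes", ascending=False)
      .drop_duplicates(subset=["title", "year"])
      .sort_index()
)
```

Passing keep=False instead would drop both 2012 rows, which is not what the question asks for.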
