Pandas: drop duplicates while ignoring NaN

Time: 2018-07-06 21:23:06

Tags: python pandas dataframe duplicates

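In a Pandas df, I am trying to drop duplicates across multiple columns. Much of the data in each row is NaN.

This is just an example; the data is a mixed bag, so many different combinations of missing values exist.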

df.drop_duplicates()

    IDnum       name            formNumber
1   NaN         AP GROUP        028-11964
2   1364615.0   AP GROUP        NaN
3   NaN         AP GROUP        NaN

Desired output:

    IDnum       name            formNumber
1   1364615.0   AP GROUP        028-11964

Edit:

Would the solution change if the output of df.drop_duplicates() looked like this?

df.drop_duplicates()

    IDnum       name            formNumber
0   NaN         AP GROUP        028-11964
1   1364615.0   AP GROUP        028-11964
2   1364615.0   AP GROUP        NaN
3   NaN         AP GROUP        NaN

2 Answers:

Answer 0: (score: 2)

You can use groupby + first
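The answer's own code did not survive, so the following is only a minimal sketch of what groupby + first could look like here. It assumes `name` is the column that identifies a record; the helper `collapse_by_name` is a hypothetical name introduced for illustration, and the two frames are rebuilt from the tables in the question.

```python
import numpy as np
import pandas as pd


def collapse_by_name(frame):
    """Merge partially filled rows into one row per `name` (hypothetical helper)."""
    # GroupBy.first() returns the first non-null value of each column within
    # each group, so NaN cells are skipped rather than treated as values.
    merged = frame.groupby("name", as_index=False).first()
    # Restore the original column order (the group key is moved to the front).
    return merged[frame.columns.tolist()]


# Frame from the original question
df = pd.DataFrame(
    {
        "IDnum": [np.nan, 1364615.0, np.nan],
        "name": ["AP GROUP", "AP GROUP", "AP GROUP"],
        "formNumber": ["028-11964", np.nan, np.nan],
    },
    index=[1, 2, 3],
)

# Frame from the question's edit
df_edit = pd.DataFrame(
    {
        "IDnum": [np.nan, 1364615.0, 1364615.0, np.nan],
        "name": ["AP GROUP"] * 4,
        "formNumber": ["028-11964", "028-11964", np.nan, np.nan],
    }
)

result = collapse_by_name(df)
result_edit = collapse_by_name(df_edit)
print(result)
print(result_edit)
```

Under these assumptions both frames collapse to a single AP GROUP row, so the edit would not change the approach. Note that if one name carried two different non-null IDnum values, only the first would be kept, and rows whose `name` is NaN are dropped by groupby by default.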

Answer 1: (score: 1)

You need:

df.bfill().ffill().drop_duplicates()

Output:

    IDnum       name            formNumber
0   1364615.0   AP GROUP        028-11964
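One caveat: bfill and ffill on the whole frame fill across every row, so with more than one name in the data, values could leak from one record into another. A possible variant, sketched below, restricts the fill to each group. It assumes `name` identifies a record, and the "OTHER CO" rows are made-up data added only to show the effect.

```python
import numpy as np
import pandas as pd

# "AP GROUP" rows come from the question; "OTHER CO" rows are hypothetical.
df = pd.DataFrame(
    {
        "IDnum": [np.nan, 1364615.0, np.nan, np.nan, 42.0],
        "name": ["AP GROUP", "AP GROUP", "AP GROUP", "OTHER CO", "OTHER CO"],
        "formNumber": ["028-11964", np.nan, np.nan, "111-22222", np.nan],
    }
)

value_cols = ["IDnum", "formNumber"]

filled = df.copy()
# Fill missing values backward, then forward, but only within each name group,
# so no value is copied from one name to another.
filled[value_cols] = df.groupby("name")[value_cols].transform(
    lambda s: s.bfill().ffill()
)

# After the fill, rows of the same name are identical and can be dropped.
result = filled.drop_duplicates().reset_index(drop=True)
print(result)
```

As with the original one-liner, this only collapses a group to one row when each column has at most one distinct non-null value per name.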