Question

I'm using drop_duplicates on a DataFrame of Twitter users.

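I want to remove duplicates. I have 2 rows that represent exactly the same user, but pandas does not recognize them as duplicates, apparently because of the NaN values in the description field.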

In other words, the test W.iloc[10,8]==W.iloc[11,8] returns False (column 8 is description, and row 11 is the duplicated row). At the same time, np.isnan(W.iloc[10,8]) and np.isnan(W.iloc[11,8]) both return True.
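The False result is what floating-point NaN semantics predict: under IEEE 754, NaN compares unequal to every value, including another NaN, so an elementwise == test can never confirm that two missing values match. A minimal sketch, using standalone values rather than the actual W from the question:

```python
import numpy as np
import pandas as pd

# Stand-ins for W.iloc[10, 8] and W.iloc[11, 8]: both cells are NaN
a = np.nan
b = np.nan

# IEEE 754: NaN is never equal to anything, even another NaN
plain_equal = a == b

# NaN-aware scalar comparison: count "both missing" as a match
nan_aware_equal = (a == b) or (pd.isna(a) and pd.isna(b))

# Series.equals treats NaNs in the same position as equal
series_equal = pd.Series([a]).equals(pd.Series([b]))

print(plain_equal, nan_aware_equal, series_equal)  # False True True
```

pd.isna is used here instead of np.isnan because it also accepts non-numeric values such as strings, where np.isnan raises a TypeError; that matters for an object column like description.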

So drop_duplicates does not remove these two rows.
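For reference, pandas' drop_duplicates (built on duplicated) treats NaN values in the same column as equal, so a shared NaN in description should not by itself keep two rows apart. A minimal sketch with a small made-up frame (column names borrowed from the question, values hypothetical) that checks this and lists which columns actually differ between two suspected duplicates. The helper differing_columns is a hypothetical name, not a pandas API, and the tiny float difference is only one possible explanation, not something the question confirms:

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame: identical, including NaN in description
df = pd.DataFrame({
    "id": [710412633443332096, 710412633443332096],
    "description": [np.nan, np.nan],
    "followers_count": [2.0, 2.0],
})

# NaN in the same column is treated as equal by duplicated()/drop_duplicates()
deduped = df.drop_duplicates()
print(len(deduped))  # 1


def differing_columns(frame, i, j):
    """Hypothetical helper: columns where rows i and j differ (NaN vs NaN counts as equal)."""
    diffs = []
    for col in frame.columns:
        # Compare column by column to avoid upcasting mixed dtypes into one row Series
        x, y = frame[col].iloc[i], frame[col].iloc[j]
        if not (x == y or (pd.isna(x) and pd.isna(y))):
            diffs.append(col)
    return diffs


# Hypothetical second pair: a float differs by less than the printout shows
df2 = df.copy()
df2.loc[1, "followers_count"] = 2.0 + 1e-9
print(differing_columns(df2, 0, 1))  # ['followers_count']
print(len(df2.drop_duplicates()))    # 2
```

Running differing_columns on the two real rows of W would show whether some other column, rather than description, is what keeps them apart.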

Any idea what is going on?

Here are the 2 rows:

Thanks for your help

莫罗

                        id           created_at lang   screen_name  name  \
226080  710412633443332096  2016-03-17 10:29:05   en  Mich00299495  Mich   
226081  710412633443332096  2016-03-17 10:29:05   en  Mich00299495  Mich   

                location default_profile default_profile_image description  \
226080  Grenoble, France            True                  True         NaN   
226081  Grenoble, France            True                  True         NaN   

        followers_count  ...    geo_enabled  \
226080              2.0  ...          False   
226081              2.0  ...          False   

                                  profile_image_url_https protected time_zone  \
226080  https://abs.twimg.com/sticky/default_profile_i...     False       NaN   
226081  https://abs.twimg.com/sticky/default_profile_i...     False       NaN   

       verified favourites_count  statuses_count   sex name_F name_M  
226080    False              0.0             0.0  none   none   none  
226081    False              0.0             0.0  none   none   none
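Both rows above share the same id. If, as an assumption, id uniquely identifies a Twitter user in this data, one workaround is to deduplicate on that column only through the subset parameter, so differences in other columns no longer matter. A minimal sketch on a hypothetical stand-in for W:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for W: rows 0 and 1 share an id but differ slightly elsewhere
W = pd.DataFrame({
    "id": [710412633443332096, 710412633443332096, 123],
    "screen_name": ["Mich00299495", "Mich00299495", "other"],
    "description": [np.nan, np.nan, "hello"],
    "followers_count": [2.0, 2.0 + 1e-9, 5.0],
})

# Match rows on id alone and keep the first occurrence of each id
deduped = W.drop_duplicates(subset=["id"], keep="first")
print(deduped)
```

The trade-off is that whichever row comes first wins; if the duplicates can carry genuinely different data, it is worth inspecting them before discarding any.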

Drop duplicates with NaN
