Forward fill or backward fill NaN values in a pandas column based on groups of other columns

Time: 2019-07-12 00:33:54

Tags: python pandas pandas-groupby

I have a dataframe like the following:

import pandas as pd

df = pd.DataFrame({'Country':['USA','USA','MEX','IND','UK','UK','UK'],
                   'Region':['Americas','NaN','NaN','Asia','Europe','NaN','NaN'],
                   'Flower':['Rose','Rose','Lily','Orchid','Dandelion','Dandelion','Dandelion'],
                   'Animal':['Bison','NaN','Golden Eagle','Tiger','Lion','Lion','NaN'],
                   'Game':['Baseball','Baseball','soccer','hockey','cricket','cricket','cricket']})

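I want to group by Country and Flower, and forward fill or backward fill the missing values in the Region and Animal columns. However, the Game column should stay unchanged.

I have tried this, but it did not work: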

df['Region'] = df.groupby(['Country','Flower'])['Region'].transform(lambda x: x.ffill())

Also:

df.groupby(['Country','Flower'])['Animal', 'Region'].isna().bfill()

I would like to know how to do this.

The following works, but it drops the Game column:

df=df.replace({'NaN':np.nan})
df.groupby(['Country','Flower'])[['Animal', 'Region']].bfill().ffill()

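If I use transform instead, I get a length mismatch. Also note that this is a sample dataframe: the values that are np.nan in my original frame are written here as the string 'NaN'.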

2 Answers:

Answer 0: (score: 0)

First, you need to know that the string 'NaN' is not NaN, so replace it with np.nan before filling:

import numpy as np

df=df.replace({'NaN':np.nan})
df.groupby(['Country','Flower'])['Region'].ffill()
Out[109]: 
0    Americas
1    Americas
2         NaN  # this group has only a single row, so it stays NaN
3        Asia
4      Europe
5      Europe
6      Europe
Name: Region, dtype: object
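The first point can be checked directly. The snippet below is a small illustrative sketch (the Series `s` and the name `cleaned` are made up for this example, not taken from the question): pandas treats the text 'NaN' as an ordinary string until it is replaced with np.nan.

```python
import numpy as np
import pandas as pd

# Hypothetical three-value column: a real value, the *string* 'NaN',
# and an actual missing value.
s = pd.Series(['Americas', 'NaN', np.nan])

# Only the real np.nan is reported as missing.
print(s.isna().tolist())        # [False, False, True]

# After replacing the string with np.nan, both entries count as missing,
# so ffill/bfill can act on them.
cleaned = s.replace({'NaN': np.nan})
print(cleaned.isna().tolist())  # [False, True, True]
```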

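Second, if you need to chain two fill functions (bfill and ffill) so that both run within each group, you need apply. Calling groupby(...).bfill() returns an ordinary DataFrame, so a chained .ffill() would no longer be grouped. Passing group_keys=False keeps the original index so that df.update can align the result:

df.update(df.groupby(['Country','Flower'], group_keys=False)[['Animal', 'Region']].apply(lambda x: x.bfill().ffill()))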
df
Out[119]: 
         Animal Country     Flower      Game    Region
0         Bison     USA       Rose  Baseball  Americas
1         Bison     USA       Rose  Baseball  Americas
2  Golden Eagle     MEX       Lily    soccer       NaN
3         Tiger     IND     Orchid    hockey      Asia
4          Lion      UK  Dandelion   cricket    Europe
5          Lion      UK  Dandelion   cricket    Europe
6          Lion      UK  Dandelion   cricket    Europe
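To make the difference between chaining on the groupby and filling inside each group concrete, here is a minimal sketch on the Region column of the sample data. The variable names `grouped`, `chained`, and `per_group` are illustrative, and `transform` is used here in place of `apply` purely for brevity on a single column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Country': ['USA', 'USA', 'MEX', 'IND', 'UK', 'UK', 'UK'],
    'Region': ['Americas', np.nan, np.nan, 'Asia', 'Europe', np.nan, np.nan],
    'Flower': ['Rose', 'Rose', 'Lily', 'Orchid',
               'Dandelion', 'Dandelion', 'Dandelion'],
})

grouped = df.groupby(['Country', 'Flower'])['Region']

# bfill() runs per group, but it returns a plain Series, so the
# following ffill() runs over the whole column and crosses group borders.
chained = grouped.bfill().ffill()

# Both fills run inside each group, so nothing leaks between groups.
per_group = grouped.transform(lambda s: s.bfill().ffill())

print(chained.iloc[2])    # 'Americas' leaked from the USA rows into MEX
print(per_group.iloc[2])  # still missing: MEX has no Region of its own
```

With the chained version, the MEX row silently inherits 'Americas' from the row above it, which is usually not what a per-group fill is meant to do.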

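Answer 1: (score: 0)

If you change your dataframe code to actually contain np.nan, the code you provided does work. Although missing values are displayed as the plain text "NaN", you cannot create them by typing that text into the dataframe by hand, because it will be interpreted as a string rather than as an actual missing value.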

import pandas as pd
import numpy as np

df = pd.DataFrame({'Country':['USA','USA','MEX','IND','UK','UK','UK'],
                   'Region':['Americas',np.nan,np.nan,'Asia','Europe',np.nan,np.nan],
                   'Flower':['Rose','Rose','Lily','Orchid','Dandelion','Dandelion','Dandelion'],
                   'Animal':['Bison',np.nan,'Golden Eagle','Tiger','Lion','Lion',np.nan],
                   'Game':['Baseball','Baseball','soccer','hockey','cricket','cricket','cricket']})

Then, this:

df['Region'] = df.groupby(['Country','Flower'])['Region'].transform(lambda x: x.ffill())

actually produces this:

         Animal Country     Flower      Game    Region
0         Bison     USA       Rose  Baseball  Americas
1           NaN     USA       Rose  Baseball  Americas
2  Golden Eagle     MEX       Lily    soccer       NaN
3         Tiger     IND     Orchid    hockey      Asia
4          Lion      UK  Dandelion   cricket    Europe
5          Lion      UK  Dandelion   cricket    Europe
6           NaN      UK  Dandelion   cricket    Europe
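The same transform idea can probably be extended to fill both Region and Animal in both directions while leaving Game untouched. The following is only a sketch under that assumption, not code from either answer; the names `cols` and `filled` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Country': ['USA', 'USA', 'MEX', 'IND', 'UK', 'UK', 'UK'],
    'Region': ['Americas', np.nan, np.nan, 'Asia', 'Europe', np.nan, np.nan],
    'Flower': ['Rose', 'Rose', 'Lily', 'Orchid',
               'Dandelion', 'Dandelion', 'Dandelion'],
    'Animal': ['Bison', np.nan, 'Golden Eagle', 'Tiger', 'Lion', 'Lion', np.nan],
    'Game': ['Baseball', 'Baseball', 'soccer', 'hockey',
             'cricket', 'cricket', 'cricket'],
})

# Columns to fill; every other column (including Game) is left alone.
cols = ['Region', 'Animal']

# transform returns a result with the same index as df, so it can be
# assigned straight back without a length mismatch.
filled = df.groupby(['Country', 'Flower'])[cols].transform(
    lambda s: s.bfill().ffill()
)
df[cols] = filled

print(df)
```

Because the result is assigned back to only the selected columns, Country, Flower, and Game keep their original values, and the MEX row keeps a missing Region since its group has no other value to borrow.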