I have a data frame like this:
import pandas as pd
df = pd.DataFrame({'Country':['USA','USA','MEX','IND','UK','UK','UK'],
'Region':['Americas','NaN','NaN','Asia','Europe','NaN','NaN'],
'Flower':['Rose','Rose','Lily','Orchid','Dandelion','Dandelion','Dandelion'],
'Animal':['Bison','NaN','Golden Eagle','Tiger','Lion','Lion','NaN'],
'Game':['Baseball','Baseball','soccer','hockey','cricket','cricket','cricket']})
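I want to group by Country and Flower and forward-fill or backward-fill the missing values in the Region and Animal columns. The Game column, however, should stay unchanged.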
I tried this, but it did not work:
df['Region'] = df.groupby(['Country','Flower'])['Region'].transform(lambda x: x.ffill())
And also:
df.groupby(['Country','Flower'])['Animal', 'Region'].isna().bfill()
I would like to know how to do this.
The following works, but it drops the Game column:
import numpy as np
df = df.replace({'NaN': np.nan})
df.groupby(['Country','Flower'])[['Animal', 'Region']].bfill().ffill()
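If I use transform, I get a length mismatch. Also note that this is a sample data frame: the np.nan values in my original frame are written here as the string 'NaN'.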
Answer 0 (score: 0)
First, you need to know that the string 'NaN' is not NaN:
import numpy as np
df = df.replace({'NaN': np.nan})
df.groupby(['Country','Flower'])['Region'].ffill()
Out[109]:
0 Americas
1 Americas
2 NaN  # this group has only a single row, so it stays NaN
3 Asia
4 Europe
5 Europe
6 Europe
Name: Region, dtype: object
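A quick way to check the point about 'NaN' versus NaN is to count missing values before and after the replacement. This is only a minimal sketch using the Region values from the question; the names region, cleaned, missing_before and missing_after are made up for illustration.

```python
import numpy as np
import pandas as pd

# Region values from the question, with missing entries typed as text.
region = pd.Series(['Americas', 'NaN', 'NaN', 'Asia', 'Europe', 'NaN', 'NaN'])

# The string 'NaN' is ordinary data, so isna() reports nothing as missing.
missing_before = int(region.isna().sum())

# After the replacement the entries are real missing values.
cleaned = region.replace({'NaN': np.nan})
missing_after = int(cleaned.isna().sum())

print(missing_before, missing_after)
```

Only the second count is non-zero, which is why ffill and bfill have nothing to fill until the strings are replaced.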
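Second, if you need to chain two fill functions on a groupby in pandas, you need apply:
# group_keys=False keeps the original row index so that update() can align the result
df.update(df.groupby(['Country','Flower'], group_keys=False)[['Animal', 'Region']].apply(lambda x: x.bfill().ffill()))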
df
Out[119]:
         Animal  Country     Flower      Game    Region
0         Bison      USA       Rose  Baseball  Americas
1         Bison      USA       Rose  Baseball  Americas
2  Golden Eagle      MEX       Lily    soccer       NaN
3         Tiger      IND     Orchid    hockey      Asia
4          Lion       UK  Dandelion   cricket    Europe
5          Lion       UK  Dandelion   cricket    Europe
6          Lion       UK  Dandelion   cricket    Europe
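As a possible alternative to apply plus update, transform on the selected columns returns a result with the same index as the original frame, so it can be assigned straight back and Game is left alone. This is a sketch, assuming the missing values are already real np.nan; cols and filled are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Country': ['USA', 'USA', 'MEX', 'IND', 'UK', 'UK', 'UK'],
    'Region': ['Americas', np.nan, np.nan, 'Asia', 'Europe', np.nan, np.nan],
    'Flower': ['Rose', 'Rose', 'Lily', 'Orchid', 'Dandelion', 'Dandelion', 'Dandelion'],
    'Animal': ['Bison', np.nan, 'Golden Eagle', 'Tiger', 'Lion', 'Lion', np.nan],
    'Game': ['Baseball', 'Baseball', 'soccer', 'hockey', 'cricket', 'cricket', 'cricket'],
})

cols = ['Animal', 'Region']

# transform keeps the original row index, so the result lines up with df.
filled = df.groupby(['Country', 'Flower'])[cols].transform(
    lambda s: s.bfill().ffill()
)

# Overwrite only the two filled columns; every other column is untouched.
df[cols] = filled
print(df)
```

The MEX row keeps its missing Region because its group has no other row to borrow a value from.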
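Answer 1 (score: 0)
The code you provided actually works if you change the data frame code so that it contains real np.nan values. Although missing values are displayed as the plain text "NaN", you cannot create them by typing that text by hand, because it will be interpreted as a string rather than an actual missing value.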
import pandas as pd
import numpy as np
df = pd.DataFrame({'Country':['USA','USA','MEX','IND','UK','UK','UK'],
'Region':['Americas',np.nan,np.nan,'Asia','Europe',np.nan,np.nan],
'Flower':['Rose','Rose','Lily','Orchid','Dandelion','Dandelion','Dandelion'],
'Animal':['Bison',np.nan,'Golden Eagle','Tiger','Lion','Lion',np.nan],
'Game':['Baseball','Baseball','soccer','hockey','cricket','cricket','cricket']})
Then this:
df['Region'] = df.groupby(['Country','Flower'])['Region'].transform(lambda x: x.ffill())
actually produces this:
         Animal  Country     Flower      Game    Region
0         Bison      USA       Rose  Baseball  Americas
1           NaN      USA       Rose  Baseball  Americas
2  Golden Eagle      MEX       Lily    soccer       NaN
3         Tiger      IND     Orchid    hockey      Asia
4          Lion       UK  Dandelion   cricket    Europe
5          Lion       UK  Dandelion   cricket    Europe
6           NaN       UK  Dandelion   cricket    Europe
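The transform call in this answer only forward-fills Region, so Animal still has NaN in rows 1 and 6. One way to cover both columns in both directions is sketched here with the built-in groupby ffill and bfill, assuming a reasonably recent pandas in which these return only the selected columns; keys and cols are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Country': ['USA', 'USA', 'MEX', 'IND', 'UK', 'UK', 'UK'],
    'Region': ['Americas', np.nan, np.nan, 'Asia', 'Europe', np.nan, np.nan],
    'Flower': ['Rose', 'Rose', 'Lily', 'Orchid', 'Dandelion', 'Dandelion', 'Dandelion'],
    'Animal': ['Bison', np.nan, 'Golden Eagle', 'Tiger', 'Lion', 'Lion', np.nan],
    'Game': ['Baseball', 'Baseball', 'soccer', 'hockey', 'cricket', 'cricket', 'cricket'],
})

keys = ['Country', 'Flower']
cols = ['Region', 'Animal']

# Forward-fill within each (Country, Flower) group first ...
df[cols] = df.groupby(keys)[cols].ffill()
# ... then backward-fill whatever is still missing at the start of a group.
df[cols] = df.groupby(keys)[cols].bfill()

print(df)
```

Both fills are done per group, so a value never leaks from one Country and Flower pair into another.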