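Given a DataFrame with similar columns that have null values scattered through them, how can I dynamically fill the nulls in one column with non-null values from the other columns, without explicitly naming those other columns? For example, select the first column, category1, and fill its empty rows with values from the other columns in the same row.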
import pandas as pd

data = {'year': [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
        'category1': [None, 21, None, 10, None, 30, 31, 45, 23, 56],
        'category2': [10, 21, 20, 10, None, 30, None, 45, 23, 56],
        'category3': [10, 21, 20, 10, None, 30, 31, 45, 23, 56]}
df = pd.DataFrame(data)
df = df.set_index('year')
df
      category1  category2  category3
year
2010        NaN         10         10
2011         21         21         21
2012        NaN         20         20
2013         10         10         10
2014        NaN        NaN        NaN
2015         30         30         30
2016         31        NaN         31
2017         45         45         45
2018         23         23         23
2019         56         56         56
After filling category1:
      category1  category2  category3
year
2010         10         10         10
2011         21         21         21
2012         20         20         20
2013         10         10         10
2014        NaN        NaN        NaN
2015         30         30         30
2016         31        NaN         31
2017         45         45         45
2018         23         23         23
2019         56         56         56
Answer 0 (score: 1)
IIUC, you can do it this way:
In [369]: df['category1'] = df['category1'].fillna(df['category2'])
In [370]: df
Out[370]:
      category1  category2  category3
year
2010       10.0       10.0       10.0
2011       21.0       21.0       21.0
2012       20.0       20.0       20.0
2013       10.0       10.0       10.0
2014        NaN        NaN        NaN
2015       30.0       30.0       30.0
2016       31.0        NaN       31.0
2017       45.0       45.0       45.0
2018       23.0       23.0       23.0
2019       56.0       56.0       56.0
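
Note that the fillna call in this answer names category2 explicitly, so it never looks at category3. Here is a minimal sketch of a variant that avoids naming the other columns, assuming the goal is to take the first non-null value from the remaining columns in left-to-right order. The names `others` and `first_other` are only illustrative and do not come from the question.

```python
import pandas as pd

data = {'year': [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
        'category1': [None, 21, None, 10, None, 30, 31, 45, 23, 56],
        'category2': [10, 21, 20, 10, None, 30, None, 45, 23, 56],
        'category3': [10, 21, 20, 10, None, 30, 31, 45, 23, 56]}
df = pd.DataFrame(data).set_index('year')

# Every column except the target, whatever they happen to be called.
others = df.drop(columns='category1')

# Back-fill along each row; the first column then holds the first
# non-null value found in that row (NaN if the whole row is empty).
first_other = others.bfill(axis=1).iloc[:, 0]

# Only the missing entries of category1 are replaced.
df['category1'] = df['category1'].fillna(first_other)
print(df)
```

Rows where every other column is also NaN (2014 here) stay NaN.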
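Answer 1 (score: 0)
You can use first_valid_index, with a condition for rows where all values are NaN:
def f(x):
    # first_valid_index() returns None when the whole row is NaN
    if x.first_valid_index() is None:
        return None
    else:
        return x[x.first_valid_index()]
df['a'] = df.apply(f, axis=1)
print(df)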
      category1  category2  category3     a
year
2010        NaN       10.0       10.0  10.0
2011       21.0       21.0       21.0  21.0
2012        NaN       20.0       20.0  20.0
2013       10.0       10.0       10.0  10.0
2014        NaN        NaN        NaN   NaN
2015       30.0       30.0       30.0  30.0
2016       31.0        NaN       31.0  31.0
2017       45.0       45.0       45.0  45.0
2018       23.0       23.0       23.0  23.0
2019       56.0       56.0       56.0  56.0
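
The result above lands in a new column a rather than in category1. If the aim is to fill category1 itself, one possible sketch is to pass the row-wise result to fillna. This assumes the same data as in the question; the helper name `first_valid` is made up for illustration, and it returns NaN instead of None so the result stays numeric.

```python
import numpy as np
import pandas as pd

data = {'year': [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
        'category1': [None, 21, None, 10, None, 30, 31, 45, 23, 56],
        'category2': [10, 21, 20, 10, None, 30, None, 45, 23, 56],
        'category3': [10, 21, 20, 10, None, 30, 31, 45, 23, 56]}
df = pd.DataFrame(data).set_index('year')

def first_valid(row):
    # Label of the first non-null entry in this row, or None if all are NaN.
    label = row.first_valid_index()
    return np.nan if label is None else row.loc[label]

# Compute the first valid value per row, then use it only where
# category1 is missing.
df['category1'] = df['category1'].fillna(df.apply(first_valid, axis=1))
print(df)
```

Since apply with axis=1 calls the function once per row in Python, this is likely slower on large frames than a vectorized approach.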
Answer 2 (score: 0)
Try this:
df['category1'] = df['category1'].fillna(df.median(axis=1))
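
One caveat with the median approach: it gives the expected result for the data in the question because the other columns agree wherever they are both present. When they differ, the median can be a value that appears in neither column. A small sketch on a made-up frame (`demo` is hypothetical, not the asker's data):

```python
import numpy as np
import pandas as pd

# Hypothetical frame where category2 and category3 disagree in the first row.
demo = pd.DataFrame({'category1': [np.nan, 5.0],
                     'category2': [10.0, 5.0],
                     'category3': [20.0, 5.0]})

# median(axis=1) skips NaN by default, so row 0 is the median of 10 and 20.
row_median = demo.median(axis=1)
filled = demo['category1'].fillna(row_median)
print(filled.tolist())  # [15.0, 5.0]
```

So the first row is filled with 15.0, which is neither 10 nor 20.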