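Using pandas first_valid_index()
Having obtained the index of a column's first non-null value, how can I shift a single value of the column rather than the whole column? For example: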
import pandas as pd

data = {'year': [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
        'columnA': [10, 21, 20, 10, 39, 30, 31, 45, 23, 56],
        'columnB': [None, None, None, 10, 39, 30, 31, 45, 23, 56],
        'total': [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]}
df = pd.DataFrame(data)
df = df.set_index('year')
print(df)
      columnA  columnB  total
year
2010       10      NaN    100
2011       21      NaN    200
2012       20      NaN    300
2013       10       10    400
2014       39       39    500
2015       30       30    600
2016       31       31    700
2017       45       45    800
2018       23       23    900
2019       56       56   1000
for col in df.columns:
    if col not in ['total']:
        idx = df[col].first_valid_index()
        df.loc[idx, col] = df.loc[idx, col] + df.loc[idx, 'total'].shift(1)
print(df)
AttributeError: 'numpy.float64' object has no attribute 'shift'
Expected result:
print(df)
      columnA  columnB  total
year
2010       10      NaN    100
2011       21      NaN    200
2012       20      NaN    300
2013       10      310    400
2014       39       39    500
2015       30       30    600
2016       31       31    700
2017       45       45    800
2018       23       23    900
2019       56       56   1000
Answer 0 (score: 2)
Is this what you want?
In [63]: idx = df.columnB.first_valid_index()
In [64]: df.loc[idx, 'columnB'] += df.total.shift().loc[idx]
In [65]: df
Out[65]:
      columnA  columnB  total
year
2010       10      NaN    100
2011       21      NaN    200
2012       20      NaN    300
2013       10    310.0    400
2014       39     39.0    500
2015       30     30.0    600
2016       31     31.0    700
2017       45     45.0    800
2018       23     23.0    900
2019       56     56.0   1000
Update: as of pandas 0.20.1, the .ix indexer is deprecated in favor of the stricter .iloc and .loc indexers.
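The attempt in the question fails because df.loc[idx, 'total'] selects a single scalar, and a scalar has no .shift() method. Shifting the whole total column first and then picking the row at idx avoids that. Below is a minimal self-contained sketch of this idea using the question's data; the variable name prev_total is introduced here only for illustration.

```python
import pandas as pd

data = {
    'year': [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
    'columnA': [10, 21, 20, 10, 39, 30, 31, 45, 23, 56],
    'columnB': [None, None, None, 10, 39, 30, 31, 45, 23, 56],
    'total': [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000],
}
df = pd.DataFrame(data).set_index('year')

# Label of the first non-null row in columnB.
idx = df['columnB'].first_valid_index()

# Shift the whole Series by one row, then select the scalar at idx.
# This is the previous row's total (prev_total is an illustrative name).
prev_total = df['total'].shift().loc[idx]

# Update only that single cell.
df.loc[idx, 'columnB'] += prev_total
print(df)
```

Only the cell at the first valid index changes; every other value in columnB is left alone.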
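Answer 1 (score: 1)
You can find the column names that contain no NaN values, combine them with the total column using union, and skip those columns so that only columns with at least one NaN value are updated: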
for col in df.columns:
    if col not in pd.Index(['total']).union(df.columns[~df.isnull().any()]):
        idx = df[col].first_valid_index()
        df.loc[idx, col] += df.total.shift().loc[idx]
print(df)
      columnA  columnB  total
year
2010       10      NaN    100
2011       21      NaN    200
2012       20      NaN    300
2013       10    310.0    400
2014       39     39.0    500
2015       30     30.0    600
2016       31     31.0    700
2017       45     45.0    800
2018       23     23.0    900
2019       56     56.0   1000
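The filter matters because a column whose first valid value sits in the first row (such as columnA) has no previous total: the shifted value there is NaN, and adding it would turn the cell into NaN. As a possible variation, and not the answer's own method, the loop can be wrapped in a helper that skips a column whenever the previous total is missing. The function name add_previous_total and its total_col parameter are hypothetical.

```python
import pandas as pd

def add_previous_total(df, total_col='total'):
    """Hypothetical helper: add the previous row's total to the first
    non-null value of every other column. Returns a modified copy."""
    out = df.copy()
    prev_total = out[total_col].shift()
    for col in out.columns.drop(total_col):
        idx = out[col].first_valid_index()
        # Skip columns that are entirely null, or whose first valid
        # value is in the first row (no previous total exists).
        if idx is None or pd.isna(prev_total.loc[idx]):
            continue
        out.loc[idx, col] += prev_total.loc[idx]
    return out

frame = pd.DataFrame({
    'year': [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
    'columnA': [10, 21, 20, 10, 39, 30, 31, 45, 23, 56],
    'columnB': [None, None, None, 10, 39, 30, 31, 45, 23, 56],
    'total': [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000],
}).set_index('year')

result = add_previous_total(frame)
print(result)
```

Unlike the filter above, this check looks at whether a previous row exists rather than at whether the column contains any NaN, so the two approaches can differ for a column that has NaN values only after its first row.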