How to shift a single value of a pandas DataFrame column

Time: 2016-07-22 12:11:22

Tags: python pandas dataframe

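Using pandas first_valid_index() to get the index of the first non-null value in a column, how can I shift a single value of that column rather than the whole column? For example: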

import pandas as pd

data = {'year': [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
        'columnA': [10, 21, 20, 10, 39, 30, 31, 45, 23, 56],
        'columnB': [None, None, None, 10, 39, 30, 31, 45, 23, 56],
        'total': [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]}

df = pd.DataFrame(data)
df = df.set_index('year')
print(df)
      columnA  columnB  total
year                         
2010       10      NaN    100
2011       21      NaN    200
2012       20      NaN    300
2013       10       10    400
2014       39       39    500
2015       30       30    600
2016       31       31    700
2017       45       45    800
2018       23       23    900
2019       56       56   1000

for col in df.columns:
    if col not in ['total']:
        idx = df[col].first_valid_index()
        df.loc[idx, col] = df.loc[idx, col] + df.loc[idx, 'total'].shift(1)

print(df)

AttributeError: 'numpy.float64' object has no attribute 'shift'
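The error comes from the order of operations: df.loc[idx, 'total'] already selects a single cell, so it returns a NumPy scalar rather than a Series, and .shift() exists only on Series and DataFrame objects. A minimal sketch on a trimmed copy of the question's data (the five-row frame is an assumption made for brevity) shows the difference:

```python
import pandas as pd

# Trimmed copy of the question's data (first five years only).
df = pd.DataFrame(
    {'columnB': [None, None, None, 10, 39],
     'total': [100, 200, 300, 400, 500]},
    index=pd.Index([2010, 2011, 2012, 2013, 2014], name='year'),
)

idx = df['columnB'].first_valid_index()  # 2013

# Row label + column label selects ONE cell, so .loc returns a NumPy scalar,
# which has no .shift() method (hence the AttributeError).
cell = df.loc[idx, 'total']
print(type(cell).__name__, hasattr(cell, 'shift'))

# Shift the whole column first, then pick the label: this yields the
# previous row's total. shift() introduces a NaN, so the result is a float.
prev_total = df['total'].shift(1).loc[idx]
print(prev_total)  # 300.0
```

In other words, shift the Series and then select the cell, not the other way round.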

Desired result:

print df
      columnA  columnB  total
year                         
2010       10      NaN    100
2011       21      NaN    200
2012       20      NaN    300
2013       10      310    400
2014       39       39    500
2015       30       30    600
2016       31       31    700
2017       45       45    800
2018       23       23    900
2019       56       56   1000

2 answers:

Answer 0 (score: 2)

Is this what you want?

In [63]: idx = df.columnB.first_valid_index()

In [64]: df.loc[idx, 'columnB'] += df.total.shift().loc[idx]

In [65]: df
Out[65]:
      columnA  columnB  total
year
2010       10      NaN    100
2011       21      NaN    200
2012       20      NaN    300
2013       10    310.0    400
2014       39     39.0    500
2015       30     30.0    600
2016       31     31.0    700
2017       45     45.0    800
2018       23     23.0    900
2019       56     56.0   1000
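The IPython session above, rebuilt as a self-contained script from the question's data, shows why it works: df.total.shift() moves every total value down one row, so .loc[idx] picks the previous year's total (300 for 2013), and only the single cell at (2013, columnB) changes:

```python
import pandas as pd

data = {'year': [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
        'columnA': [10, 21, 20, 10, 39, 30, 31, 45, 23, 56],
        'columnB': [None, None, None, 10, 39, 30, 31, 45, 23, 56],
        'total': [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]}
df = pd.DataFrame(data).set_index('year')

idx = df.columnB.first_valid_index()      # 2013
prev_total = df.total.shift().loc[idx]    # total of 2012, i.e. 300.0
df.loc[idx, 'columnB'] += prev_total      # 10 + 300 -> 310.0

print(df.loc[2012:2014])
```

Because columnB contains NaN it is stored as float64, which is why the answer's output shows 310.0, 39.0 and so on.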

Update: starting with Pandas 0.20.1, the .ix indexer is deprecated in favor of the more strict .iloc and .loc indexers.
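The single-cell update above uses .loc, which works with labels (the year and the column name). If positions are needed instead, a sketch with .iloc can convert the labels via Index.get_loc. This positional variant and its four-row frame are illustrations, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame(
    {'columnB': [None, None, None, 10.0],
     'total': [100, 200, 300, 400]},
    index=pd.Index([2010, 2011, 2012, 2013], name='year'),
)

idx = df['columnB'].first_valid_index()   # row label: 2013
row = df.index.get_loc(idx)               # row position: 3
col = df.columns.get_loc('columnB')       # column position: 0

# Positional equivalent of: df.loc[idx, 'columnB'] += df.total.shift().loc[idx]
# Keeping shift() means row 0 would get NaN added, just like the .loc version.
df.iloc[row, col] += df['total'].shift().iloc[row]
print(df)
```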

Answer 1 (score: 1)

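You can restrict the loop to columns that contain at least one NaN value by skipping the union of the total column and all columns without missing values: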

for col in df.columns:
    if col not in pd.Index(['total']).union(df.columns[~df.isnull().any()]):
        idx = df[col].first_valid_index()
        df.loc[idx, col] += df.total.shift().loc[idx]
print (df)
      columnA  columnB  total
year                         
2010       10      NaN    100
2011       21      NaN    200
2012       20      NaN    300
2013       10    310.0    400
2014       39     39.0    500
2015       30     30.0    600
2016       31     31.0    700
2017       45     45.0    800
2018       23     23.0    900
2019       56     56.0   1000
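Skipping columns without NaN matters: for columnA the first valid index is 2010, where df.total.shift() is NaN, so the question's original loop would turn that cell into NaN. The membership test in the loop above rebuilds the same index on every iteration. A sketch of the same filtering computed once, using isnull().any() to find columns with NaN and Index.difference to drop total (an alternative formulation, not the answerer's code):

```python
import pandas as pd

data = {'year': [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
        'columnA': [10, 21, 20, 10, 39, 30, 31, 45, 23, 56],
        'columnB': [None, None, None, 10, 39, 30, 31, 45, 23, 56],
        'total': [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]}
df = pd.DataFrame(data).set_index('year')

# Columns that contain at least one NaN, excluding the 'total' column itself.
cols = df.columns[df.isnull().any()].difference(['total'])

prev_total = df['total'].shift()   # previous row's total, NaN in the first row
for col in cols:
    idx = df[col].first_valid_index()
    df.loc[idx, col] += prev_total.loc[idx]

print(df)
```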