Question

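Suppose you have a data frame where the first column is a date and the subsequent columns are values that get revised over time. For example, the wind forecast for a particular date changes over time as new information becomes available.

My task is to calculate each column's difference from the first column. The principle is similar to pandas.DataFrame.diff, except that the reference value is not the previous column but the first one.

Suppose your data frame looks like this: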

Date    Forecast1    Forecast2    Forecast3        
1/1/15    5             3              7

I would like the result to look like this:

Date    Forecast1    Forecast2    Forecast3        
1/1/15    NaN             -2          2

I hope my explanation is clear.

Thanks for your effort.

Answer 1

Just use pd.DataFrame.sub:

In [108]: df = pd.DataFrame(np.random.randint(0, 6, (3, 3)),
     ...:                   columns=['Forecast' + str(i) for i in range(1, 4)],
     ...:                   index=pd.date_range('2016/1/1', periods=3))

In [109]: df
Out[109]: 
            Forecast1  Forecast2  Forecast3
2016-01-01          5          5          5
2016-01-02          0          3          0
2016-01-03          2          4          2

In [110]: df.sub(df.Forecast1, axis=0)
Out[110]: 
            Forecast1  Forecast2  Forecast3
2016-01-01          0          0          0
2016-01-02          0          3          0
2016-01-03          0          2          0
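
The question asks for NaN in the reference column rather than 0. One way to get there, as a minimal sketch: the frame below is rebuilt from the single example row in the question, using Date as the index is an assumption, and the name `diffs` is only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the question's example row (using Date as the index is an assumption).
df = pd.DataFrame(
    {"Forecast1": [5], "Forecast2": [3], "Forecast3": [7]},
    index=pd.Index(["1/1/15"], name="Date"),
)

# Subtract the first column from every column, aligned row by row.
diffs = df.sub(df["Forecast1"], axis=0).astype(float)

# The reference column is always zero after subtraction;
# overwrite it with NaN to match the output requested in the question.
diffs["Forecast1"] = np.nan

print(diffs)
```

Whether the reference column should read 0 or NaN is a matter of taste; NaN mirrors what pandas.DataFrame.diff does for the first position.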

Answer 2

You can use apply(..., axis=1) to apply a function to each row (axis=1) instead of each column (the default, axis=0):

In [78]: df
Out[78]:
     Date  Forecast1  Forecast2  Forecast3
0  1/1/15          5          3          7
1  2/3/15          1          4          5
2  3/4/15         10          2          1

In [79]: cols = [c for c in df.columns.tolist() if 'Forecast' in c]

In [80]: cols
Out[80]: ['Forecast1', 'Forecast2', 'Forecast3']

In [81]: df[cols].apply(lambda x: x - x.iloc[0], axis=1)
Out[81]:
   Forecast1  Forecast2  Forecast3
0          0         -2          2
1          0          3          4
2          0         -8         -9
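
Row-wise apply calls a Python function once per row, which tends to be slow on large frames. As a rough sketch, the same result can be obtained with a vectorized subtraction; the data is the example frame from this answer, and the names `by_apply` and `by_sub` are only illustrative.

```python
import pandas as pd

# Example frame from this answer.
df = pd.DataFrame({
    "Date": ["1/1/15", "2/3/15", "3/4/15"],
    "Forecast1": [5, 1, 10],
    "Forecast2": [3, 4, 2],
    "Forecast3": [7, 5, 1],
})

# Select only the forecast columns, leaving Date out of the arithmetic.
cols = [c for c in df.columns if "Forecast" in c]

# Row-wise version: subtract each row's first value, accessed by position.
by_apply = df[cols].apply(lambda row: row - row.iloc[0], axis=1)

# Vectorized version: subtract the first forecast column along the row axis.
by_sub = df[cols].sub(df[cols[0]], axis=0)

print(by_sub)
```

Both approaches give the same numbers here; the vectorized form simply avoids the per-row function call.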

Using pandas diff() relative to the first column/row of a data frame
