Calculating percentage change across multiple pandas DataFrames

Time: 2019-08-12 13:45:44

Tags: python pandas dataframe

Suppose I have two different pandas DataFrames with exactly the same structure:

df1

+---+---------+------+------+------+
|   | summary | col1 | col2 | col3 |
+---+---------+------+------+------+
| 0 | count   | 10   | 10   | 10   |
+---+---------+------+------+------+
| 1 | mean    | 4    | 5    | 5    |
+---+---------+------+------+------+
| 2 | stddev  | 3    | 3    | 3    |
+---+---------+------+------+------+
| 3 | min     | 0    | -1   | 5    |
+---+---------+------+------+------+
| 4 | max     | 100  | 56   | 47   |
+---+---------+------+------+------+

df2

+---+---------+------+------+------+
|   | summary | col1 | col2 | col3 |
+---+---------+------+------+------+
| 0 | count   | 15   | 15   | 5    |
+---+---------+------+------+------+
| 1 | mean    | 2    | 2.5  | 2.5  |
+---+---------+------+------+------+
| 2 | stddev  | 3    | 3    | 3    |
+---+---------+------+------+------+
| 3 | min     | 0    | -1   | 5    |
+---+---------+------+------+------+
| 4 | max     | 50   | 56   | 47   |
+---+---------+------+------+------+

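For each entry, I want to calculate the percentage change between the values in the two DataFrames. I know there is a function pct_change(), but it only works within a single pandas DataFrame. The desired output is: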

+---+---------+------+------+------+
|   | summary | col1 | col2 | col3 |
+---+---------+------+------+------+
| 0 | count   | 50%  | 50%  | -50% |
+---+---------+------+------+------+
| 1 | mean    | -50% | -50% | -50% |
+---+---------+------+------+------+
| 2 | stddev  | 0%   | 0%   | 0%   |
+---+---------+------+------+------+
| 3 | min     | 0%   | 0%   | 0%   |
+---+---------+------+------+------+
| 4 | max     | -50% | 0%   | 0%   |
+---+---------+------+------+------+

3 answers:

Answer 0: (score: 1)

Set the string column as the index, divide the DataFrames with DataFrame.div, subtract 1 with DataFrame.sub, and multiply by 100 with DataFrame.mul:

df = df2.set_index('summary').div(df1.set_index('summary')).sub(1).mul(100).reset_index()
print(df)
  summary  col1  col2  col3
0   count  50.0  50.0 -50.0
1    mean -50.0 -50.0 -50.0
2  stddev   0.0   0.0   0.0
3     min   NaN   0.0   0.0
4     max -50.0   0.0   0.0
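
Note that min/col1 comes out as NaN rather than the 0% in the desired output, because both values are 0 and 0/0 is undefined. Below is a minimal, self-contained sketch of the same div/sub/mul chain, with the sample frames rebuilt from the tables in the question. Treating "0 to 0" as a 0% change is my assumption about what the asker wants, and the names `a`, `b`, `pct` and `pct_fixed` are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the tables in the question.
df1 = pd.DataFrame({
    "summary": ["count", "mean", "stddev", "min", "max"],
    "col1": [10, 4, 3, 0, 100],
    "col2": [10, 5, 3, -1, 56],
    "col3": [10, 5, 3, 5, 47],
})
df2 = pd.DataFrame({
    "summary": ["count", "mean", "stddev", "min", "max"],
    "col1": [15, 2, 3, 0, 50],
    "col2": [15, 2.5, 3, -1, 56],
    "col3": [5, 2.5, 3, 5, 47],
})

# Move the string column into the index so only numbers are divided.
a = df1.set_index("summary")
b = df2.set_index("summary")

# (new / old - 1) * 100 gives the percentage change.
pct = b.div(a).sub(1).mul(100)

# Assumption: where both the old and the new value are 0, report 0% instead of NaN.
pct_fixed = pct.mask(a.eq(0) & b.eq(0), 0.0)

print(pct_fixed.reset_index())
```

Using mask on the "both zero" cells, rather than a blanket fillna(0), leaves any other NaN (for example from genuinely missing data) visible.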

Edit:

If you need pct_change between consecutive DataFrames in a list (df1 vs. df2, df2 vs. df3, ...):

L = [df1, df2]
df = (pd.concat(L, keys=range(len(L)))
        .set_index('summary', append=True)
        .groupby(level=1)
        .pct_change())

print(df)
             col1  col2  col3
    summary                  
0 0 count     NaN   NaN   NaN
  1 mean      NaN   NaN   NaN
  2 stddev    NaN   NaN   NaN
  3 min       NaN   NaN   NaN
  4 max       NaN   NaN   NaN
1 0 count     0.5   0.5  -0.5
  1 mean     -0.5  -0.5  -0.5
  2 stddev    0.0   0.0   0.0
  3 min       NaN   0.0   0.0
  4 max      -0.5   0.0   0.0
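
The first block of the result is all NaN because df1 has no predecessor to compare against, and the values are fractions (0.5), not percentages (50). Here is a hedged variant of the same idea that groups by the summary label instead of the positional level and then pulls out just the df1 to df2 comparison. The level names "frame" and "summary" and the variable names are my own choices, not part of the original answer.

```python
import pandas as pd

# Sample data rebuilt from the tables in the question.
df1 = pd.DataFrame({
    "summary": ["count", "mean", "stddev", "min", "max"],
    "col1": [10, 4, 3, 0, 100],
    "col2": [10, 5, 3, -1, 56],
    "col3": [10, 5, 3, 5, 47],
})
df2 = pd.DataFrame({
    "summary": ["count", "mean", "stddev", "min", "max"],
    "col1": [15, 2, 3, 0, 50],
    "col2": [15, 2.5, 3, -1, 56],
    "col3": [5, 2.5, 3, 5, 47],
})

frames = [df1, df2]  # could be extended with further frames of the same shape

# Stack the frames; the outer index level records which frame a row came from.
stacked = pd.concat(
    [f.set_index("summary") for f in frames],
    keys=range(len(frames)),
    names=["frame", "summary"],
)

# Within each summary label, compare every frame with the one before it.
changes = stacked.groupby(level="summary").pct_change()

# Rows for frame 1 hold the change from df1 to df2 (as fractions, not percent).
latest = changes.xs(1, level="frame")
print(latest)
```

Grouping by the label rather than by row position should keep the comparison correct even if the frames list their summary rows in a different order.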

Answer 1: (score: 0)

You can combine the two DataFrames into one

df = pd.concat([df1,df2])

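and then call pct_change() with the periods argument set to the number of rows in each DataFrame (5 here, not the number of columns), so that each row of df2 is compared with the matching row of df1. The string column summary has to be moved into the index first, since pct_change() only makes sense on numeric data.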
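
A minimal sketch of this approach, assuming both frames list their rows in the same order. The sample data is rebuilt from the question, and the names `combined`, `n` and `pct` are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the tables in the question.
df1 = pd.DataFrame({
    "summary": ["count", "mean", "stddev", "min", "max"],
    "col1": [10, 4, 3, 0, 100],
    "col2": [10, 5, 3, -1, 56],
    "col3": [10, 5, 3, 5, 47],
})
df2 = pd.DataFrame({
    "summary": ["count", "mean", "stddev", "min", "max"],
    "col1": [15, 2, 3, 0, 50],
    "col2": [15, 2.5, 3, -1, 56],
    "col3": [5, 2.5, 3, 5, 47],
})

# Stack df2 underneath df1, keeping only numeric columns in the body.
combined = pd.concat([df1.set_index("summary"), df2.set_index("summary")])

# Shifting by the number of rows lines each df2 row up with its df1 counterpart.
n = len(df1)
pct = combined.pct_change(periods=n).iloc[n:]  # drop the leading all-NaN rows

print(pct.mul(100).reset_index())
```

This relies purely on row position, so it would silently give wrong numbers if the two frames were sorted differently.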

Answer 2: (score: 0)

Why not simply

((df2-df1)/df1).style.format('{:.0%}')
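
One caveat, as far as I can tell: with the string column summary still in the frames, the subtraction would likely raise a TypeError, and .style returns a Styler object (which needs jinja2) rather than a DataFrame. The sketch below is one way around both, assuming plain percentage strings are acceptable as output. The names `a`, `b`, `pct` and `formatted` are illustrative.

```python
import pandas as pd

# Sample data rebuilt from the tables in the question.
df1 = pd.DataFrame({
    "summary": ["count", "mean", "stddev", "min", "max"],
    "col1": [10, 4, 3, 0, 100],
    "col2": [10, 5, 3, -1, 56],
    "col3": [10, 5, 3, 5, 47],
})
df2 = pd.DataFrame({
    "summary": ["count", "mean", "stddev", "min", "max"],
    "col1": [15, 2, 3, 0, 50],
    "col2": [15, 2.5, 3, -1, 56],
    "col3": [5, 2.5, 3, 5, 47],
})

# Keep the labels in the index so the arithmetic only touches numbers.
a = df1.set_index("summary")
b = df2.set_index("summary")

# Relative change as a fraction: (new - old) / old.
pct = (b - a) / a

# Format every cell as a whole-number percentage string, column by column.
formatted = pct.apply(lambda col: col.map("{:.0%}".format))

print(formatted.reset_index())
```

The 0/0 cell (min, col1) is still NaN here and is rendered as "nan%", so it would need separate handling if 0% is wanted there.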