Subtracting two dataframes to get the difference in values

Time: 2018-04-16 17:56:31

Tags: python-3.x pandas

I have a dataframe from which I compute daily aggregates for a specific date. Below is the dataframe for 2018-02-11, for which I computed the mean, min, max and std:

    cpu cpu cpu cpu mem mem mem mem load load load load drops drops drops drops latency latency latency latency gw_latency gw_latency gw_latency gw_latency upload upload upload upload download download download download sap_drops sap_drops sap_drops sap_drops sap_latency sap_latency sap_latency sap_latency
    mean min max std mean min max std mean min max std mean min max std mean min max std mean min max std mean min max std mean min max std mean min max std mean min max std
    date
    2018-02-11 4.282442748 0 17 4.361148065 13.61068702 0 27 6.123815451 3.891450382 0 47.62 6.426298507 1.526717557 0 100 12.30842628 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Similarly, I have another dataframe for 2018-02-12, for which I computed the mean, min, max and std:

    cpu cpu cpu cpu mem mem mem mem load load load load drops drops drops drops latency latency latency latency gw_latency gw_latency gw_latency gw_latency upload upload upload upload download download download download sap_drops sap_drops sap_drops sap_drops sap_latency sap_latency sap_latency sap_latency
    mean min max std mean min max std mean min max std mean min max std mean min max std mean min max std mean min max std mean min max std mean min max std mean min max std
    date
    2018-02-12 5.726315789 0 21 2.938315053 22.30526316 0 23 3.581474037 6.06 0 44.75 6.798944285 0.5263157895 0 100 7.254762501 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Below is the code:

    import pandas as pd

    df = pd.read_csv("metrics.csv", parse_dates=["date"])
    df.set_index("date", inplace=True)

    # Metric columns to summarize, selected with a list (the usual way to pick several columns)
    cols = ['cpu', 'mem', 'load', 'drops', 'latency',
            'gw_latency', 'upload', 'download', 'sap_drops', 'sap_latency']
    stats = ['mean', 'min', 'max', 'std']

    # Daily mean/min/max/std for each date; any NaN results
    # (for example the std of a day with only one reading) are replaced with 0
    df_prev = df.loc['2018-02-11'].resample('D')[cols].agg(stats).fillna(0)
    df_next = df.loc['2018-02-12'].resample('D')[cols].agg(stats).fillna(0)
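Since metrics.csv is not part of the question, here is a self-contained sketch of the same aggregation on made-up data. The 10-minute sampling, the random values and the reduced set of three metrics are all assumptions. It shows that each aggregated frame is a single row labeled with its own date, with (metric, statistic) pairs as columns:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for metrics.csv: one reading every 10 minutes over two days,
# with random values for three of the question's metric columns.
rng = np.random.default_rng(0)
idx = pd.date_range("2018-02-11", "2018-02-12 23:50", freq="10min", name="date")
df = pd.DataFrame(
    {
        "cpu": rng.integers(0, 20, len(idx)),
        "mem": rng.integers(0, 30, len(idx)),
        "load": rng.uniform(0, 50, len(idx)),
    },
    index=idx,
)

cols = ["cpu", "mem", "load"]
stats = ["mean", "min", "max", "std"]

# Same pattern as the question: slice one day, resample daily, aggregate.
df_prev = df.loc["2018-02-11"].resample("D")[cols].agg(stats).fillna(0)
df_next = df.loc["2018-02-12"].resample("D")[cols].agg(stats).fillna(0)

# Each frame has exactly one row, and its only index label is that day's date.
print(df_prev.index)
print(df_next.index)
# Columns form a (metric, statistic) MultiIndex.
print(df_prev.columns[:4].tolist())
```

Because the two frames carry different row labels, any label-based arithmetic between them has nothing to match on.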

Now I want to subtract the two dataframes to get the difference in value for each column. This is what I did:

    df_diff = df_next.sub(df_prev, fill_value=0)
    print(df_diff)

But it doesn't subtract anything, and the result also contains the dates, which make no sense here because I only want the differences between the statistics:

    cpu cpu cpu cpu mem mem mem mem load load load load drops drops drops drops latency latency latency latency gw_latency gw_latency gw_latency gw_latency upload upload upload upload download download download download sap_drops sap_drops sap_drops sap_drops sap_latency sap_latency sap_latency sap_latency
    mean min max std mean min max std mean min max std mean min max std mean min max std mean min max std mean min max std mean min max std mean min max std mean min max std
    date
    2018-02-11 -4.282442748 0 -17 -4.361148065 -13.61068702 0 -27 -6.123815451 -3.891450382 0 -47.62 -6.426298507 -1.526717557 0 -100 -12.30842628 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    2018-02-12 5.726315789 0 21 2.938315053 22.30526316 0 23 3.581474037 6.06 0 44.75 6.798944285 0.5263157895 0 100 7.254762501 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


As you can see, no subtraction is done at all. Why is this happening?
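The behavior can be reproduced with two tiny hand-made frames shaped like df_prev and df_next (the values and the reduced column set are hypothetical). DataFrame arithmetic matches rows and columns by label before computing, so the mismatched dates end up as two separate rows:

```python
import pandas as pd

# Two hypothetical one-row frames shaped like df_prev / df_next:
# identical (metric, statistic) columns, but different dates in the index.
cols = pd.MultiIndex.from_product([["cpu", "mem"], ["mean", "max"]])
df_prev = pd.DataFrame([[4.0, 17.0, 13.0, 27.0]], columns=cols,
                       index=pd.DatetimeIndex(["2018-02-11"], name="date"))
df_next = pd.DataFrame([[6.0, 21.0, 22.0, 23.0]], columns=cols,
                       index=pd.DatetimeIndex(["2018-02-12"], name="date"))

# The row labels are aligned first. The two dates never match, so the result
# has both rows; fill_value=0 stands in for the missing side, giving
# 0 - df_prev for 2018-02-11 and df_next - 0 for 2018-02-12.
df_diff = df_next.sub(df_prev, fill_value=0)
print(df_diff)
```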

PS: Ultimately, I want the percentage difference between the statistics for these two dates. Is there a direct way to get it?

1 Answer:

Answer 0 (score: 1)

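`df_next.sub(df_prev)` aligns the two frames on their row labels (the dates) as well as their columns. Since 2018-02-11 and 2018-02-12 never match, the result contains both dates, and `fill_value=0` fills each missing row with zeros, which is why you get `0 - df_prev` in one row and `df_next - 0` in the other. Subtracting `df_prev.values`, a plain NumPy array with no index, skips the alignment, so the values are subtracted position by position. To get the difference:

    df_next - df_prev.values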
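A sketch of two ways to do the subtraction, using small made-up frames in place of the real ones (the values are hypothetical). The first is the array subtraction above, written with `.to_numpy()`, the newer spelling of `.values`; the second subtracts the single rows as Series, so the dates drop out of the result entirely:

```python
import pandas as pd

# Hypothetical one-row frames standing in for df_prev / df_next.
cols = pd.MultiIndex.from_product([["cpu", "mem"], ["mean", "max"]])
df_prev = pd.DataFrame([[4.0, 17.0, 13.0, 27.0]], columns=cols,
                       index=pd.DatetimeIndex(["2018-02-11"], name="date"))
df_next = pd.DataFrame([[6.0, 21.0, 22.0, 23.0]], columns=cols,
                       index=pd.DatetimeIndex(["2018-02-12"], name="date"))

# Option 1: subtract a bare array. No index alignment happens, and the
# result keeps df_next's row label (2018-02-12) and columns.
diff_frame = df_next - df_prev.to_numpy()

# Option 2: take each frame's only row as a Series. The Series are indexed
# by the (metric, statistic) labels alone, so they line up column by column
# and no date appears in the result.
diff_series = df_next.iloc[0] - df_prev.iloc[0]
print(diff_series)
```

The array version relies on both frames having their columns in the same order, which holds when both come from the same `agg` call; the Series version matches columns by label instead.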

To get the percentage change:

    (df_next - df_prev.values) / df_prev.values * 100
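Many baseline values in the question are 0, so that formula divides by zero for those columns: x/0 gives inf and 0/0 gives NaN. A sketch on made-up frames (hypothetical values) that turns both into NaN, plus `DataFrame.pct_change` as a more direct alternative that computes (next - prev) / prev between consecutive rows:

```python
import numpy as np
import pandas as pd

# Hypothetical one-row frames, including zero baselines like the question's data.
cols = pd.MultiIndex.from_product([["cpu", "upload"], ["mean", "min"]])
df_prev = pd.DataFrame([[4.0, 0.0, 0.0, 0.0]], columns=cols,
                       index=pd.DatetimeIndex(["2018-02-11"], name="date"))
df_next = pd.DataFrame([[5.0, 0.0, 2.0, 0.0]], columns=cols,
                       index=pd.DatetimeIndex(["2018-02-12"], name="date"))

prev, nxt = df_prev.iloc[0], df_next.iloc[0]

# Percent change per (metric, statistic); inf from a zero baseline is
# replaced with NaN so undefined changes are easy to spot or drop.
pct = ((nxt - prev) / prev * 100).replace([np.inf, -np.inf], np.nan)
print(pct)

# Alternative: stack both days and let pct_change work row over row.
# Row 0 has no predecessor (all NaN); row 1 holds the change. The same
# divide-by-zero caveat applies (this result can still contain inf).
pct_alt = pd.concat([df_prev, df_next]).pct_change().iloc[1] * 100
print(pct_alt)
```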