This is what my pandas DataFrame looks like:
sampling_time MQ2_LPG MQ2_CO MQ2_SMOKE MQ2_ALCOHOL MQ2_CH4 MQ2_H2 MQ2_PROPANE
0 2018-07-15 08:41:49.028 4.41 32.87 19.12 7.70 10.29 7.59 4.49
1 2018-07-15 08:41:49.028 2.98 19.08 12.47 4.72 6.34 5.15 3.02
2 2018-07-15 08:41:49.028 2.73 16.88 11.33 4.22 5.69 4.72 2.76
3 2018-07-15 08:41:49.028 2.69 16.47 11.11 4.13 5.57 4.64 2.71
4 2018-07-15 08:41:49.028 2.66 16.26 11.00 4.09 5.50 4.60 2.69
When I use groupby (the split-apply-combine approach), my sampling_time column is dropped from the result.
transformed = dataframe.groupby('sampling_time').transform(lambda x: (x - x.mean()) / x.std())
transformed.head()
MQ2_LPG MQ2_CO MQ2_SMOKE MQ2_ALCOHOL MQ2_CH4 MQ2_H2 MQ2_PROPANE
0 15.710127 15.975636 15.773724 15.876433 15.874190 15.694674
1 3.519619 3.313661 3.494836 3.408578 3.404160 3.563717
2 1.388411 1.293621 1.389884 1.316656 1.352130 1.425885
3 1.047418 0.917159 0.983665 0.940110 0.973294 1.028148
4 0.791673 0.724337 0.780556 0.772756 0.752306 0.829280
Any help or suggestions on how to keep the sampling_time column would be greatly appreciated.
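Answer 0 (score: 0)
You can do this by setting 'sampling_time' as the index. transform returns a result indexed like its input, so when you then run groupby with transform, the sampling_time index is carried over onto the transformed columns.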
df1 = df.set_index('sampling_time')
df1.groupby('sampling_time').transform(lambda x: x-x.std())
Output:
MQ2_LPG MQ2_CO MQ2_SMOKE MQ2_ALCOHOL \
sampling_time
2018-07-15 08:41:49.028 3.663522 25.760508 15.652432 6.154209
2018-07-15 08:41:49.028 2.233522 11.970508 9.002432 3.174209
2018-07-15 08:41:49.028 1.983522 9.770508 7.862432 2.674209
2018-07-15 08:41:49.028 1.943522 9.360508 7.642432 2.584209
2018-07-15 08:41:49.028 1.913522 9.150508 7.532432 2.544209
MQ2_CH4 MQ2_H2 MQ2_PROPANE
sampling_time
2018-07-15 08:41:49.028 8.243523 6.313227 3.7205
2018-07-15 08:41:49.028 4.293523 3.873227 2.2505
2018-07-15 08:41:49.028 3.643523 3.443227 1.9905
2018-07-15 08:41:49.028 3.523523 3.363227 1.9405
2018-07-15 08:41:49.028 3.453523 3.323227 1.9205
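As a rough sketch of the idea above, the snippet below applies the z-score from the question (note that the lambda in this answer only subtracts the standard deviation, which is what the output shown reflects). The two value columns are copied from the question, but the second timestamp is made up so that there is more than one group, and the names `zscore`, `value_cols`, `by_index`, and `in_place` are only illustrative. It also shows a possible alternative that keeps sampling_time as an ordinary column by writing the transformed values back into a copy of the frame.

```python
import pandas as pd

# Values copied from the question; the second timestamp is an assumption
# added here so the example has two groups instead of one.
df = pd.DataFrame({
    "sampling_time": pd.to_datetime([
        "2018-07-15 08:41:49.028",
        "2018-07-15 08:41:49.028",
        "2018-07-15 08:41:49.028",
        "2018-07-15 08:41:50.028",
        "2018-07-15 08:41:50.028",
    ]),
    "MQ2_LPG": [4.41, 2.98, 2.73, 2.69, 2.66],
    "MQ2_CO": [32.87, 19.08, 16.88, 16.47, 16.26],
})


def zscore(x):
    """Standardize values within one group (the question's lambda)."""
    return (x - x.mean()) / x.std()


# Option 1 (this answer): move sampling_time into the index first.
# transform keeps the input's index, so the timestamps survive there.
by_index = df.set_index("sampling_time").groupby("sampling_time").transform(zscore)

# Option 2 (an alternative): keep sampling_time as a column and overwrite
# only the value columns with their transformed versions.
value_cols = [c for c in df.columns if c != "sampling_time"]
in_place = df.copy()
in_place[value_cols] = df.groupby("sampling_time")[value_cols].transform(zscore)

print(by_index)
print(in_place)
```

Which option fits better probably depends on what comes next: the indexed form is convenient for time-based slicing, while the column form keeps the original layout and can be turned into the other with `set_index` or `reset_index`.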