Here is an example of the DataFrame I am working with:
import pandas as pd
import numpy as np
from scipy.stats import zscore

df = pd.DataFrame(
    index=pd.MultiIndex.from_tuples(
        [('Monday', '2019-11-04'), ('Monday', '2019-11-11'), ('Monday', '2019-11-18'),
         ('Tuesday', '2019-11-05'), ('Tuesday', '2019-11-12'), ('Tuesday', '2019-11-19'),
         ('Wednesday', '2019-11-06'), ('Wednesday', '2019-11-13'), ('Wednesday', '2019-11-20'),
         ('Thursday', '2019-11-07'), ('Thursday', '2019-11-14'), ('Thursday', '2019-11-21'),
         ('Friday', '2019-11-01'), ('Friday', '2019-11-08'), ('Friday', '2019-11-15'),
         ('Saturday', '2019-11-02'), ('Saturday', '2019-11-09'), ('Saturday', '2019-11-16'),
         ('Sunday', '2019-11-03'), ('Sunday', '2019-11-10'), ('Sunday', '2019-11-17')]),
    data={'A': [363287, 348759, 295711, 346276, 350785, 292794, 328048, 315418,
                303901, 324330, 302850, 308500, 415665, 324196, 289739, 444184,
                361214, 359573, 436543, 375668, 379184],
          'B': [263641, 293827, 272811, 267064, 307886, 269061, 266336, 292442,
                273714, 268377, 278113, 270378, 268556, 274989, 268869, 312046,
                321059, 322694, 323546, 332234, 333341],
          'C': [263678, 293870, 272855, 267092, 307931, 269114, 266378, 292488,
                273769, 268426, 278156, 270422, 268602, 275021, 268906, 312084,
                321116, 322741, 323602, 332298, 333405]})
Currently, I get the z-score of every value in each column by applying scipy.stats.zscore to each column in a for loop:
for col in df.columns:
    # Standardize the whole column at once (all weekdays pooled together)
    df[col] = zscore(df[col])
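Instead of using all the numbers in each column when applying zscore, how can I group by the first level of the index (the day of the week) before applying the function? For example, I would like to apply it first to the values in df.loc[('Monday'), 'A'], then to the values in df.loc[('Tuesday'), 'A'], and so on.
Is there also a way to do this that does not involve appending subsets of the DataFrame to a list and concatenating them after processing?
Thanks!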
Answer 0 (score: 0)
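# Select several columns with a list (double brackets); the tuple form ['A','B','C'] fails on recent pandas versions
df.groupby(level=0)[['A', 'B', 'C']].transform(zscore)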
# A B C
#weekdays dates
#Monday 2019-11-04 0.942314 -1.038220 -1.038401
# 2019-11-11 0.442097 1.350720 1.350641
# 2019-11-18 -1.384411 -0.312500 -0.312240
#Tuesday 2019-11-05 0.619782 -0.759579 -0.760220
# 2019-11-12 0.790974 1.412882 1.412849
# 2019-11-19 -1.410756 -0.653303 -0.652628
#Wednesday 2019-11-06 1.243122 -1.015742 -1.016228
# 2019-11-13 -0.037621 1.360045 1.359854
# 2019-11-20 -1.205501 -0.344304 -0.343626
#Thursday 2019-11-07 1.367941 -0.931907 -0.931481
# 2019-11-14 -0.994700 1.387182 1.387292
# 2019-11-21 -0.373242 -0.455275 -0.455811
#Friday 2019-11-01 1.363756 -0.759293 -0.757889
# 2019-11-08 -0.357646 1.412897 1.412967
# 2019-11-15 -1.006110 -0.653604 -0.655078
#Saturday 2019-11-02 1.414010 -1.399768 -1.399981
# 2019-11-09 -0.686236 0.525278 0.526673
# 2019-11-16 -0.727775 0.874490 0.873309
#Sunday 2019-11-03 1.412341 -1.406665 -1.406678
# 2019-11-10 -0.769170 0.576959 0.577073
# 2019-11-17 -0.643171 0.829706 0.829605
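This groups by level 0 of the index (Monday, Tuesday, ...), and transform returns a result with the same shape and index as the original, so nothing needs to be split up and concatenated. Alternatively, if you want to name the index levels and group by name: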
df = df.rename_axis(index = ['weekdays','dates'])
df.groupby('weekdays').transform(zscore)
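If you would rather not depend on scipy, the same per-weekday standardization can be sketched with pandas alone. Note that scipy.stats.zscore uses the population standard deviation (ddof=0) by default, while pandas' std defaults to ddof=1, so ddof=0 has to be passed explicitly to match. The frame `small`, its values, and the helper `population_z` below are made up for illustration and are not part of the question's data.

```python
import pandas as pd

# Small illustrative frame (made-up values), indexed by (weekday, date).
idx = pd.MultiIndex.from_tuples(
    [('Monday', '2019-11-04'), ('Monday', '2019-11-11'), ('Monday', '2019-11-18'),
     ('Tuesday', '2019-11-05'), ('Tuesday', '2019-11-12'), ('Tuesday', '2019-11-19')],
    names=['weekdays', 'dates'])
small = pd.DataFrame({'A': [1.0, 2.0, 3.0, 10.0, 20.0, 60.0],
                      'B': [5.0, 7.0, 9.0, 1.0, 1.5, 4.0]}, index=idx)


def population_z(values):
    """Hypothetical helper: z-score using the population std (ddof=0)."""
    return (values - values.mean()) / values.std(ddof=0)


# transform applies the helper within each weekday group and returns a frame
# aligned to the original index, so no manual split/concat is needed.
by_pandas = small.groupby(level='weekdays').transform(population_z)
print(by_pandas)
```

On the question's data, df.groupby(level=0).transform(population_z) should give the same numbers as the zscore version, up to floating-point rounding.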