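I have several .csv files that I import with Pandas and then compute data summaries for (min, max, mean), ideally as weekly and monthly reports. I have the following code, but I can't get the monthly summary to work, and I'm fairly sure the problem is the timestamp conversion.
What am I doing wrong?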
txt <-
'1 2:1.827411e-02 3:5.355330e-02 4:1.827411e-02 5:1.827411e-02
2 1:1.827411e-02 3:1.903553e-02 4:4.568528e-03 5:4.568528e-03
3 1:5.355330e-02 2:1.903553e-02 4:1.903553e-02 5:1.903553e-02 6:7.461929e-02 11:3.350254e-02
4 1:1.827411e-02 2:4.568528e-03 3:1.903553e-02 5:4.568528e-03
5 1:1.827411e-02 2:4.568528e-03 3:1.903553e-02 4:4.568528e-03
6 3:7.461929e-02 7:1.903553e-02 8:1.903553e-02 9:5.355330e-02 10:1.903553e-02 11:3.350254e-02
7 6:1.903553e-02 8:4.568528e-03 9:1.827411e-02 10:4.568528e-03
8 6:1.903553e-02 7:4.568528e-03 9:1.827411e-02 10:4.568528e-03
9 6:5.355330e-02 7:1.827411e-02 8:1.827411e-02 10:1.827411e-02
10 6:1.903553e-02 7:4.568528e-03 8:4.568528e-03 9:1.827411e-02
11 3:3.350254e-02 6:3.350254e-02'
r <- readLines(textConnection(txt))
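Answer 0 (score: 0)
IIUC, you almost have it, and your datetime conversion is fine. Here is an example:
Start with a dataframe like this (it is your sample row, repeated with slight modifications):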
>>> df
time x y z a b c d
0 2017-05-11 18:29:14+00:00 264.0 947.99 24.5 53.7 511.0 11.463 12.31
1 2017-05-15 18:29:14+00:00 265.0 957.99 25.5 43.7 512.0 11.563 22.31
2 2017-05-21 18:29:14+00:00 266.0 967.99 26.5 33.7 513.0 11.663 32.31
3 2017-06-11 18:29:14+00:00 267.0 977.99 26.5 23.7 514.0 11.763 42.31
4 2017-06-22 18:29:14+00:00 268.0 997.99 27.5 13.7 515.0 11.800 52.31
You can convert the dates the same way you were already doing:
df['timestamp'] = pd.to_datetime(df['time'], format='%Y-%m-%d %H:%M:%S')
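If the raw strings in the CSV carry a UTC offset such as `+00:00` (as the printed frame above suggests), an explicit `format` without an offset directive may not match them on every pandas version. A minimal sketch of an alternative, assuming the column is named `time` and holds ISO-like strings (the two sample strings below are illustrative, taken from the frame above):

```python
import pandas as pd

# Illustrative input: strings with an explicit UTC offset, as in the frame above.
raw = pd.DataFrame({"time": ["2017-05-11 18:29:14+00:00",
                             "2017-06-22 18:29:14+00:00"]})

# utc=True parses the offset and returns timezone-aware timestamps in UTC,
# so no hand-written format string is needed.
raw["timestamp"] = pd.to_datetime(raw["time"], utc=True)

print(raw["timestamp"].dtype)
```

The resulting column is timezone-aware, and it can be used as the `key` of `pd.Grouper` in the same way as a naive datetime column.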
Then get each summary separately:
monthly_mean = df.groupby(pd.Grouper(key='timestamp',freq='M')).mean()
monthly_max = df.groupby(pd.Grouper(key='timestamp',freq='M')).max()
monthly_min = df.groupby(pd.Grouper(key='timestamp',freq='M')).min()
weekly_mean = df.groupby(pd.Grouper(key='timestamp',freq='W')).mean()
weekly_min = df.groupby(pd.Grouper(key='timestamp',freq='W')).min()
weekly_max = df.groupby(pd.Grouper(key='timestamp',freq='W')).max()
# Examples:
>>> monthly_mean
x y z a b c d
timestamp
2017-05-31 265.0 957.99 25.5 43.7 512.0 11.5630 22.31
2017-06-30 267.5 987.99 27.0 18.7 514.5 11.7815 47.31
>>> weekly_mean
x y z a b c d
timestamp
2017-05-14 264.0 947.99 24.5 53.7 511.0 11.463 12.31
2017-05-21 265.5 962.99 26.0 38.7 512.5 11.613 27.31
2017-05-28 NaN NaN NaN NaN NaN NaN NaN
2017-06-04 NaN NaN NaN NaN NaN NaN NaN
2017-06-11 267.0 977.99 26.5 23.7 514.0 11.763 42.31
2017-06-18 NaN NaN NaN NaN NaN NaN NaN
2017-06-25 268.0 997.99 27.5 13.7 515.0 11.800 52.31
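The `NaN` rows in the weekly output are weeks that contain no observations: `pd.Grouper` with a frequency emits every bin between the first and last timestamp. If those empty weeks are not wanted in a report, one option is to drop them afterwards. A small sketch on a cut-down frame (only column `x`, with three of the dates from above):

```python
import pandas as pd

# Cut-down illustrative frame: three of the timestamps above, one value column.
df_small = pd.DataFrame({
    "timestamp": pd.to_datetime(["2017-05-11 18:29:14",
                                 "2017-05-21 18:29:14",
                                 "2017-06-11 18:29:14"]),
    "x": [264.0, 266.0, 267.0],
})

# Weekly bins (weeks ending on Sunday); empty weeks show up as NaN rows.
weekly_mean_small = df_small.groupby(pd.Grouper(key="timestamp", freq="W")).mean()

# Keep only the weeks that actually contain data.
weekly_nonempty = weekly_mean_small.dropna(how="all")
print(weekly_nonempty)
```

Whether to keep or drop the empty weeks depends on the report: keeping them makes gaps in the data visible, dropping them gives a more compact table.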
Or aggregate them together to get a dataframe with MultiIndex columns holding all the summaries:
monthly_summary = df.groupby(pd.Grouper(key='timestamp',freq='M')).agg(['mean', 'min', 'max'])
weekly_summary = df.groupby(pd.Grouper(key='timestamp',freq='W')).agg(['mean', 'min', 'max'])
# Example of the summary for column 'x':
>>> monthly_summary['x']
mean min max
timestamp
2017-05-31 265.0 264.0 266.0
2017-06-30 267.5 267.0 268.0
>>> weekly_summary['x']
mean min max
timestamp
2017-05-14 264.0 264.0 264.0
2017-05-21 265.5 265.0 266.0
2017-05-28 NaN NaN NaN
2017-06-04 NaN NaN NaN
2017-06-11 267.0 267.0 267.0
2017-06-18 NaN NaN NaN
2017-06-25 268.0 268.0 268.0
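For reference, here is a self-contained sketch of the combined summary that can be pasted and run as-is. It is a variation on the approach above, not the same code: it groups on calendar periods via `to_period` instead of `pd.Grouper`, and it selects the numeric columns explicitly so the original `time` string column is not aggregated. The frame is a two-column subset (`x`, `y`) of the sample data, with naive timestamps assumed.

```python
import pandas as pd

# Subset of the sample data above (columns x and y only, no UTC offset).
df_demo = pd.DataFrame({
    "time": ["2017-05-11 18:29:14", "2017-05-15 18:29:14", "2017-05-21 18:29:14",
             "2017-06-11 18:29:14", "2017-06-22 18:29:14"],
    "x": [264.0, 265.0, 266.0, 267.0, 268.0],
    "y": [947.99, 957.99, 967.99, 977.99, 997.99],
})
df_demo["timestamp"] = pd.to_datetime(df_demo["time"], format="%Y-%m-%d %H:%M:%S")

# Index by timestamp and keep only the numeric columns to be summarised.
values = df_demo.set_index("timestamp")[["x", "y"]]

# Group by calendar month / week (weeks end on Sunday) and compute all three
# statistics at once; only periods that contain data appear in the result.
monthly_demo = values.groupby(values.index.to_period("M")).agg(["mean", "min", "max"])
weekly_demo = values.groupby(values.index.to_period("W")).agg(["mean", "min", "max"])

print(monthly_demo["x"])
print(weekly_demo["x"])
```

Because the grouping key is a period rather than a fixed-frequency bin, weeks without any rows are simply absent instead of appearing as `NaN`.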