Pandas: converting timestamps and monthly summaries

Time: 2018-05-13 17:07:26

Tags: python pandas numpy time

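I have several .csv files that I import with Pandas and then compute summary statistics for (min, max, mean), ideally as weekly and monthly reports. I have the following code, but I can't get the monthly summary to work, and I'm fairly sure the problem is the timestamp conversion.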

What am I doing wrong?

txt <- 
  '1 2:1.827411e-02 3:5.355330e-02 4:1.827411e-02 5:1.827411e-02
2 1:1.827411e-02 3:1.903553e-02 4:4.568528e-03 5:4.568528e-03
3 1:5.355330e-02 2:1.903553e-02 4:1.903553e-02 5:1.903553e-02 6:7.461929e-02 11:3.350254e-02
4 1:1.827411e-02 2:4.568528e-03 3:1.903553e-02 5:4.568528e-03
5 1:1.827411e-02 2:4.568528e-03 3:1.903553e-02 4:4.568528e-03
6 3:7.461929e-02 7:1.903553e-02 8:1.903553e-02 9:5.355330e-02 10:1.903553e-02 11:3.350254e-02
7 6:1.903553e-02 8:4.568528e-03 9:1.827411e-02 10:4.568528e-03
8 6:1.903553e-02 7:4.568528e-03 9:1.827411e-02 10:4.568528e-03
9 6:5.355330e-02 7:1.827411e-02 8:1.827411e-02 10:1.827411e-02
10 6:1.903553e-02 7:4.568528e-03 8:4.568528e-03 9:1.827411e-02
11 3:3.350254e-02 6:3.350254e-02'

r <- readLines(textConnection(txt))

1 answer:

Answer 0: (score: 0)

IIUC, you almost have it, and your datetime conversion is fine. Here is an example:

Starting with a dataframe like this (it is your example row, repeated with slight modifications):

>>> df
                        time      x       y     z     a      b       c      d
0  2017-05-11 18:29:14+00:00  264.0  947.99  24.5  53.7  511.0  11.463  12.31
1  2017-05-15 18:29:14+00:00  265.0  957.99  25.5  43.7  512.0  11.563  22.31
2  2017-05-21 18:29:14+00:00  266.0  967.99  26.5  33.7  513.0  11.663  32.31
3  2017-06-11 18:29:14+00:00  267.0  977.99  26.5  23.7  514.0  11.763  42.31
4  2017-06-22 18:29:14+00:00  268.0  997.99  27.5  13.7  515.0  11.800  52.31

You can convert the dates the same way you did before:

df['timestamp'] = pd.to_datetime(df['time'], format='%Y-%m-%d %H:%M:%S')
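
For anyone who wants to reproduce the conversion step, here is a minimal sketch. It assumes the `time` column arrives from the CSV as strings carrying a `+00:00` offset, as in the printout above; only two of the value columns are kept for brevity. Passing `utc=True` instead of an explicit `format` is my own choice here, not something taken from the question.

```python
import pandas as pd

# Sample rows copied from the dataframe printed above (columns x and y only).
df = pd.DataFrame({
    "time": [
        "2017-05-11 18:29:14+00:00",
        "2017-05-15 18:29:14+00:00",
        "2017-05-21 18:29:14+00:00",
        "2017-06-11 18:29:14+00:00",
        "2017-06-22 18:29:14+00:00",
    ],
    "x": [264.0, 265.0, 266.0, 267.0, 268.0],
    "y": [947.99, 957.99, 967.99, 977.99, 997.99],
})

# Assumption: the strings include a UTC offset, so utc=True is used to
# parse them into timezone-aware timestamps.
df["timestamp"] = pd.to_datetime(df["time"], utc=True)

# The new column now supports datetime accessors such as .dt.month.
print(df["timestamp"].dt.month.tolist())
```

If your strings have no offset, the `format='%Y-%m-%d %H:%M:%S'` call shown above should do the same job.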

Then get the summaries individually:

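# Group rows into calendar months; each bin is labelled with its month-end date.
# Note: pandas 2.2+ spells this alias 'ME'; 'M' is the older spelling.
monthly_mean = df.groupby(pd.Grouper(key='timestamp',freq='M')).mean()
monthly_max = df.groupby(pd.Grouper(key='timestamp',freq='M')).max()
monthly_min = df.groupby(pd.Grouper(key='timestamp',freq='M')).min()

# Group rows into weeks ending on Sunday (the default for 'W').
weekly_mean = df.groupby(pd.Grouper(key='timestamp',freq='W')).mean()
weekly_min = df.groupby(pd.Grouper(key='timestamp',freq='W')).min()
weekly_max = df.groupby(pd.Grouper(key='timestamp',freq='W')).max()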

# Examples:
>>> monthly_mean
                x       y     z     a      b        c      d
timestamp                                                   
2017-05-31  265.0  957.99  25.5  43.7  512.0  11.5630  22.31
2017-06-30  267.5  987.99  27.0  18.7  514.5  11.7815  47.31

>>> weekly_mean
                x       y     z     a      b       c      d
timestamp                                                  
2017-05-14  264.0  947.99  24.5  53.7  511.0  11.463  12.31
2017-05-21  265.5  962.99  26.0  38.7  512.5  11.613  27.31
2017-05-28    NaN     NaN   NaN   NaN    NaN     NaN    NaN
2017-06-04    NaN     NaN   NaN   NaN    NaN     NaN    NaN
2017-06-11  267.0  977.99  26.5  23.7  514.0  11.763  42.31
2017-06-18    NaN     NaN   NaN   NaN    NaN     NaN    NaN
2017-06-25  268.0  997.99  27.5  13.7  515.0  11.800  52.31
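
The NaN rows above are weeks that contain no data: `pd.Grouper` emits every bin between the first and last timestamp. If you would rather not see them in a report, one option (a sketch, not part of the original code) is to drop the rows where every column is NaN. The frame below is a cut-down copy of the sample data with timezone-naive timestamps.

```python
import pandas as pd

# Cut-down copy of the sample data: timestamps plus two value columns.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2017-05-11 18:29:14",
        "2017-05-15 18:29:14",
        "2017-05-21 18:29:14",
        "2017-06-11 18:29:14",
        "2017-06-22 18:29:14",
    ]),
    "x": [264.0, 265.0, 266.0, 267.0, 268.0],
    "y": [947.99, 957.99, 967.99, 977.99, 997.99],
})

# One row per week (weeks end on Sunday), including weeks with no data.
weekly_mean = df.groupby(pd.Grouper(key="timestamp", freq="W")).mean()

# Keep only the weeks that actually contain at least one observation.
weekly_nonempty = weekly_mean.dropna(how="all")

print(weekly_nonempty)
```

Keeping the empty weeks can still be useful when you want gaps to stay visible, for example in a plot.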

Or aggregate them together to get a dataframe with multi-indexed columns holding the summaries:

monthly_summary = df.groupby(pd.Grouper(key='timestamp',freq='M')).agg(['mean', 'min', 'max'])
weekly_summary = df.groupby(pd.Grouper(key='timestamp',freq='W')).agg(['mean', 'min', 'max'])

# Example of summary of column 'x':
>>> monthly_summary['x']
             mean    min    max
timestamp                      
2017-05-31  265.0  264.0  266.0
2017-06-30  267.5  267.0  268.0

>>> weekly_summary['x']
             mean    min    max
timestamp                      
2017-05-14  264.0  264.0  264.0
2017-05-21  265.5  265.0  266.0
2017-05-28    NaN    NaN    NaN
2017-06-04    NaN    NaN    NaN
2017-06-11  267.0  267.0  267.0
2017-06-18    NaN    NaN    NaN
2017-06-25  268.0  268.0  268.0
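
If the two-level columns are awkward for a report, they can be sliced or flattened. The sketch below is only an illustration: it groups by a monthly `Period` (via `dt.to_period("M")`) rather than `pd.Grouper`, a substitution I made so the example does not depend on which month alias your pandas version expects, and it keeps just the `x` and `d` columns from the sample data.

```python
import pandas as pd

# Cut-down copy of the sample data with two of the value columns.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2017-05-11 18:29:14",
        "2017-05-15 18:29:14",
        "2017-05-21 18:29:14",
        "2017-06-11 18:29:14",
        "2017-06-22 18:29:14",
    ]),
    "x": [264.0, 265.0, 266.0, 267.0, 268.0],
    "d": [12.31, 22.31, 32.31, 42.31, 52.31],
})

# Swapped-in alternative to pd.Grouper: group by the calendar month Period.
month = df["timestamp"].dt.to_period("M")
monthly_summary = df.groupby(month)[["x", "d"]].agg(["mean", "min", "max"])

# Columns are a two-level MultiIndex: (original column, statistic).
x_stats = monthly_summary["x"]                           # all statistics for x
all_means = monthly_summary.xs("mean", axis=1, level=1)  # the mean of every column

# Flatten to single-level names such as "x_mean" for export to CSV.
flat = monthly_summary.copy()
flat.columns = ["_".join(col) for col in flat.columns]

print(flat)
```

The result is indexed by period (`2017-05`, `2017-06`) instead of by month-end date, which is usually easier to read in a monthly report.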