How to group data by time

Time: 2012-10-19 12:57:22

Tags: python matplotlib pandas

I am trying to find a way to group data by day.

Here is a sample of my dataset.

Dates                     Price1    Price 2
2002-10-15  11:17:03pm    0.6       5.0
2002-10-15  11:20:04pm    1.4       2.4
2002-10-15  11:22:12pm    4.1       9.1
2002-10-16  12:21:03pm    1.6       1.4
2002-10-16  12:22:03pm    7.7       3.7

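3 answers:

Answer 0 (score: 2)

Yes, I would definitely use pandas. The trickiest part is working out the datetime parser that pandas needs to load the data. After that, it is just a matter of resampling the resulting DataFrame.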

In [60]: import datetime
In [61]: import pandas
In [62]: parse = lambda x: datetime.datetime.strptime(x, '%Y-%m-%d %I:%M:%S%p')
In [63]: dframe = pandas.read_table("data.txt", delimiter=",", index_col=0, parse_dates=True, date_parser=parse)
In [64]: print(dframe)
                                 Price1                                   Price 2
Dates                                                                            
2002-10-15 23:17:03                                0.6                        5.0
2002-10-15 23:20:04                                1.4                        2.4
2002-10-15 23:22:12                                4.1                        9.1
2002-10-16 12:21:03                                1.6                        1.4
2002-10-16 12:22:03                                7.7                        3.7
In [78]: means = dframe.resample("D", label='left').mean()
In [79]: print(means)
                                 Price1                                   Price 2
Dates                                                                            
2002-10-15                                    2.033333                       5.50
2002-10-16                                    4.650000                       2.55

where data.txt contains:

Dates                 ,         Price1    ,                  Price 2
2002-10-15  11:17:03pm,          0.6      ,                    5.0
2002-10-15  11:20:04pm,          1.4      ,                    2.4
2002-10-15  11:22:12pm,          4.1      ,                    9.1
2002-10-16  12:21:03pm,          1.6      ,                    1.4
2002-10-16  12:22:03pm,          7.7      ,                    3.7
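For newer pandas releases, where the `how=` argument of `resample` and the `date_parser` argument are no longer the recommended route, the same parse-then-resample idea might look roughly like the sketch below. It assumes the sample rows are built in memory rather than read from data.txt, and the helper name `parse` is only illustrative.

```python
import datetime

import pandas as pd

# Sample rows copied from the question; built in memory instead of reading data.txt.
rows = [
    ("2002-10-15  11:17:03pm", 0.6, 5.0),
    ("2002-10-15  11:20:04pm", 1.4, 2.4),
    ("2002-10-15  11:22:12pm", 4.1, 9.1),
    ("2002-10-16  12:21:03pm", 1.6, 1.4),
    ("2002-10-16  12:22:03pm", 7.7, 3.7),
]
df = pd.DataFrame(rows, columns=["Dates", "Price1", "Price 2"])


def parse(text):
    """Parse a 12-hour timestamp such as '2002-10-15  11:17:03pm'."""
    return datetime.datetime.strptime(text.strip(), "%Y-%m-%d %I:%M:%S%p")


# Convert the text column to real timestamps and use it as the index.
df["Dates"] = df["Dates"].map(parse)
df = df.set_index("Dates")

# One row per calendar day, holding the mean of each price column.
means = df.resample("D").mean()
print(means)
```

Other aggregations follow the same pattern, for example `df.resample("D").sum()` or `df.resample("D").max()`.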

Answer 1 (score: 0)

Use:

data.groupby(data['dates'].map(lambda x: x.day))
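One thing to watch with this approach: `x.day` is the day of the month only, so rows from 15 October and 15 November would land in the same group. If the data spans more than one month, grouping on the full calendar date may be closer to what the question asks for. A minimal sketch, assuming a `dates` column that is already of datetime type (the extra November row is made up for illustration):

```python
import pandas as pd

# Small illustrative frame; the November row is invented to show the difference.
data = pd.DataFrame(
    {
        "dates": pd.to_datetime(
            [
                "2002-10-15 23:17:03",
                "2002-10-15 23:20:04",
                "2002-10-16 12:21:03",
                "2002-11-15 08:00:00",
            ]
        ),
        "Price1": [0.6, 1.4, 1.6, 9.0],
    }
)

# Grouping by day of month mixes 15 October with 15 November.
by_day_of_month = data.groupby(data["dates"].dt.day)["Price1"].mean()

# Grouping by the date truncated to midnight keeps each calendar day separate.
by_calendar_day = data.groupby(data["dates"].dt.normalize())["Price1"].mean()

print(by_day_of_month)
print(by_calendar_day)
```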

Answer 2 (score: 0)

From the pandas documentation: http://pandas.pydata.org/pandas-docs/stable/pandas.pdf

 # 72 hours starting with midnight Jan 1st, 2011
 In [1073]: rng = date_range('1/1/2011', periods=72, freq='H')
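The quoted snippet stops after building the hourly range. As a rough illustration of how such a range could be grouped by day, the sketch below attaches made-up values to the 72 hourly timestamps and resamples them to daily totals. The series `ts` and its values are assumptions for the example, not part of the quoted documentation, and the frequency is passed as a `pd.Timedelta` to sidestep differences in the hourly alias between pandas versions.

```python
import pandas as pd

# 72 hourly timestamps starting at midnight on 1 January 2011.
rng = pd.date_range("1/1/2011", periods=72, freq=pd.Timedelta(hours=1))

# Hypothetical values 0..71, one per hour, just to have something to aggregate.
ts = pd.Series(range(72), index=rng)

# Collapse the hourly values into one total per calendar day.
daily = ts.resample("D").sum()
print(daily)
```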