Question

I have a DataFrame, loaded from a CSV, that looks like this (sample data here: http://www.speedyshare.com/9A2zf/download/sample.csv):

                          event    name          user  count  amount  commission
2011-05-23 00:00:00  2011-07-22  normal  reading_arts      2      26         0.0
2011-05-23 00:00:00  2011-07-23  normal  reading_arts     14     182         0.0
2011-05-24 00:00:00  2011-07-22  normal  reading_arts      4      52         0.0
2011-05-24 00:00:00  2011-07-22  normal  reading_arts      3      39         0.0
2011-05-26 00:00:00  2011-07-23  normal  reading_arts      2      30         0.0
2011-05-26 00:00:00  2011-07-23  normal  reading_arts      5      75         0.0
2011-05-26 00:00:00  2011-07-22  normal  reading_arts      1      13         0.0
2011-05-27 15:39:28  2011-07-23  normal       hickies     16     208       -10.4
2011-06-01 00:00:00  2011-07-23  normal  reading_arts      2      30         0.0
2011-06-02 00:00:00  2011-07-23  normal  reading_arts     17     221         0.0

...which I created with:

from pandas import read_csv

data = read_csv('2011.csv',
                names=('event', 'user', 'count', 'amount', 'commission'),
                parse_dates=True)

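Although 'event' looks like a date, it is really just an identifier for a particular event.

You will notice that there are duplicate entries in the DatetimeIndex, for example 2011-05-23 00:00:00.

What I ultimately want is 3 time series (one each for count, amount and commission) per user per event, downsampled to weekly buckets by summing. I would also like to create a similar time series for each event, which is just the sum of the per-user time series for that event.

How can I do this?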

Answer 1

Edit - try this code:

Note - I took the CSV and added a header for each column. The header row I added as line 1 is:

time    event   name    user    count   amount  commission

Try running this and let me know if it is still not what you are looking for.

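import pandas as pd

# DataFrame.from_csv and resample(how=...) have been removed from pandas;
# read_csv with index_col=0 and parse_dates=True is the equivalent call.
df = pd.read_csv('sample.csv', index_col=0, parse_dates=True)

# Weekly sums of the numeric columns for each (event, user) pair
resamp = df.groupby(['event', 'user'])[['count', 'amount', 'commission']].resample('W').sum()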
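
The question also asks for a per-event series that sums over users, which the code above does not produce. Here is a rough sketch of one way to get both results. It assumes the header row described above (time, event, name, user, count, amount, commission) and a comma-separated file. The ten rows from the question are inlined so the snippet runs on its own, and the names CSV_TEXT, value_cols, per_user and per_event are only illustrative. It uses pd.Grouper instead of groupby().resample(), so it is an alternative rather than the exact method above.

```python
import io

import pandas as pd

# Inline copy of the ten rows shown in the question, plus the assumed header
# row; with the real file, pass 'sample.csv' to read_csv instead.
CSV_TEXT = """time,event,name,user,count,amount,commission
2011-05-23 00:00:00,2011-07-22,normal,reading_arts,2,26,0.0
2011-05-23 00:00:00,2011-07-23,normal,reading_arts,14,182,0.0
2011-05-24 00:00:00,2011-07-22,normal,reading_arts,4,52,0.0
2011-05-24 00:00:00,2011-07-22,normal,reading_arts,3,39,0.0
2011-05-26 00:00:00,2011-07-23,normal,reading_arts,2,30,0.0
2011-05-26 00:00:00,2011-07-23,normal,reading_arts,5,75,0.0
2011-05-26 00:00:00,2011-07-22,normal,reading_arts,1,13,0.0
2011-05-27 15:39:28,2011-07-23,normal,hickies,16,208,-10.4
2011-06-01 00:00:00,2011-07-23,normal,reading_arts,2,30,0.0
2011-06-02 00:00:00,2011-07-23,normal,reading_arts,17,221,0.0
"""

# Only the index is parsed as dates; 'event' stays a plain string identifier.
df = pd.read_csv(io.StringIO(CSV_TEXT), index_col="time", parse_dates=True)

value_cols = ["count", "amount", "commission"]

# Weekly sums per (event, user). pd.Grouper buckets the DatetimeIndex into
# weeks ending on Sunday; duplicate timestamps are simply added together.
per_user = df.groupby(["event", "user", pd.Grouper(freq="W")])[value_cols].sum()

# Per-event series: sum the users together within each (event, week).
per_event = per_user.groupby(level=["event", "time"]).sum()

print(per_user)
print(per_event)
```

A single series can then be pulled out with, for example, per_user.loc[("2011-07-23", "reading_arts"), "amount"], which is indexed by week. Note that this approach only returns weeks that actually contain data. If you need empty weeks filled with zeros, groupby().resample() as in the code above may be the better fit.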
