Pandas: grouping some data

Time: 2016-07-13 12:28:51

Tags: python pandas

I have a dataframe

         date   id
0  12-12-2015  123
1  13-12-2015  123
2  15-12-2015  123
3  16-12-2015  123
4  18-12-2015  123
5  10-12-2015  456
6  13-12-2015  456
7  15-12-2015  456

I want

     id        date  count
0   123  10-12-2015      0
1   123  11-12-2015      0
2   123  12-12-2015      1
3   123  13-12-2015      1
4   123  14-12-2015      0
5   123  15-12-2015      1
6   123  16-12-2015      1
7   123  17-12-2015      0
8   123  18-12-2015      1
9   456  10-12-2015      1
10  456  11-12-2015      0
11  456  12-12-2015      0
12  456  13-12-2015      1
13  456  14-12-2015      0
14  456  15-12-2015      1

I previously tried

df = df.groupby('id').resample('D').size().reset_index(name='val')

But it only fills in the dates between the existing dates of each id. How can I do this over a single period shared by all ids?

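1 answer:

Answer 0 (score: 1)

You can achieve this by reindexing each group's aggregated result onto the full date range and filling the resulting NaN values with 0.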

import io
import pandas as pd

data = io.StringIO("""\
date   id
0  12-12-2015  123
1  13-12-2015  123
2  15-12-2015  123
3  16-12-2015  123
4  18-12-2015  123
5  10-12-2015  456
6  13-12-2015  456
7  15-12-2015  456""")
# sep=r'\s+' splits on any whitespace (delim_whitespace is deprecated in recent pandas)
df = pd.read_csv(data, sep=r'\s+')
df['date'] = pd.to_datetime(df['date'], format="%d-%m-%Y")

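# One daily range from the earliest to the latest date across all ids
startdate = df['date'].min()
enddate = df['date'].max()
alldates = pd.date_range(startdate, enddate, freq='D', name='date')

def process_id(g):
    # Count rows per day for this id, extend to the full range, and turn missing days into 0
    return g.resample('D').size().reindex(alldates).fillna(0)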

output = (df.set_index('date')
            .groupby('id')
            .apply(process_id)
            .stack()
            .rename('val')
            .reset_index('id'))

print(output)

#              id  val
# date                
# 2015-12-10  123  0.0
# 2015-12-11  123  0.0
# 2015-12-12  123  1.0
# 2015-12-13  123  1.0
# 2015-12-14  123  0.0
# 2015-12-15  123  1.0
# 2015-12-16  123  1.0
# 2015-12-17  123  0.0
# 2015-12-18  123  1.0
# 2015-12-10  456  1.0
# 2015-12-11  456  0.0
# 2015-12-12  456  0.0
# 2015-12-13  456  1.0
# 2015-12-14  456  0.0
# 2015-12-15  456  1.0
# 2015-12-16  456  0.0
# 2015-12-17  456  0.0
# 2015-12-18  456  0.0
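
As a possible variation on the approach above, the counts can also be computed without `apply` by reindexing against every (id, date) pair at once. This is only a sketch: the names `full_index` and `result` are my own, and it assumes the goal is the same shared date range for all ids as in the output above (so 456 also gets rows up to 2015-12-18). Passing `fill_value=0` to `reindex` keeps the counts as integers and gives the `id`, `date`, `count` columns from the question.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    "date": ["12-12-2015", "13-12-2015", "15-12-2015", "16-12-2015",
             "18-12-2015", "10-12-2015", "13-12-2015", "15-12-2015"],
    "id": [123, 123, 123, 123, 123, 456, 456, 456],
})
df["date"] = pd.to_datetime(df["date"], format="%d-%m-%Y")

# One shared daily range spanning the earliest to the latest date overall.
alldates = pd.date_range(df["date"].min(), df["date"].max(), freq="D")

# Every (id, date) combination that should appear in the result.
full_index = pd.MultiIndex.from_product(
    [sorted(df["id"].unique()), alldates], names=["id", "date"]
)

# Count rows per (id, date), then add the missing pairs with a count of 0.
result = (
    df.groupby(["id", "date"])
      .size()
      .reindex(full_index, fill_value=0)
      .reset_index(name="count")
)
print(result)
```

If each id should instead stop at its own last date, as in the question's desired table, the rows after that date would still need to be dropped in a separate step.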