I have a puzzle. This is easy to do in Excel, but in pandas, given a DataFrame df:
| EventID | PictureID | Date
0 | 1 | A | 2010-01-01
1 | 2 | A | 2010-02-01
2 | 3 | A | 2010-02-15
3 | 4 | B | 2010-01-01
4 | 5 | C | 2010-02-01
5 | 6 | C | 2010-02-15
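Is there a way to add a new column that counts how many times the same PictureID had a recorded event in the past six months? In other words, for each row, the number of rows in the DataFrame that have the same PictureID and a date within the six months before that row's date.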
df['PastSix'] = ???
So the output would look like:
| EventID | PictureID | Date | PastSix
0 | 1 | A | 2010-01-01 | 0
1 | 2 | A | 2010-02-01 | 1
2 | 3 | A | 2010-02-15 | 2
3 | 4 | B | 2010-01-01 | 0
4 | 5 | C | 2010-02-01 | 0
5 | 6 | C | 2010-02-15 | 1
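One way to pin down the requirement is a slow, direct count. This is only a sketch for checking faster solutions on small data. The helper name past_six_naive is made up here, and it assumes "six months" means a calendar offset (pd.DateOffset(months=6)), with the window including its start date and excluding the row's own date.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "EventID": [1, 2, 3, 4, 5, 6],
    "PictureID": ["A", "A", "A", "B", "C", "C"],
    "Date": pd.to_datetime([
        "2010-01-01", "2010-02-01", "2010-02-15",
        "2010-01-01", "2010-02-01", "2010-02-15",
    ]),
})


def past_six_naive(frame, months=6):
    """Hypothetical helper: O(n^2) reference count, meant for checking only."""
    counts = []
    for _, row in frame.iterrows():
        # Assumption: the window starts a calendar offset before the row's date.
        start = row["Date"] - pd.DateOffset(months=months)
        same_picture = frame["PictureID"] == row["PictureID"]
        in_window = (frame["Date"] >= start) & (frame["Date"] < row["Date"])
        counts.append(int((same_picture & in_window).sum()))
    return pd.Series(counts, index=frame.index)


df["PastSix"] = past_six_naive(df)
print(df)
```

Because it compares every row against every other row, this will not scale to large frames, but it makes the intended window boundaries explicit.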
Answer 0 (score: 2)
I'm not sure how to define six months, so I used the preceding 183 days instead. The basic idea is to use the asof() method:
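import io

import numpy as np
import pandas as pd

txt = u"""EventID | PictureID | Date
0 | A | 2009-07-01
1 | A | 2010-01-01
2 | A | 2010-02-01
3 | A | 2010-02-15
4 | B | 2010-01-01
5 | C | 2010-02-01
6 | C | 2010-02-15
7 | A | 2010-08-01
"""
# A regex separator needs the Python parsing engine.
df = pd.read_csv(io.StringIO(txt), sep=r"\s*\|\s*", engine="python", parse_dates=["Date"])

def f(df):
    # Running event count for one PictureID, indexed by date (dates must be sorted).
    count = pd.Series(np.arange(1, len(df) + 1), index=df["Date"])
    prev1day = count.index.shift(-1, freq="D")
    prev6month = count.index.shift(-183, freq="D")
    # Events up to the previous day, minus events up to 183 days earlier.
    result = count.asof(prev1day).fillna(0).values - count.asof(prev6month).fillna(0).values
    return pd.Series(result.astype(int), df.index)

# group_keys=False keeps the original row labels so the result aligns with df on assignment.
df["PastSix"] = df.groupby("PictureID", group_keys=False).apply(f)
print(df)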
Output:
EventID PictureID Date PastSix
0 0 A 2009-07-01 00:00:00 0
1 1 A 2010-01-01 00:00:00 0
2 2 A 2010-02-01 00:00:00 1
3 3 A 2010-02-15 00:00:00 2
4 4 B 2010-01-01 00:00:00 0
5 5 C 2010-02-01 00:00:00 0
6 6 C 2010-02-15 00:00:00 1
7 7 A 2010-08-01 00:00:00 2
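On newer pandas versions, a time-based rolling window may be a shorter alternative to asof(). The following is a sketch rather than the method used in this answer. It assumes the same 183-day stand-in for six months, rebuilds the sample data inline, and uses closed="left" so that a row does not count itself. One difference to be aware of: this window includes an event dated exactly 183 days earlier, which the asof() version leaves out.

```python
import pandas as pd

# Same sample data as in the answer, rebuilt inline.
df = pd.DataFrame({
    "EventID": range(8),
    "PictureID": ["A", "A", "A", "A", "B", "C", "C", "A"],
    "Date": pd.to_datetime([
        "2009-07-01", "2010-01-01", "2010-02-01", "2010-02-15",
        "2010-01-01", "2010-02-01", "2010-02-15", "2010-08-01",
    ]),
})

# Time-based windows need sorted dates within each group; sort_values keeps
# the original row labels, which are used to align the result back onto df.
ordered = df.sort_values(["PictureID", "Date"])

counts = (
    ordered.set_index("Date")
    .groupby("PictureID")["EventID"]
    .rolling("183D", closed="left")  # window is [date - 183 days, date)
    .count()
)

# The rolling result follows the row order of `ordered` (group, then date),
# so it can be attached by position; fillna(0) covers empty windows.
ordered["PastSix"] = counts.fillna(0).astype(int).to_numpy()
df["PastSix"] = ordered["PastSix"]  # aligned on the original row labels
print(df)
```

Attaching by position relies on the frame being sorted by PictureID first, so that the grouped result comes back in the same row order.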