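I want to count how many times a location occurs within a 7-day window. I have tried several combinations of groupby, rolling and Grouper, but I still have not got the result I want. How can I group on the two columns to get the desired result?
Here is a sample table: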
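import pandas as pd
times = pd.to_datetime(pd.Series(['2014-08-25','2014-08-26','2014-08-26','2014-09-11','2014-09-12', '2014-09-15']))
locations = ['A', 'B', 'A', 'B', 'C','C']
df = pd.DataFrame({'date': times,'location': locations})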
date location
0 2014-08-25 A
1 2014-08-26 B
2 2014-08-26 A
3 2014-09-11 B
4 2014-09-12 C
5 2014-09-15 C
I tried:
df.set_index('date', inplace=True)
df['roll']=df.groupby('location')['location'].rolling(7).count().reset_index(0,drop=True)
But that gives me this:
location roll
date
2014-08-25 A 1.0
2014-08-26 B 2.0
2014-08-26 A 1.0
2014-09-11 B 2.0
2014-09-12 C 1.0
2014-09-15 C 2.0
My desired output should look like this:
times = pd.to_datetime(pd.Series(['2014-08-25','2014-08-26','2014-08-26','2014-09-11','2014-09-12', '2014-09-15']))
locations = ['A', 'B', 'A', 'B', 'C','C']
count = [1, 1, 2, 1, 1, 2]
df1 = pd.DataFrame({'date': times,'location': locations, 'rolling_count':count})
date location rolling_count
0 2014-08-25 A 1
1 2014-08-26 B 1
2 2014-08-26 A 2
3 2014-09-11 B 1
4 2014-09-12 C 1
5 2014-09-15 C 2
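Thanks!
Answer 0 (score: 1)
Welcome to Stack Overflow. The question is a little ambiguous, but because it includes the data, the attempted code and the desired output, an answer can be provided.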
Counting the occurrences of a location in 7-day periods can be done with Grouper, for example df.groupby(pd.Grouper(key='date', freq='7D')).
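As a rough sketch of the Grouper approach mentioned above, the following counts observations per location in fixed 7-day bins. It uses the sample data from the question. Adding 'location' as a second grouping key and the column name 'count' are assumptions made here for illustration, not part of the original answer.

```python
import pandas as pd

# Sample data taken from the question
times = pd.to_datetime(['2014-08-25', '2014-08-26', '2014-08-26',
                        '2014-09-11', '2014-09-12', '2014-09-15'])
locations = ['A', 'B', 'A', 'B', 'C', 'C']
df = pd.DataFrame({'date': times, 'location': locations})

# Group by fixed 7-day bins on 'date' and by location, then count rows.
# 'count' is just an illustrative name for the resulting column.
binned = (
    df.groupby([pd.Grouper(key='date', freq='7D'), 'location'])
      .size()
      .reset_index(name='count')
)
print(binned)
```

Note that these bins are fixed: two observations three days apart can land in different bins, which is why a rolling window is often the better fit.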
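However, counting observations in a rolling window gives more information. This is not "weekly", which is always hard to define in itself and should be avoided whenever calendar years and months are involved.
Rolling works on the observations of one column at a time, so a few tricks are necessary:
As a result, the count increases and then decreases as the window slides over the observations.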
import pandas as pd
print(pd.__version__)
times = ['2014-08-25', '2014-08-26', '2014-08-26', '2014-09-11', '2014-09-12', '2014-09-15', '2014-09-16']
locations = ['A', 'B', 'A', 'B', 'C','C', 'C']
df = pd.DataFrame({'date': times,'location': locations})
# multiple locations can be observed in a single day
df = df.pivot(index='date', columns='location', values='location')
# set up a datetime index
df.index = pd.to_datetime(df.index)
# normalize the days so an entire 7 day window can be rolled
df = df.resample('1D').last()
# count the number of observations in the window per location
# TODO: functional way to do this?
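for col in df.columns:
    # min_periods=1 keeps counts for the first six days (newer pandas would otherwise give NaN)
    df['{}_7d_observations'.format(col)] = df[col].rolling(7, min_periods=1).count()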
print(df)
which produces something like
location A B C A_7d_observations B_7d_observations C_7d_observations
date
2014-08-25 A NaN NaN 1.0 0.0 0.0
2014-08-26 A B NaN 2.0 1.0 0.0
...snip...
2014-08-31 NaN NaN NaN 2.0 1.0 0.0
2014-09-01 NaN NaN NaN 1.0 1.0 0.0
...snip...
2014-09-10 NaN NaN NaN 0.0 0.0 0.0
2014-09-11 NaN B NaN 0.0 1.0 0.0
2014-09-12 NaN NaN C 0.0 1.0 1.0
2014-09-13 NaN NaN NaN 0.0 1.0 1.0
2014-09-14 NaN NaN NaN 0.0 1.0 1.0
2014-09-15 NaN NaN C 0.0 1.0 2.0
2014-09-16 NaN NaN C 0.0 1.0 3.0
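If the goal is the exact per-row layout from the question (one rolling_count per original row), a time-based window per location may be closer to what was asked. The following is a minimal sketch under that assumption, not the approach from the answer above. The helper column 'seen' is a hypothetical name introduced here, and the window is taken to be the 7 days ending at each row's date, inclusive of that date.

```python
import pandas as pd

# Sample data taken from the question
times = pd.to_datetime(['2014-08-25', '2014-08-26', '2014-08-26',
                        '2014-09-11', '2014-09-12', '2014-09-15'])
locations = ['A', 'B', 'A', 'B', 'C', 'C']
df = pd.DataFrame({'date': times, 'location': locations})

# Hypothetical helper column: one "observation" per row, so a rolling sum is a count
df['seen'] = 1

# Time-based windows need monotonic dates within each group
ordered = df.sort_values('date')

# For each location, sum 'seen' over the trailing 7-day window (t - 7 days, t]
parts = [
    grp.rolling('7D', on='date')['seen'].sum()
    for _, grp in ordered.groupby('location')
]

# The pieces keep the original row labels, so assignment aligns them back to df
df['rolling_count'] = pd.concat(parts).astype(int)
print(df[['date', 'location', 'rolling_count']])
```

Because the window is defined in calendar time rather than in rows, no resampling to daily frequency is needed, and days without observations simply do not appear in the result.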