This is what my DataFrame looks like:
user_id time hour weekday location
updated_at
2019-09-02 05:29:00 29279 5:29:35 5 0 A
2019-09-02 05:29:00 29279 5:29:39 5 0 A
2019-09-02 05:29:00 29279 5:29:42 5 0 A
2019-09-02 05:29:00 29279 5:29:49 5 0 B
2019-09-02 05:32:00 29279 5:32:28 5 0 C
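For each day, I want the total number of rows per location per hour.
*I would like to achieve something like df.groupby(["month-day hour", "location"]).count()
For now, I have created an additional column that combines the month-day and the hour: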
user_id time hour weekday location date-hour
updated_at
2019-09-02 05:29:00 29279 5:29:35 5 0 A 9-2 5
2019-09-02 05:29:00 29279 5:29:39 5 0 A 9-2 5
2019-09-02 05:29:00 29279 5:29:42 5 0 A 9-2 5
2019-09-02 05:29:00 29279 5:29:49 5 0 B 9-2 5
2019-09-02 05:32:00 29279 5:32:28 5 0 C 9-2 5
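and then used df.groupby(["date-hour", "location"]).count(), which seems to do the job. However, because the index is now in "month-day hour" format, I cannot take advantage of a DatetimeIndex.
If * cannot be achieved, how can I convert the "month-day hour" format into a proper datetime?
When I try pd.to_datetime("9-10 11"), the 11 is treated as the year, which gives me Timestamp('2011-09-10 00:00:00').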
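One way to parse such labels without pandas guessing the year is to append the year and pass an explicit format. This is only a sketch: the labels Series and the year 2019 are assumptions made for illustration, and the approach only works if every row belongs to the same known year.

```python
import pandas as pd

# Hypothetical "month-day hour" labels, shaped like the date-hour column.
labels = pd.Series(["9-2 5", "9-10 11"])

# Assumption: every label belongs to 2019. Appending the year and giving
# an explicit format means the trailing number is read as the hour.
parsed = pd.to_datetime(labels + " 2019", format="%m-%d %H %Y")
print(parsed)
```

The result is a datetime64 Series, so it can be used as a DatetimeIndex after grouping.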
Answer 0 (score: 1)
Simply drop the minute and second information from the datetime objects. This should do it:
Data
import pandas as pd

# Sample rows from the question, reduced to the columns needed for counting.
df = pd.DataFrame([['2019-09-02 05:29:00', '29279', 'A'],
                   ['2019-09-02 05:29:00', '29279', 'A'],
                   ['2019-09-02 05:29:00', '29279', 'A'],
                   ['2019-09-02 05:29:00', '29279', 'B'],
                   ['2019-09-02 05:32:00', '29279', 'C']],
                  columns=['datetime', 'user_id', 'location'])
# Convert the strings to real timestamps.
df['datetime'] = pd.to_datetime(df['datetime'])
print(df.to_string())
datetime user_id location
0 2019-09-02 05:29:00 29279 A
1 2019-09-02 05:29:00 29279 A
2 2019-09-02 05:29:00 29279 A
3 2019-09-02 05:29:00 29279 B
4 2019-09-02 05:32:00 29279 C
Solution
df['time_hour'] = df['datetime'].map(lambda x: x.replace(minute=0, second=0))
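As a possible alternative to the lambda, the same truncation can be sketched with the vectorized dt.floor accessor, followed by the grouping the question asks for. The column names follow this answer; the lowercase "h" alias and the use of size() for row counts are choices made here, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "datetime": pd.to_datetime(["2019-09-02 05:29:00"] * 4
                               + ["2019-09-02 05:32:00"]),
    "location": ["A", "A", "A", "B", "C"],
})

# Round every timestamp down to the start of its hour.
df["time_hour"] = df["datetime"].dt.floor("h")

# Count rows for each (hour, location) pair.
counts = df.groupby(["time_hour", "location"]).size()
print(counts)
```

Because time_hour keeps its datetime dtype, the first index level of counts stays datetime-like instead of becoming a string label.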
Answer 1 (score: 1)
I believe you just need to group by df.index.floor('H') and location:
df_out = (df.groupby([df.index.floor('H'), 'location']).location.count()
            .reset_index(1, name='count'))
Out[311]:
location count
updated_at
2019-09-02 05:00:00 A 3
2019-09-02 05:00:00 B 1
2019-09-02 05:00:00 C 1
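If one column per location is preferred, a variation along the same lines could look like the sketch below. It rebuilds the sample data with updated_at as the index (as in the question), and uses pd.Grouper with the lowercase "h" alias, which is an assumption about what suits recent pandas versions rather than something this answer prescribes.

```python
import pandas as pd

df = pd.DataFrame(
    {"location": ["A", "A", "A", "B", "C"]},
    index=pd.DatetimeIndex(
        ["2019-09-02 05:29:00"] * 4 + ["2019-09-02 05:32:00"],
        name="updated_at",
    ),
)

# Bucket the DatetimeIndex by hour, count rows per location,
# then pivot the locations into columns.
hourly = (df.groupby([pd.Grouper(freq="h"), "location"])
            .size()
            .unstack(fill_value=0))
print(hourly)

# The result still has a DatetimeIndex, so date-based selection works.
print(hourly.loc["2019-09-02"])
```

Keeping the hour as a real timestamp is what makes the final date-based lookup possible, which is the DatetimeIndex benefit the question was after.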