I have a pandas Series in the following format:
23-08-2018 06803fe4-a504-4520-8304-a76a3adcd488 0
23-08-2018 89efbfda-6edc-45a9-a0dd-e520fd8a3e2a 2839
23-08-2018 88f7ff3f-ede7-4dd0-bce9-0d25b598004d 2639
23-08-2018 10f049cb-c165-424a-b2db-99637cc2668c 0
23-08-2018 11a7ec38-a535-4f1c-acc4-c93471401dbd 0
23-08-2018 1292f360-41e5-463e-8547-002858ac0226 0
23-08-2018 145d17c0-9711-4445-8eed-7e7d35f0f896 0
23-08-2018 188d7578-8a3b-4fe5-807a-d098bce1d227 0
23-08-2018 89tfbfda-6edc-45a9-a0dd-e520fd8a3e2a 0
24-08-2018 000a7843-432f-4c67-9d7c-5d3e2ffac439 14000
24-08-2018 000cd8c7-94c7-4cb7-ad70-a60aec275f31 14655
24-08-2018 000dd787-ab81-40a7-a036-a05e4d11fea9 15655
24-08-2018 00115f12-4a50-4412-bc90-940a21a1af65 14655
24-08-2018 0012467d-53c1-4b5e-be8b-fc285d130968 17700
27-08-2018 1e806edd-1c96-4bdb-87b8-b01cb09cdb02 15
27-08-2018 d2c45e73-d5ca-4e28-ba54-4e24b4ee9be3 30
I want a DataFrame with the following structure:
Date AverageInteractionTime
23-08-2018 608.67 // (0+2839+2639+0+0+0+0+0+0)/9
24-08-2018 15333 // (14000+14655+15655+14655+17700)/5
27-08-2018 22.5 // (15+30)/2
Basically, I want to group the Series by chat date and get the mean of the third column.
How can I do this?
Answer 0: (score: 2)
Assuming your 3 columns are ['Date','Some_ID','AverageInteractionTime'], use groupby on Date and mean on AverageInteractionTime:
df.groupby('Date',as_index=False)['AverageInteractionTime'].mean()
Date AverageInteractionTime
0 23-08-2018 608.666667
1 24-08-2018 15333.000000
2 27-08-2018 22.500000
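As a minimal, self-contained sketch of the groupby above: the dates and times below are the values from the question, while the `Some_ID` values are placeholder strings (an assumption for brevity, not the real IDs).

```python
import pandas as pd

# Dates and interaction times copied from the question's sample data
dates = ['23-08-2018'] * 9 + ['24-08-2018'] * 5 + ['27-08-2018'] * 2
times = [0, 2839, 2639, 0, 0, 0, 0, 0, 0,
         14000, 14655, 15655, 14655, 17700,
         15, 30]
# Placeholder IDs (hypothetical); the real IDs do not affect the mean
ids = ['id-{}'.format(i) for i in range(len(times))]

df = pd.DataFrame({'Date': dates,
                   'Some_ID': ids,
                   'AverageInteractionTime': times})

# Group by date and average the numeric column; as_index=False keeps Date as a column
result = df.groupby('Date', as_index=False)['AverageInteractionTime'].mean()
print(result)
```

Note that the dates here are plain strings, so the groups are ordered lexicographically rather than chronologically; for this sample the two orderings happen to agree.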
To convert the Series into the required DataFrame first, and then use the code above:
print(s[:3])
0 23-08-2018 06803fe4-a504-4520-8304-a76a3adcd488 0
1 23-08-2018 89efbfda-6edc-45a9-a0dd-e520fd8a3e2...
2 23-08-2018 88f7ff3f-ede7-4dd0-bce9-0d25b598004...
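df = s.str.split(' ',expand=True).rename(columns={0:'Date',1:'Some_ID',2:'AverageInteractionTime'})
df['AverageInteractionTime'] = pd.to_numeric(df['AverageInteractionTime'])  # split() yields strings; convert before taking the mean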
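The conversion can be tried end to end with a sketch like the following. It assumes each element of the Series is one space-separated string, as the `print(s[:3])` output suggests; only four of the question's rows are used to keep it short.

```python
import pandas as pd

# A few rows from the question, each stored as a single space-separated string (assumed layout)
s = pd.Series([
    '23-08-2018 06803fe4-a504-4520-8304-a76a3adcd488 0',
    '23-08-2018 89efbfda-6edc-45a9-a0dd-e520fd8a3e2a 2839',
    '27-08-2018 1e806edd-1c96-4bdb-87b8-b01cb09cdb02 15',
    '27-08-2018 d2c45e73-d5ca-4e28-ba54-4e24b4ee9be3 30',
])

# Split each string into three columns and give them names
df = s.str.split(' ', expand=True)
df.columns = ['Date', 'Some_ID', 'AverageInteractionTime']

# The split pieces are strings, so convert the last column to numbers
df['AverageInteractionTime'] = pd.to_numeric(df['AverageInteractionTime'])

result = df.groupby('Date', as_index=False)['AverageInteractionTime'].mean()
print(result)
```

Without the numeric conversion the third column stays as text, and the mean would not be computed on numbers.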
Answer 1: (score: 2)
If the Series has a MultiIndex, just use mean with the level=0 parameter:
print (s.index)
MultiIndex(levels=[['23-08-2018', '24-08-2018', '27-08-2018'], ['000a7843-432f-4c67-9d7c-5d3e2ffac439', '000cd8c7-94c7-4cb7-ad70-a60aec275f31', '000dd787-ab81-40a7-a036-a05e4d11fea9', '00115f12-4a50-4412-bc90-940a21a1af65', '0012467d-53c1-4b5e-be8b-fc285d130968', '06803fe4-a504-4520-8304-a76a3adcd488', '10f049cb-c165-424a-b2db-99637cc2668c', '11a7ec38-a535-4f1c-acc4-c93471401dbd', '1292f360-41e5-463e-8547-002858ac0226', '145d17c0-9711-4445-8eed-7e7d35f0f896', '188d7578-8a3b-4fe5-807a-d098bce1d227', '1e806edd-1c96-4bdb-87b8-b01cb09cdb02', '88f7ff3f-ede7-4dd0-bce9-0d25b598004d', '89efbfda-6edc-45a9-a0dd-e520fd8a3e2a', '89tfbfda-6edc-45a9-a0dd-e520fd8a3e2a', 'd2c45e73-d5ca-4e28-ba54-4e24b4ee9be3']],
labels=[[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2], [5, 13, 12, 6, 7, 8, 9, 10, 14, 0, 1, 2, 3, 4, 11, 15]],
names=['Date', 'Val'])
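#Series.mean(level=0) was deprecated in pandas 1.3 and removed in 2.0, so group by the level instead
df = s.groupby(level=0).mean().reset_index(name='AverageInteractionTime')
#equivalent on older pandas versions
#df = s.mean(level=0).reset_index(name='AverageInteractionTime')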
print (df)
Date AverageInteractionTime
0 23-08-2018 608.666667
1 24-08-2018 15333.000000
2 27-08-2018 22.500000
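For readers who want to reproduce the MultiIndex case, here is a small sketch. It rebuilds a Series with level names `Date` and `Val` (as in the printed index) from four of the question's rows and uses the `groupby(level=0)` form.

```python
import pandas as pd

# MultiIndex with the same level names as in the printed index above
idx = pd.MultiIndex.from_tuples(
    [
        ('23-08-2018', '06803fe4-a504-4520-8304-a76a3adcd488'),
        ('23-08-2018', '89efbfda-6edc-45a9-a0dd-e520fd8a3e2a'),
        ('27-08-2018', '1e806edd-1c96-4bdb-87b8-b01cb09cdb02'),
        ('27-08-2018', 'd2c45e73-d5ca-4e28-ba54-4e24b4ee9be3'),
    ],
    names=['Date', 'Val'],
)
s = pd.Series([0, 2839, 15, 30], index=idx)

# Average over the first index level (Date), then turn the result into a DataFrame
df = s.groupby(level=0).mean().reset_index(name='AverageInteractionTime')
print(df)
```

Because the first level is named `Date`, `reset_index` produces a `Date` column directly.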
If necessary, split the index into a MultiIndex first:
print (s.index[:3])
Index(['23-08-2018 06803fe4-a504-4520-8304-a76a3adcd488',
'23-08-2018 89efbfda-6edc-45a9-a0dd-e520fd8a3e2a',
'23-08-2018 88f7ff3f-ede7-4dd0-bce9-0d25b598004d'],
dtype='object', name='Date')
s.index = s.index.str.split(expand=True)
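#the split levels are unnamed, so name the date level before resetting the index
df = s.groupby(level=0).mean().rename_axis('Date').reset_index(name='AverageInteractionTime')
#equivalent on pandas versions before 2.0
#df = s.mean(level=0).rename_axis('Date').reset_index(name='AverageInteractionTime')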
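A runnable sketch of the index-splitting variant follows. It assumes the date and ID sit together in a flat string index, as in the `print (s.index[:3])` output; the level name `Val` is borrowed from the earlier MultiIndex printout.

```python
import pandas as pd

# Flat index holding "date id" strings, values are the interaction times
s = pd.Series(
    [0, 2839, 15, 30],
    index=pd.Index(
        [
            '23-08-2018 06803fe4-a504-4520-8304-a76a3adcd488',
            '23-08-2018 89efbfda-6edc-45a9-a0dd-e520fd8a3e2a',
            '27-08-2018 1e806edd-1c96-4bdb-87b8-b01cb09cdb02',
            '27-08-2018 d2c45e73-d5ca-4e28-ba54-4e24b4ee9be3',
        ],
        name='Date',
    ),
)

# Split on whitespace; expand=True turns the flat Index into a two-level MultiIndex
s.index = s.index.str.split(expand=True)
# Name the levels explicitly so the grouped result gets a 'Date' column
s.index.names = ['Date', 'Val']

df = s.groupby(level='Date').mean().reset_index(name='AverageInteractionTime')
print(df)
```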