I have the following Python DataFrame:
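For each "Region", I have specific criteria for the activities, listed below:
a) For Region 1, I want to show the top three accounts whose "Activity" is meeting and the top two accounts whose "Activity" is call
b) For Region 2, I want to show the first account whose "Activity" is call and the first account whose "Activity" is meeting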
c) For Region 4, I want to show the top 6 accounts by "Rank", regardless of activity
Below is the result DataFrame I want to get:
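With the following code I can get the same number of meetings and calls for every region, but I don't know how to select a different number of meetings and calls depending on each region's criteria.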
d1 = data[data['Activity'] == 'meeting'].groupby('Region')\
.apply(lambda x: x.sort_values('Rank')[:3])
d2 = data[data['Activity'] == 'call'].groupby('Region')\
.apply(lambda x: x.sort_values('Rank')[:2])
pd.concat([d1, d2])
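For reference, the same selection can be written without `apply`: sort once, then `groupby('Region').head(n)` keeps the first n rows of each region and avoids the extra index level that `apply` adds. This is a minimal sketch on a small, made-up `data` frame with the question's columns; like the snippet above, it still applies one fixed limit to every region, which is exactly the limitation the question is about.

```python
import pandas as pd

# Hypothetical sample data with the same columns as the question's DataFrame
data = pd.DataFrame({
    'Account': list('ABCDEFGHIJ'),
    'Region': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2],
    'Rank': [1, 2, 3, 4, 5, 6, 1, 2, 3, 4],
    'Activity': ['meeting', 'meeting', 'meeting', 'meeting', 'call', 'call',
                 'call', 'call', 'meeting', 'meeting'],
})

# Sort once so that head(n) within each region returns the n best-ranked rows
ranked = data.sort_values('Rank')

# Same limits for every region: top 3 meetings and top 2 calls by Rank
d1 = ranked[ranked['Activity'] == 'meeting'].groupby('Region').head(3)
d2 = ranked[ranked['Activity'] == 'call'].groupby('Region').head(2)
result = pd.concat([d1, d2])
print(result)
```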
Any help is much appreciated!
Answer 0 (score: 0)
I'd use a naive approach: slice out each subset, then append it to an initially empty result DataFrame.
import pandas as pd
#Create test dataframe
a = pd.DataFrame([['A', 1, 1, 'meeting'],
['B', 1, 2, 'meeting'],
['C', 1, 3, 'meeting'],
['D', 1, 4, 'meeting'],
['E', 1, 5, 'call'],
['F', 1, 6, 'call'],
['G', 1, 7, 'call'],
['H', 2, 1, 'call'],
['I', 2, 2, 'call'],
['J', 2, 3, 'meeting'],
['K', 2, 4, 'meeting'],
['L', 2, 5, 'meeting'],
['M', 2, 6, 'meeting'],
['N', 2, 7, 'meeting'],
['O', 2, 8, 'meeting'],
['P', 4, 1, 'call'],
['Q', 4, 2, 'meeting'],
['R', 4, 3, 'call'],
['S', 4, 4, 'meeting'],
['T', 4, 5, 'call'],
['U', 4, 6, 'meeting'],
['V', 4, 7, 'call']], columns=['Account', 'Region', 'Rank', 'Activity'])
#Create blank df
result = pd.DataFrame(columns=['Account', 'Region', 'Rank', 'Activity'])
temp = a[a['Region']==1] #Slice region 1
temp = temp[temp['Activity']=='meeting'].sort_values('Rank')[:3] #Slice activity meeting then sort and get first 3
result = pd.concat([result, temp]) #Add to result df
temp = a[a['Region']==1] #Slice region 1
temp = temp[temp['Activity']=='call'].sort_values('Rank')[:2] #Slice activity call then sort and get first 2
result = pd.concat([result, temp]) #Add to result df
temp = a[a['Region']==2] #Slice region 2
temp = temp[temp['Activity']=='meeting'].sort_values('Rank')[:1] #Slice activity meeting then sort and get first one
result = pd.concat([result, temp]) #Add to result df
temp = a[a['Region']==2] #Slice region 2
temp = temp[temp['Activity']=='call'].sort_values('Rank')[:1] #Slice activity call then sort and get first 1
result = pd.concat([result, temp]) #Add to result df
temp = a[a['Region']==4] #Slice region 4
temp = temp.sort_values('Rank')[:6] #Sort then get first 6
result = pd.concat([result, temp]) #Add to result df
result['Region'] = result['Region'].astype(int) #Concatenating onto the blank df left Region as object dtype; cast back to int
result['Rank'] = result['Rank'].astype(int) #Same for Rank
The result will be:
    Account  Region  Rank  Activity
0         A       1     1   meeting
1         B       1     2   meeting
2         C       1     3   meeting
4         E       1     5      call
5         F       1     6      call
9         J       2     3   meeting
7         H       2     1      call
15        P       4     1      call
16        Q       4     2   meeting
17        R       4     3      call
18        S       4     4   meeting
19        T       4     5      call
20        U       4     6   meeting
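The repeated slice-and-concat blocks in the answer can also be driven from a small table of per-region rules, so adding or changing a region means editing data rather than code. This is a sketch, not the answer author's code: `rules` and `select_by_rules` are hypothetical names. Each region maps either to a dict of `{activity: n}` limits or to a plain integer meaning "top n by Rank, regardless of activity".

```python
import pandas as pd

# Same test data as in the answer: 22 accounts across regions 1, 2 and 4
a = pd.DataFrame({
    'Account': list('ABCDEFGHIJKLMNOPQRSTUV'),
    'Region': [1] * 7 + [2] * 8 + [4] * 7,
    'Rank': list(range(1, 8)) + list(range(1, 9)) + list(range(1, 8)),
    'Activity': ['meeting'] * 4 + ['call'] * 3        # region 1
                + ['call'] * 2 + ['meeting'] * 6      # region 2
                + ['call', 'meeting'] * 3 + ['call'], # region 4
})

# Hypothetical rule table: per-activity limits, or an int for "top n overall"
rules = {
    1: {'meeting': 3, 'call': 2},
    2: {'meeting': 1, 'call': 1},
    4: 6,
}

def select_by_rules(df, rules):
    """Collect the rows each region's rule asks for, ordered by Rank."""
    parts = []
    for region, rule in rules.items():
        sub = df[df['Region'] == region].sort_values('Rank')
        if isinstance(rule, int):
            # Top n rows of the region, whatever the activity
            parts.append(sub.head(rule))
        else:
            # Top n rows per listed activity
            for activity, n in rule.items():
                parts.append(sub[sub['Activity'] == activity].head(n))
    # A single concat over non-empty frames keeps the integer dtypes
    return pd.concat(parts)

result = select_by_rules(a, rules)
print(result)
```

Because `pd.concat` is called once on frames whose `Region` and `Rank` columns are already integers, no blank starting frame and no dtype clean-up step are needed.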