Python DataFrame: conditional subset based on values in 3 different fields

Time: 2017-05-22 03:13:36

Tags: python loops if-statement dataframe conditional

I have the following Python data frame:

[image: input data frame]

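For each "Region", I have specific criteria for the "Activity", as listed below:

a) For Region 1, I want to show the top 3 accounts with an "Activity" of meeting and the top 2 accounts with an "Activity" of call

b) For Region 2, I want to show the top 1 account with an "Activity" of call and the top 1 account with an "Activity" of meeting

c) For Region 4, I want to show the top 6 accounts by "Rank"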

This is the resulting data frame I would like to get:

[image: desired result data frame]

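I can get the same number of meetings and calls for every region using the code below, but I don't know how to get a different subset of meetings and calls depending on each region's criteria.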

d1 = data[data['Activity'] == 'meeting'].groupby('Region')\
       .apply(lambda x: x.sort_values('Rank')[:3])
d2 = data[data['Activity'] == 'call'].groupby('Region')\
       .apply(lambda x: x.sort_values('Rank')[:2])    
pd.concat([d1, d2])
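
One way the grouped approach above might be adapted so that the number of rows kept varies per group is to number the rows inside each (Region, Activity) group with `cumcount` and compare that position with a per-group limit. This is only a sketch: the sample frame, the `limits` dictionary, and the variable names are assumptions made for illustration, since the real data is shown only as an image. It covers Regions 1 and 2; a rule such as "top 6 regardless of Activity" for Region 4 would need a separate grouping on Region alone.

```python
import pandas as pd

# Hypothetical sample data (the question's real frame is only shown as an image).
data = pd.DataFrame({
    'Account':  list('ABCDEFGHIJK'),
    'Region':   [1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2],
    'Rank':     [1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4],
    'Activity': ['meeting'] * 4 + ['call'] * 3 + ['call'] * 2 + ['meeting'] * 2,
})

# Hypothetical lookup: how many rows to keep for each (Region, Activity) pair.
limits = {(1, 'meeting'): 3, (1, 'call'): 2,
          (2, 'meeting'): 1, (2, 'call'): 1}

# Sort by Rank so that the position inside each group follows the ranking.
ranked = data.sort_values('Rank')

# 0-based position of every row within its (Region, Activity) group.
pos = ranked.groupby(['Region', 'Activity']).cumcount()

# Per-row limit looked up from the dictionary; pairs without a rule keep 0 rows.
limit = pd.Series(
    [limits.get(key, 0) for key in zip(ranked['Region'], ranked['Activity'])],
    index=ranked.index,
)

# Keep a row only if its position is below the limit for its group.
subset = ranked[pos < limit].sort_values(['Region', 'Rank'])
print(subset)
```

Because the filter is a single boolean mask, adding or changing a rule only means editing the dictionary.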

Any help is much appreciated!

1 Answer:

Answer 0 (score: 0)

I would take a naive approach: slice the data, then append each slice to a blank result data frame.

import pandas as pd

#Create test dataframe
a = pd.DataFrame([['A', 1, 1, 'meeting'],
                  ['B', 1, 2, 'meeting'],
                  ['C', 1, 3, 'meeting'],
                  ['D', 1, 4, 'meeting'],
                  ['E', 1, 5, 'call'],
                  ['F', 1, 6, 'call'],
                  ['G', 1, 7, 'call'],
                  ['H', 2, 1, 'call'],
                  ['I', 2, 2, 'call'],
                  ['J', 2, 3, 'meeting'],
                  ['K', 2, 4, 'meeting'],
                  ['L', 2, 5, 'meeting'],
                  ['M', 2, 6, 'meeting'],
                  ['N', 2, 7, 'meeting'],
                  ['O', 2, 8, 'meeting'],
                  ['P', 4, 1, 'call'],
                  ['Q', 4, 2, 'meeting'],
                  ['R', 4, 3, 'call'],
                  ['S', 4, 4, 'meeting'],
                  ['T', 4, 5, 'call'],
                  ['U', 4, 6, 'meeting'],
                  ['V', 4, 7, 'call']], columns=['Account', 'Region', 'Rank', 'Activity'])


#Create blank df
result = pd.DataFrame(columns=['Account', 'Region', 'Rank', 'Activity'])

temp = a[a['Region']==1] #Slice region 1
temp = temp[temp['Activity']=='meeting'].sort_values('Rank')[:3] #Slice activity meeting then sort and get first 3

result = pd.concat([result, temp]) #Add to result df

temp = a[a['Region']==1] #Slice region 1
temp = temp[temp['Activity']=='call'].sort_values('Rank')[:2] #Slice activity call then sort and get first 2

result = pd.concat([result, temp]) #Add to result df

temp = a[a['Region']==2] #Slice region 2
temp = temp[temp['Activity']=='meeting'].sort_values('Rank')[:1] #Slice activity meeting then sort and get first one

result = pd.concat([result, temp]) #Add to result df

temp = a[a['Region']==2] #Slice region 2
temp = temp[temp['Activity']=='call'].sort_values('Rank')[:1] #Slice activity call then sort and get first 1

result = pd.concat([result, temp]) #Add to result df

temp = a[a['Region']==4] #Slice region 4
temp = temp.sort_values('Rank')[:6] #Sort then get first 6

result = pd.concat([result, temp]) #Add to result df

result['Region'] = result['Region'].astype(int) #Cast Region back to int; concatenating onto the blank df can leave it as object dtype
result['Rank'] = result['Rank'].astype(int) #Cast Rank back to int for the same reason

The result will be:

       Account  Region  Rank Activity
0        A       1     1  meeting
1        B       1     2  meeting
2        C       1     3  meeting
4        E       1     5     call
5        F       1     6     call
9        J       2     3  meeting
7        H       2     1     call
15       P       4     1     call
16       Q       4     2  meeting
17       R       4     3     call
18       S       4     4  meeting
19       T       4     5     call
20       U       4     6  meeting
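
The repeated slice-and-concat steps above could also be driven by a small table of rules. The following is a sketch of that idea, not a replacement for the code above: the `rules` list and the `top_n` helper are hypothetical names, and `None` is used here as an assumed marker for "any Activity". It rebuilds the same test frame in a more compact form and collects all slices in a single `pd.concat` call, so no blank starting frame is needed and the integer columns should keep their dtype.

```python
import pandas as pd

# Same test data as above, built from compact column lists.
a = pd.DataFrame({
    'Account': list('ABCDEFGHIJKLMNOPQRSTUV'),
    'Region': [1] * 7 + [2] * 8 + [4] * 7,
    'Rank': list(range(1, 8)) + list(range(1, 9)) + list(range(1, 8)),
    'Activity': (['meeting'] * 4 + ['call'] * 3        # Region 1
                 + ['call'] * 2 + ['meeting'] * 6      # Region 2
                 + ['call', 'meeting'] * 3 + ['call']  # Region 4
                 ),
})

# Hypothetical rule table: (region, activity, rows to keep).
# activity=None is an assumed convention meaning "do not filter on Activity".
rules = [
    (1, 'meeting', 3),
    (1, 'call', 2),
    (2, 'meeting', 1),
    (2, 'call', 1),
    (4, None, 6),
]

def top_n(df, region, activity, n):
    """Return the n best-ranked rows of one region, optionally for one activity."""
    mask = df['Region'] == region
    if activity is not None:
        mask = mask & (df['Activity'] == activity)
    return df[mask].sort_values('Rank').head(n)

# Slice once per rule, then concatenate all slices in rule order.
result = pd.concat([top_n(a, *rule) for rule in rules])
print(result)
```

With the rules listed in this order, the rows come out in the same order as in the result shown above; a new region or a changed count only requires editing `rules`.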