Merge two dataframes based on a specific condition in pandas

Time: 2020-01-07 12:31:55

Tags: pandas merge pandas-groupby

I have two dataframes, as shown below.

df1 - inspector IDs and their assigned places

df1:

Inspector_ID    Assigned_Place
1               ['Bangalore', 'Chennai']
2               ['Bangalore', 'Delhi', 'Chennai']
3               ['Bangalore', 'Delhi']
4               ['Chennai', 'Mumbai']

df2 - number of tickets raised by each inspector at each place

df2:

Inspector_ID    Place        Tickets
1               Bangalore    20           
1               Mumbai       4            
2               Bangalore    40           
2               Delhi        4            
3               Delhi        20           
3               Mumbai       10           
4               Chennai      20           
4               Mumbai       8      

From the dataframes above I want to generate the dataframe below.

Inspector_ID    Place        Tickets      Assigned
1               Bangalore    20           Yes
1               Mumbai       4            No
1               Chennai      0            Yes
2               Bangalore    40           Yes
2               Delhi        4            Yes
2               Chennai      0            Yes
3               Delhi        20           Yes
3               Mumbai       10           No
3               Bangalore    0            Yes
4               Chennai      20           Yes
4               Mumbai       8            Yes

Adding more detail to the question

df1 is the schedule for the whole of 2019, i.e. the same assignments apply to every month of 2019.

df2:

Inspector_ID    Place        Tickets     YearMonth
    1           Bangalore    20          201901 
    1           Mumbai       4           201901     
    2           Bangalore    40          201901       
    2           Delhi        4           201901       
    3           Delhi        20          201901      
    3           Mumbai       10          201901         
    4           Chennai      20          201901       
    4           Mumbai       8           201901
    1           Bangalore    20          201902 
    1           Mumbai       4           201902     
    2           Bangalore    40          201902       
    2           Delhi        4           201902
    2           Chennai      8           201902       
    3           Delhi        20          201902      
    3           Mumbai       10          201902         
    4           Chennai      20          201902       
    4           Delhi        8           201902

I want the dataframe below.

Expected output:

Inspector_ID    Place        Tickets     YearMonth   Assigned
    1           Bangalore    20          201901      Yes
    1           Chennai      0           201901      Yes
    1           Mumbai       4           201901      No
    2           Bangalore    40          201901      Yes
    2           Delhi        4           201901      Yes
    2           Chennai      0           201901      Yes
    3           Delhi        20          201901      Yes
    3           Mumbai       10          201901      No
    3           Bangalore    0           201901      Yes
    4           Chennai      20          201901      Yes
    4           Mumbai       8           201901      Yes
    1           Bangalore    20          201902      Yes
    1           Mumbai       4           201902      No
    1           Chennai      0           201902      Yes
    2           Bangalore    40          201902      Yes
    2           Delhi        4           201902      Yes
    2           Chennai      8           201902      Yes
    3           Delhi        20          201902      Yes
    3           Mumbai       10          201902      No
    3           Bangalore    0           201902      Yes
    4           Chennai      20          201902      Yes
    4           Delhi        8           201902      No
    4           Mumbai       0           201902      Yes

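1 Answer:

Answer 0: (score: 2)

First turn the list-valued column into one row per place with DataFrame.explode, then merge with an outer join and the indicator parameter, and finally map the indicator values to Yes/No in the new column: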

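import numpy as np
import pandas as pd

# one row per (Inspector_ID, Place) pair instead of one list per inspector
df1 = df1.explode('Assigned_Place').rename(columns={'Assigned_Place':'Place'})

# the outer join keeps rows from both frames; the indicator column records their origin
df = (df2.merge(df1, how='outer', indicator='Assigned')
         .sort_values(['Inspector_ID','Place'])
         .fillna({'Tickets':0})   # assigned places with no tickets
         .assign(Assigned = lambda x: np.where(x['Assigned'].eq('left_only'), 'No', 'Yes'))
         )
print (df)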
    Inspector_ID      Place  Tickets Assigned
0              1  Bangalore     20.0      Yes
8              1    Chennai      0.0      Yes
1              1     Mumbai      4.0       No
2              2  Bangalore     40.0      Yes
9              2    Chennai      0.0      Yes
3              2      Delhi      4.0      Yes
10             3  Bangalore      0.0      Yes
4              3      Delhi     20.0      Yes
5              3     Mumbai     10.0       No
6              4    Chennai     20.0      Yes
7              4     Mumbai      8.0      Yes
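
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It assumes Assigned_Place holds real Python lists (not their string representation) and that both frames name the key column Inspector_ID. The variable names assigned, merged and result are illustrative, and passing on= explicitly is a choice made here for clarity; the code above relies on the shared column names instead.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; Assigned_Place is assumed to hold real lists.
df1 = pd.DataFrame({
    'Inspector_ID': [1, 2, 3, 4],
    'Assigned_Place': [['Bangalore', 'Chennai'],
                       ['Bangalore', 'Delhi', 'Chennai'],
                       ['Bangalore', 'Delhi'],
                       ['Chennai', 'Mumbai']],
})
df2 = pd.DataFrame({
    'Inspector_ID': [1, 1, 2, 2, 3, 3, 4, 4],
    'Place': ['Bangalore', 'Mumbai', 'Bangalore', 'Delhi',
              'Delhi', 'Mumbai', 'Chennai', 'Mumbai'],
    'Tickets': [20, 4, 40, 4, 20, 10, 20, 8],
})

# One row per (inspector, assigned place).
assigned = df1.explode('Assigned_Place').rename(columns={'Assigned_Place': 'Place'})

# The outer merge keeps every pair from both sides; the indicator column holds
# 'left_only' (tickets but no assignment), 'right_only' (assignment but no
# tickets) or 'both'.
merged = df2.merge(assigned, how='outer', on=['Inspector_ID', 'Place'],
                   indicator='Assigned')

result = (merged
          .fillna({'Tickets': 0})        # assigned places without tickets
          .astype({'Tickets': int})      # undo the float upcast caused by NaN
          .assign(Assigned=lambda x: np.where(x['Assigned'].eq('left_only'), 'No', 'Yes'))
          .sort_values(['Inspector_ID', 'Place'])
          .reset_index(drop=True))
print(result)
```

The astype step is optional: the outer join introduces NaN for assigned places without tickets, which is why Tickets shows up as float in the output above.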

EDIT: The solution is similar; it only adds a cross join with all unique YearMonth values:

df1 = df1.explode('Assigned_Place').rename(columns={'Assigned_Place':'Place'})
# helper frame with every YearMonth and a constant key 'a' for the cross join
df11 = pd.DataFrame({'YearMonth':df2['YearMonth'].unique(), 'a':1})
# repeat every assignment for every month, then drop the helper key
df1 = df1.assign(a=1).merge(df11, on='a').drop(columns='a')
df = (df2.merge(df1, how='outer', indicator='Assigned')
         .sort_values(['Inspector_ID','Place'])
         .fillna({'Tickets':0})
         .assign(Assigned = lambda x: np.where(x['Assigned'].eq('left_only'), 'No', 'Yes'))
         )
print (df)
    Inspector_ID      Place  Tickets  YearMonth Assigned
0              1  Bangalore     20.0     201901      Yes
8              1  Bangalore     20.0     201902      Yes
17             1    Chennai      0.0     201901      Yes
18             1    Chennai      0.0     201902      Yes
1              1     Mumbai      4.0     201901       No
9              1     Mumbai      4.0     201902       No
2              2  Bangalore     40.0     201901      Yes
10             2  Bangalore     40.0     201902      Yes
12             2    Chennai      8.0     201902      Yes
19             2    Chennai      0.0     201901      Yes
3              2      Delhi      4.0     201901      Yes
11             2      Delhi      4.0     201902      Yes
20             3  Bangalore      0.0     201901      Yes
21             3  Bangalore      0.0     201902      Yes
4              3      Delhi     20.0     201901      Yes
13             3      Delhi     20.0     201902      Yes
5              3     Mumbai     10.0     201901       No
14             3     Mumbai     10.0     201902       No
6              4    Chennai     20.0     201901      Yes
15             4    Chennai     20.0     201902      Yes
16             4      Delhi      8.0     201902       No
7              4     Mumbai      8.0     201901      Yes
22             4     Mumbai      0.0     201902      Yes
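
On pandas 1.2 or later, the constant-key trick can probably be replaced by merge(..., how='cross'). The sketch below is a variation under that assumption, not the code from the answer. The names assigned, months, schedule and result are illustrative, and the sample data is copied from the question.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; Assigned_Place is assumed to hold real lists.
df1 = pd.DataFrame({
    'Inspector_ID': [1, 2, 3, 4],
    'Assigned_Place': [['Bangalore', 'Chennai'],
                       ['Bangalore', 'Delhi', 'Chennai'],
                       ['Bangalore', 'Delhi'],
                       ['Chennai', 'Mumbai']],
})
df2 = pd.DataFrame({
    'Inspector_ID': [1, 1, 2, 2, 3, 3, 4, 4,
                     1, 1, 2, 2, 2, 3, 3, 4, 4],
    'Place': ['Bangalore', 'Mumbai', 'Bangalore', 'Delhi',
              'Delhi', 'Mumbai', 'Chennai', 'Mumbai',
              'Bangalore', 'Mumbai', 'Bangalore', 'Delhi', 'Chennai',
              'Delhi', 'Mumbai', 'Chennai', 'Delhi'],
    'Tickets': [20, 4, 40, 4, 20, 10, 20, 8,
                20, 4, 40, 4, 8, 20, 10, 20, 8],
    'YearMonth': [201901] * 8 + [201902] * 9,
})

# One row per (inspector, assigned place).
assigned = df1.explode('Assigned_Place').rename(columns={'Assigned_Place': 'Place'})

# Every month that occurs in df2.
months = pd.DataFrame({'YearMonth': df2['YearMonth'].unique()})

# how='cross' pairs every assignment with every month, so no dummy key is needed.
schedule = assigned.merge(months, how='cross')

result = (df2.merge(schedule, how='outer',
                    on=['Inspector_ID', 'Place', 'YearMonth'],
                    indicator='Assigned')
             .fillna({'Tickets': 0})
             .astype({'Tickets': int})
             .assign(Assigned=lambda x: np.where(x['Assigned'].eq('left_only'), 'No', 'Yes'))
             .sort_values(['YearMonth', 'Inspector_ID', 'Place'])
             .reset_index(drop=True))
print(result)
```

Sorting by YearMonth first groups the rows by month, as in the expected output of the question. On pandas versions older than 1.2 the constant-key join shown above remains the way to go.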