I have two data frames, shown below.
df1 - inspector IDs and their assigned places
df1:
Inspector_ID Assigned_Place
1 ['Bangalore', 'Chennai']
2 ['Bangalore', 'Delhi', 'Chennai']
3 ['Bangalore', 'Delhi']
4 ['Chennai', 'Mumbai']
df2 - number of tickets raised by each inspector in each place
df2:
Inspector_ID Place Tickets
1 Bangalore 20
1 Mumbai 4
2 Bangalore 40
2 Delhi 4
3 Delhi 20
3 Mumbai 10
4 Chennai 20
4 Mumbai 8
I want to generate the data frame below from the two data frames above.
Inspector_ID Place Tickets Assigned
1 Bangalore 20 Yes
1 Mumbai 4 No
1 Chennai 0 Yes
2 Bangalore 40 Yes
2 Delhi 4 Yes
2 Chennai 0 Yes
3 Delhi 20 Yes
3 Mumbai 10 No
3 Bangalore 0 Yes
4 Chennai 20 Yes
4 Mumbai 8 Yes
Adding more to the question:
df1 is the schedule for the whole of 2019, i.e. the same assignments apply to every month of 2019.
df2:
Inspector_ID Place Tickets YearMonth
1 Bangalore 20 201901
1 Mumbai 4 201901
2 Bangalore 40 201901
2 Delhi 4 201901
3 Delhi 20 201901
3 Mumbai 10 201901
4 Chennai 20 201901
4 Mumbai 8 201901
1 Bangalore 20 201902
1 Mumbai 4 201902
2 Bangalore 40 201902
2 Delhi 4 201902
2 Chennai 8 201902
3 Delhi 20 201902
3 Mumbai 10 201902
4 Chennai 20 201902
4 Delhi 8 201902
I want the data frame below.
Expected output:
Inspector_ID Place Tickets YearMonth Assigned
1 Bangalore 20 201901 Yes
1 Chennai 0 201901 Yes
1 Mumbai 4 201901 No
2 Bangalore 40 201901 Yes
2 Delhi 4 201901 Yes
2 Chennai 0 201901 Yes
3 Delhi 20 201901 Yes
3 Mumbai 10 201901 No
3 Bangalore 0 201901 Yes
4 Chennai 20 201901 Yes
4 Mumbai 8 201901 Yes
1 Bangalore 20 201902 Yes
1 Mumbai 4 201902 No
1 Chennai 0 201902 Yes
2 Bangalore 40 201902 Yes
2 Delhi 4 201902 Yes
2 Chennai 8 201902 Yes
3 Delhi 20 201902 Yes
3 Mumbai 10 201902 No
3 Bangalore 0 201902 Yes
4 Chennai 20 201902 Yes
4 Delhi 8 201902 No
4 Mumbai 0 201902 Yes
Answer 0 (score: 2)
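First expand the list-valued column with DataFrame.explode, then merge with an outer join and the indicator parameter, and finally set the values of the new column: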
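import numpy as np
import pandas as pd

# One row per (Inspector_ID, Place) instead of one list per inspector
df1 = df1.explode('Assigned_Place').rename(columns={'Assigned_Place':'Place'})
# Outer join on the shared columns; the indicator column records each row's origin
df = (df2.merge(df1, how='outer', indicator='Assigned')
         .sort_values(['Inspector_ID','Place'])
         .fillna({'Tickets':0})
         # 'left_only' means tickets were raised in a place that was not assigned
         .assign(Assigned = lambda x: np.where(x['Assigned'].eq('left_only'), 'No', 'Yes'))
      )
print(df)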
Inspector_ID Place Tickets Assigned
0 1 Bangalore 20.0 Yes
8 1 Chennai 0.0 Yes
1 1 Mumbai 4.0 No
2 2 Bangalore 40.0 Yes
9 2 Chennai 0.0 Yes
3 2 Delhi 4.0 Yes
10 3 Bangalore 0.0 Yes
4 3 Delhi 20.0 Yes
5 3 Mumbai 10.0 No
6 4 Chennai 20.0 Yes
7 4 Mumbai 8.0 Yes
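For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch of the same idea. It rebuilds the sample data from the question and assumes both frames use the column name Inspector_ID. The names assigned, merged, result and the indicator column name Source are only illustrative, and the explicit on= list and the cast of Tickets back to int are small additions of mine, not part of the code above.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (column name Inspector_ID assumed in both frames).
df1 = pd.DataFrame({
    "Inspector_ID": [1, 2, 3, 4],
    "Assigned_Place": [
        ["Bangalore", "Chennai"],
        ["Bangalore", "Delhi", "Chennai"],
        ["Bangalore", "Delhi"],
        ["Chennai", "Mumbai"],
    ],
})
df2 = pd.DataFrame({
    "Inspector_ID": [1, 1, 2, 2, 3, 3, 4, 4],
    "Place": ["Bangalore", "Mumbai", "Bangalore", "Delhi",
              "Delhi", "Mumbai", "Chennai", "Mumbai"],
    "Tickets": [20, 4, 40, 4, 20, 10, 20, 8],
})

# One row per (inspector, assigned place).
assigned = df1.explode("Assigned_Place").rename(columns={"Assigned_Place": "Place"})

# The indicator column (hypothetical name "Source") is one of:
#   'left_only'  -> tickets raised, place not assigned
#   'right_only' -> place assigned, no tickets raised
#   'both'       -> place assigned and tickets raised
merged = df2.merge(assigned, on=["Inspector_ID", "Place"],
                   how="outer", indicator="Source")

result = (
    merged
    .assign(
        # Rows that exist only in the assignment table have no ticket count yet.
        Tickets=merged["Tickets"].fillna(0).astype(int),
        Assigned=np.where(merged["Source"].eq("left_only"), "No", "Yes"),
    )
    .drop(columns="Source")
    .sort_values(["Inspector_ID", "Place"])
    .reset_index(drop=True)
)
print(result)
```

Passing on= explicitly keeps the join keys from changing silently if the two frames ever gain another shared column.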
Edit: The solution is similar; just add a cross join with all unique YearMonth values first:
df1 = df1.explode('Assigned_Place').rename(columns={'Assigned_Place':'Place'})
# The helper column 'a' gives every row the same key, so the merge acts as a cross join
df11 = pd.DataFrame({'YearMonth':df2['YearMonth'].unique(), 'a':1})
# drop(columns=...) is used because the positional axis argument no longer works in pandas 2.x
df1 = df1.assign(a=1).merge(df11, on='a').drop(columns='a')
df = (df2.merge(df1, how='outer', indicator='Assigned')
         .sort_values(['Inspector_ID','Place'])
         .fillna({'Tickets':0})
         .assign(Assigned = lambda x: np.where(x['Assigned'].eq('left_only'), 'No', 'Yes'))
      )
print(df)
Inspector_ID Place Tickets YearMonth Assigned
0 1 Bangalore 20.0 201901 Yes
8 1 Bangalore 20.0 201902 Yes
17 1 Chennai 0.0 201901 Yes
18 1 Chennai 0.0 201902 Yes
1 1 Mumbai 4.0 201901 No
9 1 Mumbai 4.0 201902 No
2 2 Bangalore 40.0 201901 Yes
10 2 Bangalore 40.0 201902 Yes
12 2 Chennai 8.0 201902 Yes
19 2 Chennai 0.0 201901 Yes
3 2 Delhi 4.0 201901 Yes
11 2 Delhi 4.0 201902 Yes
20 3 Bangalore 0.0 201901 Yes
21 3 Bangalore 0.0 201902 Yes
4 3 Delhi 20.0 201901 Yes
13 3 Delhi 20.0 201902 Yes
5 3 Mumbai 10.0 201901 No
14 3 Mumbai 10.0 201902 No
6 4 Chennai 20.0 201901 Yes
15 4 Chennai 20.0 201902 Yes
16 4 Delhi 8.0 201902 No
7 4 Mumbai 8.0 201901 Yes
22 4 Mumbai 0.0 201902 Yes
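On pandas 1.2 or newer, the helper-column trick can probably be replaced by merge(..., how='cross'). The sketch below is one possible variant under that assumption, again rebuilding the sample data from the question with the column name Inspector_ID in both frames. The names rows, months, schedule, merged, result and the indicator name Source are illustrative only, and sorting by YearMonth first is my own choice to match the layout of the expected output.

```python
import numpy as np
import pandas as pd

# Yearly schedule from the question: one list of assigned places per inspector.
df1 = pd.DataFrame({
    "Inspector_ID": [1, 2, 3, 4],
    "Assigned_Place": [
        ["Bangalore", "Chennai"],
        ["Bangalore", "Delhi", "Chennai"],
        ["Bangalore", "Delhi"],
        ["Chennai", "Mumbai"],
    ],
})

# Monthly ticket counts from the question.
rows = [
    (1, "Bangalore", 20, 201901), (1, "Mumbai", 4, 201901),
    (2, "Bangalore", 40, 201901), (2, "Delhi", 4, 201901),
    (3, "Delhi", 20, 201901), (3, "Mumbai", 10, 201901),
    (4, "Chennai", 20, 201901), (4, "Mumbai", 8, 201901),
    (1, "Bangalore", 20, 201902), (1, "Mumbai", 4, 201902),
    (2, "Bangalore", 40, 201902), (2, "Delhi", 4, 201902),
    (2, "Chennai", 8, 201902), (3, "Delhi", 20, 201902),
    (3, "Mumbai", 10, 201902), (4, "Chennai", 20, 201902),
    (4, "Delhi", 8, 201902),
]
df2 = pd.DataFrame(rows, columns=["Inspector_ID", "Place", "Tickets", "YearMonth"])

assigned = df1.explode("Assigned_Place").rename(columns={"Assigned_Place": "Place"})

# Repeat every assignment for every month that appears in df2 (requires pandas >= 1.2).
months = pd.DataFrame({"YearMonth": df2["YearMonth"].unique()})
schedule = assigned.merge(months, how="cross")

merged = df2.merge(schedule, on=["Inspector_ID", "Place", "YearMonth"],
                   how="outer", indicator="Source")

result = (
    merged
    .assign(
        # Assigned places without tickets in a given month get a count of 0.
        Tickets=merged["Tickets"].fillna(0).astype(int),
        Assigned=np.where(merged["Source"].eq("left_only"), "No", "Yes"),
    )
    .drop(columns="Source")
    .sort_values(["YearMonth", "Inspector_ID", "Place"])
    .reset_index(drop=True)
)
print(result)
```

Note that the month list here comes from df2, so a month with no tickets at all would not appear in the result; if that matters, months would need to be built from a full calendar instead.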