我有两个数据框: 首先:
Job = {'Name': ["Ron", "Joe", "Dan"],
'Job': [[2000, 2001], 1998, [2000, 1999]]
}
df = pd.DataFrame(Job, columns = ['Name', 'Job'])
Name Job
0 Ron [2000, 2001]
1 Joe 1998
2 Dan [2000, 1999]
第二:
Empty = {'Name': ["Ron", "Ron", "Ron", "Ron", "Joe", "Joe", "Joe", "Joe", "Dan", "Dan", "Dan", "Dan"],
'Year': [1998, 1999, 2000, 2001, 1998, 1999, 2000, 2001, 1998, 1999, 2000, 2001]
}
df2 = pd.DataFrame(Empty, columns = ['Name', 'Year'])
Name Year
0 Ron 1998
1 Ron 1999
2 Ron 2000
3 Ron 2001
4 Joe 1998
5 Joe 1999
6 Joe 2000
7 Joe 2001
8 Dan 1998
9 Dan 1999
10 Dan 2000
11 Dan 2001
我想在df2中添加一列(我们称其为'job_status'),其中与df1中的名称相关联的每年将在df2中获得1,否则获得0。这应该是输出:
Name Year job_status
0 Ron 1998 0
1 Ron 1999 0
2 Ron 2000 1
3 Ron 2001 1
4 Joe 1998 1
5 Joe 1999 0
6 Joe 2000 0
7 Joe 2001 0
8 Dan 1998 0
9 Dan 1999 1
10 Dan 2000 1
11 Dan 2001 0
我该怎么做?
答案 0 :(得分:0)
首先explode
上的df
数据帧Job
,然后将其与df2
合并,最后使用Series.notna
+ view
从[0, 1]
至job_status
:
d = df2.merge(df.explode('Job'), left_on=['Name', 'Year'], right_on=['Name', 'Job'], how='left')
d['job_status'] = d.pop('Job').notna().view('i1')
结果:
print(d)
Name Year job_status
0 Ron 1998 0
1 Ron 1999 0
2 Ron 2000 1
3 Ron 2001 1
4 Joe 1998 1
5 Joe 1999 0
6 Joe 2000 0
7 Joe 2001 0
8 Dan 1998 0
9 Dan 1999 1
10 Dan 2000 1
11 Dan 2001 0