Fill values in a new df column based on values in another df

Time: 2020-09-10 18:29:48

Tags: python pandas dataframe

I have two dataframes. The first:

import pandas as pd

Job = {'Name': ["Ron", "Joe", "Dan"],
        'Job': [[2000, 2001], 1998, [2000, 1999]]
        }

df = pd.DataFrame(Job, columns = ['Name', 'Job'])
  Name           Job
0  Ron  [2000, 2001]
1  Joe          1998
2  Dan  [2000, 1999]

The second:

Empty = {'Name': ["Ron", "Ron", "Ron", "Ron", "Joe", "Joe", "Joe", "Joe", "Dan", "Dan", "Dan", "Dan"],
        'Year': [1998, 1999, 2000, 2001, 1998, 1999, 2000, 2001, 1998, 1999, 2000, 2001]
        }

df2 = pd.DataFrame(Empty, columns = ['Name', 'Year'])

   Name  Year
0   Ron  1998
1   Ron  1999
2   Ron  2000
3   Ron  2001
4   Joe  1998
5   Joe  1999
6   Joe  2000
7   Joe  2001
8   Dan  1998
9   Dan  1999
10  Dan  2000
11  Dan  2001

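I want to add a column to df2 (let's call it 'job_status') that is 1 for every year associated with that name in the first dataframe (df), and 0 otherwise. This should be the output: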

   Name  Year  job_status
0   Ron  1998           0
1   Ron  1999           0
2   Ron  2000           1
3   Ron  2001           1
4   Joe  1998           1
5   Joe  1999           0
6   Joe  2000           0
7   Joe  2001           0
8   Dan  1998           0
9   Dan  1999           1
10  Dan  2000           1
11  Dan  2001           0

How can I do this?

1 answer:

Answer 0: (score: 0)

First explode the dataframe df on the Job column, then left-merge it into df2, and finally use Series.notna + view to turn the matches into a [0, 1] job_status column:

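# Left merge: rows of df2 with no matching (Name, Job) pair get NaN in 'Job'
d = df2.merge(df.explode('Job'), left_on=['Name', 'Year'], right_on=['Name', 'Job'], how='left')
# notna() marks the matches; on pandas versions that deprecate Series.view, .astype('int8') gives the same 0/1 values
d['job_status'] = d.pop('Job').notna().view('i1')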
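
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same explode + left merge idea. It makes two assumptions that are not in the answer above: the exploded Job column is cast to int64 so that both merge keys have the same dtype, and the final conversion uses astype('int8') instead of view('i1'), since Series.view is deprecated in recent pandas releases. The variable name `exploded` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ["Ron", "Joe", "Dan"],
                   'Job': [[2000, 2001], 1998, [2000, 1999]]})
df2 = pd.DataFrame({'Name': ["Ron"] * 4 + ["Joe"] * 4 + ["Dan"] * 4,
                    'Year': [1998, 1999, 2000, 2001] * 3})

# explode() turns each list element into its own row; scalars stay as they are.
# The result has object dtype, so cast it to match the int64 'Year' column.
exploded = df.explode('Job').astype({'Job': 'int64'})

# Left merge keeps every row of df2; unmatched rows get NaN in 'Job'.
d = df2.merge(exploded, left_on=['Name', 'Year'],
              right_on=['Name', 'Job'], how='left')

# A non-null 'Job' means the (Name, Year) pair exists in df.
d['job_status'] = d.pop('Job').notna().astype('int8')
```

One thing to watch: if a name lists the same year twice in df, the merge will duplicate that row of df2, so dropping duplicates from `exploded` first may be needed.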

Result:

print(d)

   Name  Year  job_status
0   Ron  1998           0
1   Ron  1999           0
2   Ron  2000           1
3   Ron  2001           1
4   Joe  1998           1
5   Joe  1999           0
6   Joe  2000           0
7   Joe  2001           0
8   Dan  1998           0
9   Dan  1999           1
10  Dan  2000           1
11  Dan  2001           0
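
As a possible variation (not part of the answer above), the same flag can be read off the merge indicator instead of checking for NaN. This is only a sketch: it assumes the exploded Job column is renamed to Year so both frames share key names, and the names `pairs` and `merged` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ["Ron", "Joe", "Dan"],
                   'Job': [[2000, 2001], 1998, [2000, 1999]]})
df2 = pd.DataFrame({'Name': ["Ron"] * 4 + ["Joe"] * 4 + ["Dan"] * 4,
                    'Year': [1998, 1999, 2000, 2001] * 3})

# One row per (Name, Year) pair that appears in df, without duplicates.
pairs = (df.explode('Job')
           .rename(columns={'Job': 'Year'})
           .astype({'Year': 'int64'})
           .drop_duplicates())

# indicator=True adds a '_merge' column: 'both' for matches, 'left_only' otherwise.
merged = df2.merge(pairs, on=['Name', 'Year'], how='left', indicator=True)

# Because pairs is de-duplicated, merged has the same row order and length as df2.
df2['job_status'] = (merged['_merge'] == 'both').astype(int).to_numpy()
```

The indicator column avoids relying on NaN as the "no match" signal, which can matter if the right-hand frame carries columns that are legitimately null.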