I have a DataFrame (FinalDF) that looks like this:
id | Movie            | Cast
-----------------------------------------
0  | The Dark Knight  | Christopher Nolan
1  | The Dark Knight  | Christian Bale
2  | Pulp Fiction     | Quentin Tarantino
3  | Pulp Fiction     | John Travolta
4  | Schindler’s List | Steven Spielberg
5  | Schindler’s List | Liam Neeson
The names are mapped to IDs in movie_cast_DF like this:
id | name | uuid
-------------------------
1 | The Dark Knight | m1
2 | Pulp Fiction | m2
3 | Schindler’s List | m3
4 | Christopher Nolan | d1
5 | Christian Bale | a1
6 | Quentin Tarantino | d2
7 | John Travolta | a2
8 | Steven Spielberg | d3
9 | Liam Neeson | a3
I need to map these IDs into new columns of FinalDF:
id | Movie            | Cast              | mid | cid
------------------------------------------------------------------
0  | The Dark Knight  | Christopher Nolan | m1  | d1
1  | The Dark Knight  | Christian Bale    | m1  | a1
2  | Pulp Fiction     | Quentin Tarantino | m2  | d2
3  | Pulp Fiction     | John Travolta     | m2  | a2
4  | Schindler’s List | Steven Spielberg  | m3  | d3
5  | Schindler’s List | Liam Neeson       | m3  | a3
I tried the following approach:
def getID(x):
    # Return the uuid of the first row whose name contains x
    # (case-insensitive), or None when nothing matches.
    try:
        return movie_cast_DF[movie_cast_DF['name'].str.contains(x.lower(), case=False)]['uuid'].values[0]
    except Exception:
        return None

FinalDF['mid'] = FinalDF['Movie'].apply(getID)
FinalDF['cid'] = FinalDF['Cast'].apply(getID)
FinalDF.head()
Is there a more efficient, faster way to do this mapping?
Answer 0 (score: 1)
First, set name as the index of df2. (Here df2 corresponds to movie_cast_DF in the question, and df to FinalDF.)
dfmap = df2.set_index("name").uuid
dfmap
name
The Dark Knight m1
Pulp Fiction m2
Schindler’s List m3
Christopher Nolan d1
Christian Bale a1
Quentin Tarantino d2
John Travolta a2
Steven Spielberg d3
Liam Neeson a3
Name: uuid, dtype: object
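
One caveat worth checking before mapping: a lookup Series passed to map generally needs a unique index, so repeated names in the mapping table should be resolved first. A minimal sketch, assuming a hand-built subset of the question's sample data plus a hypothetical duplicated row added only for illustration:

```python
import pandas as pd

# Subset of the mapping table from the question; the last row is a
# hypothetical duplicate added only to illustrate the check.
df2 = pd.DataFrame({
    "name": ["The Dark Knight", "Pulp Fiction", "Christopher Nolan",
             "Christian Bale", "Pulp Fiction"],
    "uuid": ["m1", "m2", "d1", "a1", "m2"],
})

dfmap = df2.set_index("name")["uuid"]

# Detect repeated names in the lookup index.
has_duplicates = not dfmap.index.is_unique

# Keep the first uuid seen for each name so the index becomes unique.
dfmap_unique = df2.drop_duplicates(subset="name").set_index("name")["uuid"]
```

Whether keeping the first occurrence is the right policy depends on the data; if the same name can legitimately carry different uuids, the duplicates need a closer look instead.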
We will use this Series to map keys to values in df. Next, call map / replace twice:
df['mid'] = df.Movie.map(dfmap)
df['cid'] = df.Cast.map(dfmap)
df
Movie Cast mid cid
id
0 The Dark Knight Christopher Nolan m1 d1
1 The Dark Knight Christian Bale m1 a1
2 Pulp Fiction Quentin Tarantino m2 d2
3 Pulp Fiction John Travolta m2 a2
4 Schindler’s List Steven Spielberg m3 d3
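
The two options mentioned above differ for names that are missing from the lookup: map yields NaN, while replace leaves the original value in place. A small sketch, assuming a cut-down version of the sample data and a hypothetical unmatched title ("Inception") added for illustration; the lookup is passed to replace as a plain dict here:

```python
import pandas as pd

# Cut-down sample; "Inception" is a hypothetical title with no entry
# in the lookup, added only to show the unmatched case.
df = pd.DataFrame({
    "Movie": ["The Dark Knight", "Pulp Fiction", "Inception"],
    "Cast": ["Christian Bale", "John Travolta", "Christian Bale"],
})
dfmap = pd.Series(
    {"The Dark Knight": "m1", "Pulp Fiction": "m2",
     "Christian Bale": "a1", "John Travolta": "a2"},
    name="uuid",
)

# map: exact-match lookup, unmatched names become NaN.
mapped = df["Movie"].map(dfmap)

# replace: matched names are swapped, unmatched names are kept as-is.
replaced = df["Movie"].replace(dfmap.to_dict())

cast_ids = df["Cast"].map(dfmap)
```

If unmatched names should be easy to spot afterwards, map is usually the more convenient choice, since the NaN values can be found with isna().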