I have two dataframes: one with authors and their texts (plus other columns), and another with authors along with their gender and discipline.
DF1
====================================
author date text
------------------------------------
a1 2006 "Thank you for..."
a2 2007 "When I was asked..."
a3 2014 "Biology is the ..."
a2 2010 "In the intervening..."
DF2
====================================
author gender discipline
------------------------------------
a2 male psychologist
a1 female neurologist
a3 female biologist
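I've been reading through the pandas documentation and searching SO and other sites, trying to understand how to match the authors in DF1 with their gender in DF2. I don't care whether I do this in place in DF1 or need to create a new dataframe, as long as the new dataframe contains all the information from DF1 plus the additional information from DF2: gender and/or discipline.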
I don't even have the beginnings of code here. I just finished scrubbing DF2 of various unicode errors, so I'm at the end of my rope at the end of the day.
Answer 0 (score: 1)
Option 1
pd.DataFrame.merge
DF1.merge(DF2[['author', 'gender']], how='left')
author date text gender
0 a1 2006 "Thank you for..." female
1 a2 2007 "When I was asked..." male
2 a3 2014 "Biology is the ..." female
3 a2 2010 "In the intervening..." male
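For anyone who wants to run Option 1 end to end, here is a minimal, self-contained sketch. The two frames are rebuilt from the tables in the question; passing `on="author"` explicitly and adding `validate="many_to_one"` are my own additions (not part of the original one-liner), meant to make the join key visible and to fail loudly if an author appears more than once in DF2.

```python
import pandas as pd

# Sample frames rebuilt from the question's tables.
DF1 = pd.DataFrame({
    "author": ["a1", "a2", "a3", "a2"],
    "date": [2006, 2007, 2014, 2010],
    "text": ["Thank you for...", "When I was asked...",
             "Biology is the ...", "In the intervening..."],
})
DF2 = pd.DataFrame({
    "author": ["a2", "a1", "a3"],
    "gender": ["male", "female", "female"],
    "discipline": ["psychologist", "neurologist", "biologist"],
})

# A left merge keeps every row of DF1, in DF1's order.
# validate="many_to_one" raises a MergeError if DF2 lists an author twice.
merged = DF1.merge(DF2[["author", "gender"]], on="author", how="left",
                   validate="many_to_one")
print(merged)
```

To pull in the discipline as well, it should be enough to select `DF2[["author", "gender", "discipline"]]` (or just pass `DF2` itself).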
Option 2
pd.Series.map
d = dict(DF2[['author', 'gender']].values)
DF1.assign(gender=DF1.author.map(d))
author date text gender
0 a1 2006 "Thank you for..." female
1 a2 2007 "When I was asked..." male
2 a3 2014 "Biology is the ..." female
3 a2 2010 "In the intervening..." male
Option 2.1
Making d
d = DF2.set_index('author').gender
DF1.assign(gender=DF1.author.map(d))
author date text gender
0 a1 2006 "Thank you for..." female
1 a2 2007 "When I was asked..." male
2 a3 2014 "Biology is the ..." female
3 a2 2010 "In the intervening..." male
Option 2.2
Making d
d = dict(zip(DF2.author, DF2.gender))
DF1.assign(gender=DF1.author.map(d))
author date text gender
0 a1 2006 "Thank you for..." female
1 a2 2007 "When I was asked..." male
2 a3 2014 "Biology is the ..." female
3 a2 2010 "In the intervening..." male
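One thing worth knowing about the `map` variants: an author that has no entry in `d` does not raise an error, it simply comes back as NaN. A small sketch of that behaviour, assuming a made-up author `a4` that is not in the question's data:

```python
import pandas as pd

# "a4" is a hypothetical author with no row in DF2.
DF1 = pd.DataFrame({"author": ["a1", "a2", "a4"],
                    "date": [2006, 2007, 2015]})
DF2 = pd.DataFrame({"author": ["a2", "a1"],
                    "gender": ["male", "female"]})

# Same lookup dictionary as in Option 2.2.
d = dict(zip(DF2["author"], DF2["gender"]))

# Authors missing from d are mapped to NaN rather than raising.
out = DF1.assign(gender=DF1["author"].map(d))
print(out)

# Rows that still need a gender filled in.
print(out[out["gender"].isna()])
```

If NaN is not what you want, chaining `.fillna("unknown")` after `.map(d)` is one possible way to supply a default.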
Option 3
pd.DataFrame.join
DF1.join(DF2.set_index('author').gender, on='author')
author date text gender
0 a1 2006 "Thank you for..." female
1 a2 2007 "When I was asked..." male
2 a3 2014 "Biology is the ..." female
3 a2 2010 "In the intervening..." male
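Since the question asks for gender and/or discipline, the same `join` idea can bring both columns over at once by selecting a list of columns instead of a single Series. A sketch using frames rebuilt from the question (the variable names `extra` and `result` are mine):

```python
import pandas as pd

DF1 = pd.DataFrame({
    "author": ["a1", "a2", "a3", "a2"],
    "date": [2006, 2007, 2014, 2010],
    "text": ["Thank you for...", "When I was asked...",
             "Biology is the ...", "In the intervening..."],
})
DF2 = pd.DataFrame({
    "author": ["a2", "a1", "a3"],
    "gender": ["male", "female", "female"],
    "discipline": ["psychologist", "neurologist", "biologist"],
})

# Index DF2 by author and keep only the columns we want to bring over.
extra = DF2.set_index("author")[["gender", "discipline"]]

# join matches DF1's 'author' column against extra's index (a left join by default).
result = DF1.join(extra, on="author")
print(result)
```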
Answer 1 (score: 1)
import pandas as pd

# Rebuild the two frames from the question
df = pd.DataFrame({'author': ['a1', 'a2', 'a3', 'a2'],
                   'date': [2006, 2007, 2014, 2010],
                   'text': ["Thank you for", "when i was asked", "i m the biology", "in the intervening"]})
df2 = pd.DataFrame({'author': ['a2', 'a1', 'a3'],
                    'gender': ['male', 'female', 'female'],
                    'discipline': ['psychologist', 'neurologist', 'biologist']})

# Merge on the shared 'author' column (pd.merge defaults to an inner join)
print(pd.merge(df, df2, on='author'))
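Note that `pd.merge` defaults to an inner join, so any author in `df` with no match in `df2` would be dropped silently. If that matters for your data, one way to check is a left merge with `indicator=True`, which adds a `_merge` column showing where each row came from. A sketch, assuming a made-up author `a9` that is missing from `df2`:

```python
import pandas as pd

# "a9" is a hypothetical author with no row in df2.
df = pd.DataFrame({"author": ["a1", "a2", "a9"],
                   "date": [2006, 2007, 2020]})
df2 = pd.DataFrame({"author": ["a2", "a1"],
                    "gender": ["male", "female"]})

# how="left" keeps every row of df; indicator=True adds a "_merge" column
# whose value is "both" for matched rows and "left_only" for unmatched ones.
checked = pd.merge(df, df2, on="author", how="left", indicator=True)
print(checked)

# Authors that found no partner in df2.
unmatched = checked.loc[checked["_merge"] == "left_only", "author"].tolist()
print(unmatched)
```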