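I have two data frames with different column names, each with 10 rows. What I want to do is compare column values and, where they match, copy the email address from df2 to df1. I have looked at this example, How to join (merge) data frames (inner, outer, left, right)?, but my column names are different. I have also seen this example, as well as np.where used with multiple conditions, but when I try that it gives me the following error: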
ValueError: Wrong number of items passed 2, placement implies 1
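What I am trying to do:
I want to compare the two columns (first, last_huge) of each row of df1 against all rows of the df2 columns (first_small, last_small). If a match is found, take the email address from the matching row of df2 and assign it to a new column in df1. Can anyone help me with this? I have copied only the relevant code below; so far it adds just 5 records to new_email and does not work properly yet.
Initially, I compared df1['first'] with df2['first_small']: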
import numpy as np
import pandas as pd

data1 = {"first": ["alice", "bob", "carol"],
         "last_huge": ["foo", "bar", "baz"],
         "street_huge": ["Jaifo Road", "Wetib Ridge", "Ucagi View"],
         "city_huge": ["Egviniw", "Manbaali", "Ismazdan"],
         "age_huge": ["23", "30", "36"],
         "state_huge": ["MA", "LA", "CA"],
         "zip_huge": ["89899", "78788", "58999"]}
df1 = pd.DataFrame(data1)
data2 = {"first_small": ["alice", "bob", "carol"],
         "last_small": ["foo", "bar", "baz"],
         "street_small": ["Jsdffo Road", "sdf Ridge", "sdfff View"],
         "city_huge": ["paris", "london", "rome"],
         "age_huge": ["28", "40", "56"],
         "state_huge": ["GA", "EA", "BA"],
         "zip_huge": ["89859", "78728", "56999"],
         "email_small": ["alice@xyz.com", "bob@abc.com", "carol@jkl.com"],
         "dob": ["31051989", "31051980", "31051981"],
         "country": ["UK", "US", "IT"],
         "company": ["microsoft", "apple", "google"],
         "source": ["bing", "yahoo", "google"]}
df2 = pd.DataFrame(data2)
df1['new_email'] = np.where((df1[['first']] == df2[['first_small']]), df2[['email_small']], np.nan)
Right now it only adds 5 records to new_email and the rest are NaN. It also gives me this error:
ValueError: Can only compare identically-labeled Series objects
Answer 0 (score: 2)
Try merge:
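# Left-join on the two name columns, then drop df2's duplicate key columns.
# drop(columns=...) replaces the positional axis argument, which newer pandas no longer accepts.
(df1.merge(df2[["first_small", "last_small", "email_small"]],
           how="left",
           left_on=["first", "last_huge"],
           right_on=["first_small", "last_small"])
    .drop(columns=["first_small", "last_small"]))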
Example:
import pandas as pd

data1 = {"first": ["alice", "bob", "carol"],
         "last_huge": ["foo", "bar", "baz"]}
df1 = pd.DataFrame(data1)
data2 = {"first_small": ["alice", "bob", "carol"],
         "last_small": ["foo", "bar", "baz"],
         "email_small": ["alice@xyz.com", "bob@abc.com", "carol@jkl.com"]}
df2 = pd.DataFrame(data2)
# Left-join on (first, last) and drop the redundant key columns from df2
(df1.merge(df2[["first_small", "last_small", "email_small"]],
           how="left",
           left_on=["first", "last_huge"],
           right_on=["first_small", "last_small"])
    .drop(columns=["first_small", "last_small"]))
Output:
   first last_huge    email_small
0  alice       foo  alice@xyz.com
1    bob       bar    bob@abc.com
2  carol       baz  carol@jkl.com
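If you want the result column to be called new_email, as in the question, and want to see what happens to rows without a match, a minimal sketch could look like the following. The row for "dave" and the `lookup` variable are made up for illustration and are not part of the original data; renaming df2's key columns first is just one way to avoid the extra drop step.

```python
import pandas as pd

# Illustrative data: "dave" has no counterpart in df2 (assumption, not from the question)
df1 = pd.DataFrame({"first": ["alice", "bob", "dave"],
                    "last_huge": ["foo", "bar", "qux"]})
df2 = pd.DataFrame({"first_small": ["alice", "bob", "carol"],
                    "last_small": ["foo", "bar", "baz"],
                    "email_small": ["alice@xyz.com", "bob@abc.com", "carol@jkl.com"]})

# Rename df2's columns to df1's names so the join keys are not duplicated in the result
lookup = df2[["first_small", "last_small", "email_small"]].rename(
    columns={"first_small": "first",
             "last_small": "last_huge",
             "email_small": "new_email"})

# A left join keeps every row of df1; unmatched rows get NaN in new_email
result = df1.merge(lookup, how="left", on=["first", "last_huge"])
print(result)
```

Note that if df2 contains the same (first, last) pair more than once, a left merge returns one output row per match, so you may want to de-duplicate df2 on those two columns beforehand.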
Answer 1 (score: 2)
Using andrew_reece's sample data :-) with pd.concat:
pd.concat([df1.set_index(["first", "last_huge"]),df2.set_index(["first_small", "last_small"])['email_small']],axis=1).reset_index().dropna()
Out[23]:
   first last_huge    email_small
0  alice       foo  alice@xyz.com
1    bob       bar    bob@abc.com
2  carol       baz  carol@jkl.com
With your data:
pd.concat([df1.set_index(["first", "last_huge"]),df2.set_index(["first_small", "last_small"])['email_small']],axis=1).reset_index()
Out[97]:
   first last_huge age_huge city_huge state_huge  street_huge zip_huge  \
0  alice       foo       23   Egviniw         MA   Jaifo Road    89899
1    bob       bar       30  Manbaali         LA  Wetib Ridge    78788
2  carol       baz       36  Ismazdan         CA   Ucagi View    58999

     email_small
0  alice@xyz.com
1    bob@abc.com
2  carol@jkl.com
Using map:
df1['email_small']=(df1['first']+df1['last_huge']).map(df2.set_index(df2['first_small']+df2['last_small'])['email_small'])
df1
Out[115]:
  age_huge city_huge  first last_huge state_huge  street_huge zip_huge  \
0       23   Egviniw  alice       foo         MA   Jaifo Road    89899
1       30  Manbaali    bob       bar         LA  Wetib Ridge    78788
2       36  Ismazdan  carol       baz         CA   Ucagi View    58999

     email_small
0  alice@xyz.com
1    bob@abc.com
2  carol@jkl.com
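One caveat with the map approach: concatenating first and last name with no separator can make two different people share the same key (for example "ab" + "c" and "a" + "bc"). A rough sketch of one way around this is to use (first, last) tuples as keys in a plain dict. The names and example.com addresses below are invented for illustration, and `lookup` is a hypothetical helper variable.

```python
import numpy as np
import pandas as pd

# Illustrative data chosen so that plain string concatenation collides (assumption)
df1 = pd.DataFrame({"first": ["ab", "a"], "last_huge": ["c", "bc"]})
df2 = pd.DataFrame({"first_small": ["ab", "a"],
                    "last_small": ["c", "bc"],
                    "email_small": ["ab.c@example.com", "a.bc@example.com"]})

# Both rows concatenate to "abc", so the concatenated key is no longer unique
concat_keys = df2["first_small"] + df2["last_small"]
print(concat_keys.tolist())

# Tuple keys keep the two name parts separate
lookup = dict(zip(zip(df2["first_small"], df2["last_small"]), df2["email_small"]))

# Look up each (first, last) pair of df1; pairs without a match become NaN
df1["new_email"] = [lookup.get(key, np.nan)
                    for key in zip(df1["first"], df1["last_huge"])]
print(df1)
```

If the same (first, last) pair appears several times in df2, the dict keeps the last email seen for it, which may or may not be what you want.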