Question

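I have two dataframes with different column names, each with 10 rows. What I want to do is compare column values and, where they match, copy the email address from df2 to df1. I have looked at this example, but my column names are different: How to join (merge) data frames (inner, outer, left, right)?. I have also seen this example using np.where with multiple conditions, but when I do that it gives me the following error: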

ValueError: Wrong number of items passed 2, placement implies 1

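What I am trying to do:

I want to compare two columns of df1 (first, last_huge) against all rows of the corresponding df2 columns (first_small, last_small). If a match is found, I want to take the email address from that row of df2 and assign it to a new column in df1. Can anyone help me with this? I have copied only the relevant code below. It only adds 5 new records to new_email and is not fully working yet.

Initially, what I did was compare df1['first'] with df2['first_small']: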

import numpy as np
import pandas as pd

data1 = {"first":["alice", "bob", "carol"],
         "last_huge":["foo", "bar", "baz"],
         "street_huge": ["Jaifo Road", "Wetib Ridge", "Ucagi View"],
         "city_huge": ["Egviniw", "Manbaali", "Ismazdan"],
         "age_huge": ["23", "30", "36"],
         "state_huge": ["MA", "LA", "CA"],
         "zip_huge": ["89899", "78788", "58999"]}

df1 = pd.DataFrame(data1)

data2 = {"first_small":["alice", "bob", "carol"],
         "last_small":["foo", "bar", "baz"],
         "street_small": ["Jsdffo Road", "sdf Ridge", "sdfff View"],
         "city_huge": ["paris", "london", "rome"],
         "age_huge": ["28", "40", "56"],
         "state_huge": ["GA", "EA", "BA"],
         "zip_huge": ["89859", "78728", "56999"],
         "email_small":["alice@xyz.com", "bob@abc.com", "carol@jkl.com"],
         "dob": ["31051989", "31051980", "31051981"],
         "country": ["UK", "US", "IT"],
         "company": ["microsoft", "apple", "google"],
         "source": ["bing", "yahoo", "google"]}

df2 = pd.DataFrame(data2)

df1['new_email'] = np.where((df1[['first']] == df2[['first_small']]), df2[['email_small']], np.nan)

Right now it only adds 5 records to new_email and the rest are nan. It also gives me this error:

ValueError: Can only compare identically-labeled Series objects

Answer 1

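Try merge, joining on both name columns and then dropping the duplicate key columns that come from df2:

(df1.merge(df2[["first_small", "last_small", "email_small"]],
           how="left",
           left_on=["first", "last_huge"],
           right_on=["first_small", "last_small"])
    .drop(columns=["first_small", "last_small"]))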

Example:

import pandas as pd

data1 = {"first":["alice", "bob", "carol"],
         "last_huge":["foo", "bar", "baz"]}
df1 = pd.DataFrame(data1)

data2 = {"first_small":["alice", "bob", "carol"], 
         "last_small":["foo", "bar", "baz"],
         "email_small":["alice@xyz.com", "bob@abc.com", "carol@jkl.com"]}
df2 = pd.DataFrame(data2)

(df1.merge(df2[["first_small", "last_small", "email_small"]],
           how="left",
           left_on=["first", "last_huge"],
           right_on=["first_small", "last_small"])
    .drop(columns=["first_small", "last_small"]))

Output:

   first last_huge    email_small
0  alice       foo  alice@xyz.com
1    bob       bar    bob@abc.com
2  carol       baz  carol@jkl.com
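
If the goal is to end up with the email in a new_email column on df1, and some rows of df1 may have no match, the same merge could be sketched roughly as below. The sample rows (including the unmatched "dave") and the name lookup are made up for illustration. validate="many_to_one" is an optional check that raises an error if df2 contains the same name pair more than once.

```python
import pandas as pd

# Illustrative data: "dave qux" has no counterpart in df2.
df1 = pd.DataFrame({"first": ["alice", "bob", "dave"],
                    "last_huge": ["foo", "bar", "qux"]})
df2 = pd.DataFrame({"first_small": ["alice", "bob", "carol"],
                    "last_small": ["foo", "bar", "baz"],
                    "email_small": ["alice@xyz.com", "bob@abc.com", "carol@jkl.com"]})

# Rename the df2 columns up front so no duplicate key columns need dropping.
lookup = (df2[["first_small", "last_small", "email_small"]]
          .rename(columns={"first_small": "first",
                           "last_small": "last_huge",
                           "email_small": "new_email"}))

# A left merge keeps every row of df1; unmatched rows get a missing new_email.
result = df1.merge(lookup, how="left", on=["first", "last_huge"],
                   validate="many_to_one")
print(result)
```

Rows of df1 without a match in df2 are kept, with a missing value in new_email.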

Answer 2

Using andrew_reece's sample data :-) with pd.concat:

pd.concat([df1.set_index(["first", "last_huge"]),
           df2.set_index(["first_small", "last_small"])["email_small"]],
          axis=1).reset_index().dropna()
Out[23]: 
   first last_huge    email_small
0  alice       foo  alice@xyz.com
1    bob       bar    bob@abc.com
2  carol       baz  carol@jkl.com

Using your data:

pd.concat([df1.set_index(["first", "last_huge"]),
           df2.set_index(["first_small", "last_small"])["email_small"]],
          axis=1).reset_index()
Out[97]: 
   first last_huge age_huge city_huge state_huge  street_huge zip_huge  \
0  alice       foo       23   Egviniw         MA   Jaifo Road    89899   
1    bob       bar       30  Manbaali         LA  Wetib Ridge    78788   
2  carol       baz       36  Ismazdan         CA   Ucagi View    58999   
     email_small  
0  alice@xyz.com  
1    bob@abc.com  
2  carol@jkl.com

Update: using map

df1['email_small'] = (df1['first'] + df1['last_huge']).map(
    df2.set_index(df2['first_small'] + df2['last_small'])['email_small'])
df1
Out[115]: 
  age_huge city_huge  first last_huge state_huge  street_huge zip_huge  \
0       23   Egviniw  alice       foo         MA   Jaifo Road    89899   
1       30  Manbaali    bob       bar         LA  Wetib Ridge    78788   
2       36  Ismazdan  carol       baz         CA   Ucagi View    58999   
     email_small  
0  alice@xyz.com  
1    bob@abc.com  
2  carol@jkl.com

Compare columns in two different dataframes and, if a match is found, copy the email from df2 to df1
