Question

Problem:

I have two DataFrames (each with a MASSIVE number of rows):

df1 = 0    1    2
      str  str  str
      str  str  str
      ...


df2 = A    B    C    D
      str  str  str  str
      str  str  str  str
      ...

What I want to do is compare the strings in one column of df1 against a column of the second DataFrame:

for index, row in df1.iterrows():
    if df1.iloc[index][0] in df2['A'].tolist(): # I'm converting to a list because it seems it can't look into the column as an object

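If there is a match, I want to take the df2['B'] value from the same row as the matching string and put it into a new column in df1, so that I end up with something like this: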

df1 = 0    1    2    B
      str  str  str  str
      str  str  str  nan
      str  str  str  nan
      str  str  str  str

I feel that iterrows() is not the best approach, but I'm not skilled enough yet to find a better solution.

Thanks.

Answer 1

Assuming I have understood your question correctly:

You can use the .isin() method:

mask = df2['your_column'].isin(df1['your_other_column'])
df1.loc[mask,'new_column']  = df2.loc[mask,'your_column']

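Note that for this approach the two DataFrames must be the same size, because the boolean mask built from df2 is aligned to df1 by index.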
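To make the mask idea concrete, here is a minimal sketch on made-up data (the values are hypothetical, not from the question). Unlike the snippet above, it builds the mask over df1's own rows, so it lines up with df1's index regardless of how long df2 is. It is an illustration of .isin(), not the answerer's exact code.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration
df1 = pd.DataFrame({0: ["a", "b", "c"]})
df2 = pd.DataFrame({"A": ["c", "a", "x"], "B": ["C-val", "A-val", "X-val"]})

# One boolean per row of df1: True where df1[0] occurs anywhere in df2['A']
mask = df1[0].isin(df2["A"])

# The rows of df1 that have a counterpart in df2['A']
matched = df1.loc[mask, 0].tolist()
print(mask.tolist())
print(matched)
```

The mask only says whether a match exists; fetching the corresponding df2['B'] value still needs a lookup step.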

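Another option that just occurred to me is to use a dictionary together with apply. I'm assuming the values in the comparison column are unique.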

mask = df2['compare_column'].isin(df1['compare_column'])
dictionary = dict(df2[['compare_column','new_column']][mask].values)
# .get() returns None for values without a match instead of raising KeyError
df1['B'] = df1.apply(lambda x: dictionary.get(x['compare_column']), axis=1)
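The same dictionary lookup can probably be written without a row-wise apply by using Series.map, which leaves NaN where a key is missing. The sketch below uses hypothetical sample data and, like the answer, assumes the values in df2's comparison column are unique.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration
df1 = pd.DataFrame({"compare_column": ["a", "b", "c", "a"]})
df2 = pd.DataFrame({
    "compare_column": ["c", "a", "x"],
    "new_column": ["C-val", "A-val", "X-val"],
})

# Build the key -> value lookup once (assumes keys in df2 are unique)
lookup = dict(zip(df2["compare_column"], df2["new_column"]))

# map() looks up every value of the column; keys not in the dict become NaN
df1["B"] = df1["compare_column"].map(lookup)
print(df1)
```

Because map works on the whole column at once, it tends to scale better on large frames than apply with axis=1.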

Answer 2

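If I understand your question, you should be able to do the following. The first statement computes a temporary column 'temp' that is True if the value from df1 can be found in df2['A']. The second line then looks that value up in df2['B'] if temp is True, and otherwise returns np.nan: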

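import numpy as np
for col in df1.columns:
    df1['temp'] = df1[col].isin(df2['A'].unique())  # True where the df1[col] value appears in df2['A']
    # get_value() has been removed from pandas; .loc/.iloc performs the same lookup (first match)
    df1[col] = df1[[col,'temp']].apply(lambda x: df2.loc[df2['A'] == x[col], 'B'].iloc[0] if x['temp'] else np.nan, axis=1)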
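For large frames, a left merge is another way to get the "matching B value or NaN" result the question describes. This is an alternative technique, not what either answer proposes. The column names c0 and c1 and all the data are hypothetical stand-ins for the question's columns.

```python
import pandas as pd

# Hypothetical sample data; c0 plays the role of the column being compared
df1 = pd.DataFrame({"c0": ["a", "b", "c"], "c1": ["p", "q", "r"]})
df2 = pd.DataFrame({"A": ["c", "a", "x"], "B": ["C-val", "A-val", "X-val"]})

# Keep one row per key so the left join cannot multiply df1's rows
right = df2[["A", "B"]].drop_duplicates("A")

# Left join keeps every row of df1; rows without a match get NaN in B
result = df1.merge(right, how="left", left_on="c0", right_on="A").drop(columns="A")
print(result)
```

Dropping duplicates first is a design choice made for this sketch: if df2['A'] can legitimately repeat, you would need to decide which B value should win.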

Pandas: adding column values from one DataFrame to another via iterrows
