Question

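I have two DataFrames that I want to merge on a specific column, based on which string that column contains. It looks like the following question, only the other way around: How to merge pandas on string contains?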
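To make the goal concrete, a literal "merge on string contains" can be sketched as a cross join followed by a filter. This is one possible technique, not the code from this thread. It assumes pandas 1.2 or later (for how='cross'), case-sensitive matching, and small tables, because the cross join builds len(df1) * len(df2) rows.

```python
import pandas as pd

df1 = pd.DataFrame({'Amount': [10, 20, 30],
                    'Description': ['this is a text', 'this is another text', 'this is an important']})
df2 = pd.DataFrame({'Text': ['another', 'important'],
                    'Category': ['Another Category', 'Important Category']})

# Pair every row of df1 with every row of df2, keeping df1's row number in 'index'.
pairs = df1.reset_index().merge(df2, how='cross')

# Keep only the pairs where the Text occurs inside the Description (case-sensitive).
mask = [text in desc for text, desc in zip(pairs['Text'], pairs['Description'])]
matches = pairs.loc[mask, ['index', 'Category']].set_index('index')

# Left-join back on df1's index so descriptions without a match get NaN.
result = df1.join(matches)
print(result)
```

A description that contains several Text values would appear once per match, as with an ordinary merge.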

import pandas as pd

df1 = pd.DataFrame({'Amount':[10, 20, 30], 'Description':['this is a text','this is another text','this is an important']})
df2 = pd.DataFrame({'Text':['another','important'], 'Category':['Another Category','Important Category']})

rhs = (df1.Description
          .apply(lambda x: df2[df2['Category']] if df2[df2['Text']] in str(x).lower() else None)
      )

(pd.concat([df1.Amount, rhs], axis=1, ignore_index=True)
 .rename(columns={0: 'Amount', 1: 'Category'}))

I get the following error message:

KeyError: "None of [Index(['another', 'important'], dtype='object')] are in the [columns]"

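It is caused by the lambda expression. With the df2[df2['Text']] part I tried to iterate over the DataFrame that contains the categories, but that doesn't work.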
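A minimal sketch of a working version of the attempt above. The KeyError comes from df2[df2['Text']]: indexing a DataFrame with a Series selects the columns named by its values, and df2 has no columns called 'another' or 'important'. The sketch walks over the (Text, Category) pairs with zip instead. It assumes the first matching Text should win and keeps the case-insensitive comparison from str(x).lower(); the helper name find_category is hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame({'Amount': [10, 20, 30],
                    'Description': ['this is a text', 'this is another text', 'this is an important']})
df2 = pd.DataFrame({'Text': ['another', 'important'],
                    'Category': ['Another Category', 'Important Category']})

def find_category(description):
    """Hypothetical helper: return the Category of the first Text found in description."""
    text = str(description).lower()
    for keyword, category in zip(df2['Text'], df2['Category']):
        if keyword.lower() in text:
            return category  # first matching keyword wins
    return None  # no keyword occurs in this description

rhs = df1['Description'].apply(find_category)

# Same reshaping as in the question, now with a Series of categories.
result = (pd.concat([df1['Amount'], rhs], axis=1, ignore_index=True)
            .rename(columns={0: 'Amount', 1: 'Category'}))
print(result)
```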

Answer 1

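Assuming df2 is a table in which each text appears only once together with its category, I think this should work (assuming the DataFrames are exactly as you posted them):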

join_map = {row['Text']:row['Category'] for ind,row in df2.iterrows()}

# First category whose text occurs in the description, or None if there is none
df1['Category'] = df1['Description'].apply(lambda x: next((val for key, val in join_map.items() if key in x), None))
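A vectorized alternative, swapped in here rather than taken from the answer: combine all Text values into one regular-expression alternation, let Series.str.extract pull out the first match in each (lower-cased) Description, and map that match to its Category. This is a sketch assuming each Text appears only once in df2; the Text values are passed through re.escape so they are matched as plain substrings, and names such as pattern and matched are illustrative.

```python
import re
import pandas as pd

df1 = pd.DataFrame({'Amount': [10, 20, 30],
                    'Description': ['this is a text', 'this is another text', 'this is an important']})
df2 = pd.DataFrame({'Text': ['another', 'important'],
                    'Category': ['Another Category', 'Important Category']})

# Lower-cased Text -> Category lookup (assumes each Text is unique in df2).
join_map = dict(zip(df2['Text'].str.lower(), df2['Category']))

# Build one alternation such as "(important|another)". Longer keywords come first,
# so when two keywords match at the same position the longer one is preferred.
keywords = sorted(join_map, key=len, reverse=True)
pattern = '(' + '|'.join(map(re.escape, keywords)) + ')'

# str.extract returns the leftmost match per row, or NaN when nothing matches.
matched = df1['Description'].str.lower().str.extract(pattern, expand=False)
df1['Category'] = matched.map(join_map)
print(df1)
```

One behavioral difference: str.extract returns the leftmost match in the description, whereas the answer's code returns the first key in dict order, so the two can disagree when a description contains several keywords. Rows without a match get NaN instead of None.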
