I have a DataFrame:
import pandas as pd

# Around 100000 rows
df = pd.DataFrame({'text': ['Apple is healthy', 'Potato is round', 'Apple might be green'],
                   'category': ["", "", ""],
                   })
Second DataFrame:
#Around 3000 rows
df_2 = pd.DataFrame({'keyword': [ 'Apple ', 'Potato'],
'category': ["fruit","vegetable"],
})
Desired result:
#Around 100000 rows
df = pd.DataFrame({'text': [ 'Apple is healthy', 'Potato is round', 'Apple might be green'],
'category': ["fruit","vegetable", "fruit"],
})
I am currently trying:
df.set_index('text')
df_2.set_index('keyword')
df.update(df_2)
You can see that it does not add a category for the last row. How can I achieve this?
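Answer 0 (score: 0)
You need to assign the output of DataFrame.set_index back to a variable, because unlike DataFrame.update it is not an in-place operation. To match against the df_2["keyword"] column, use Series.str.extract: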
df = df.set_index(df['text'].str.extract(f'({"|".join(df_2["keyword"])})', expand=False))
df_2 = df_2.set_index('keyword')
print (df)
                        text category
text
Apple       Apple is healthy
Potato       Potato is round
Apple   Apple might be green
df.update(df_2)
print (df)
                        text   category
text
Apple       Apple is healthy      fruit
Potato       Potato is round  vegetable
Apple   Apple might be green      fruit
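As a minimal sketch of why the attempt in the question has no effect: DataFrame.set_index returns a new DataFrame and leaves the original untouched unless the result is assigned back. The two-row frame and the names unchanged and changed are illustrative assumptions, not part of the original data.

```python
import pandas as pd

# Small illustrative frame (assumed data, shortened from the question)
df = pd.DataFrame({'text': ['Apple is healthy', 'Potato is round'],
                   'category': ['', '']})

# set_index returns a new DataFrame; here the result is discarded
df.set_index('text')
unchanged = list(df.columns)   # 'text' is still a regular column

# Assigning the result back is what actually changes df
df = df.set_index('text')
changed = list(df.columns)     # 'text' has moved into the index

print(unchanged, changed)
```

DataFrame.update, by contrast, modifies the calling DataFrame directly, so its result does not need to be assigned.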
If you only need to add a single column, use Series.str.extract together with Series.map:
s = df['text'].str.extract(f'({"|".join(df_2["keyword"])})', expand=False)
df['category'] = s.map(df_2.set_index(['keyword'])['category'])
print (df)
                   text   category
0      Apple is healthy      fruit
1       Potato is round  vegetable
2  Apple might be green      fruit
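A slightly more defensive variant of the extract-and-map approach might look like the sketch below. It assumes the keywords are plain strings without trailing spaces, escapes them with re.escape so that regex metacharacters are matched literally, and fills unmatched rows with a placeholder. The extra 'Stone is hard' row and the 'unknown' label are hypothetical, added only to show the no-match case.

```python
import re
import pandas as pd

# Illustrative data; the last row is an assumption to show a non-matching text
df = pd.DataFrame({'text': ['Apple is healthy', 'Potato is round',
                            'Apple might be green', 'Stone is hard']})
df_2 = pd.DataFrame({'keyword': ['Apple', 'Potato'],
                     'category': ['fruit', 'vegetable']})

# Escape each keyword so characters such as "." or "+" are matched literally
pattern = '(' + '|'.join(re.escape(k) for k in df_2['keyword']) + ')'

# First keyword found in each text, NaN when no keyword matches
matched = df['text'].str.extract(pattern, expand=False)

# Look up the category that belongs to the matched keyword
lookup = df_2.set_index('keyword')['category']
df['category'] = matched.map(lookup)

# Hypothetical choice: label rows without any keyword as 'unknown'
df['category'] = df['category'].fillna('unknown')

print(df)
```

Only the first matching keyword per row is used here; texts that contain several keywords would need a different strategy such as Series.str.findall.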