text    category
-----------------------------------------------
nike    shoes from nike brought by ankit
flour   grocery
rice    grocery
adidas  shoes from adidas are cool
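The above is the format of my dataset. How can I generalize the category column when classifying? For example, I would like the output to be: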
text    category
-----------------------------------------------
nike    shoes from brand
flour   grocery
rice    grocery
adidas  shoes from brand
Answer 0 (score: 2)
One approach is to use pd.DataFrame.apply with a custom function:
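import pandas as pd

df = pd.DataFrame({'text': ['nike', 'flour', 'rice', 'adidas'],
                   'category': ['shoes from nike bought by ankit', 'grocery', 'grocery',
                                'shoes from adidas are cool']})

def converter(row):
    # If the brand name in 'text' appears in 'category', keep only the part
    # before ' from ' and append the generic 'from brand'.
    if row['text'] in row['category']:
        return row['category'].split(' from ')[0] + ' from brand'
    else:
        return row['category']

# axis=1 passes each row to converter
df['category'] = df.apply(converter, axis=1)

# Select the columns explicitly so the printed order does not depend on the pandas version
print(df[['category', 'text']])
#            category    text
# 0  shoes from brand    nike
# 1           grocery   flour
# 2           grocery    rice
# 3  shoes from brand  adidas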
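
The converter above relies on the literal separator ' from ' appearing in the category. If that cannot be guaranteed, a possible variation is to replace the brand name itself, together with everything after it, using a regular expression. This is only a sketch: the helper name generalize and its placeholder parameter are made up for illustration, and it assumes that all text from the brand name onward should be dropped, which is what the desired output in the question suggests.

```python
import re

import pandas as pd

df = pd.DataFrame({
    'text': ['nike', 'flour', 'rice', 'adidas'],
    'category': ['shoes from nike bought by ankit', 'grocery', 'grocery',
                 'shoes from adidas are cool'],
})

def generalize(row, placeholder='brand'):
    # Hypothetical helper: match the brand name as a whole word, plus
    # everything that follows it, and swap that span for the placeholder.
    # re.escape guards against brand names containing regex metacharacters.
    pattern = r'\b' + re.escape(row['text']) + r'\b.*$'
    return re.sub(pattern, placeholder, row['category'])

# Rows whose category does not contain the brand name are left unchanged
df['category'] = df.apply(generalize, axis=1)
print(df)
```

The word boundaries (\b) keep a short value in text from matching inside a longer, unrelated word in category, which the plain `in` check would not prevent.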