Renaming categories

Time: 2018-04-09 09:59:02

Tags: python string pandas dataframe text-classification

    text       category
----------------------------------------------- 
    nike    shoes from nike brought by ankit
    flour   grocery
    rice    grocery
    adidas  shoes from adidas are cool

The above is the format of my dataset. How can I generalize the categories when classifying? For example, I would like the output to be:

    text       category
----------------------------------------------- 
    nike    shoes from brand
    flour   grocery
    rice    grocery
    adidas  shoes from brand

1 answer:

Answer 0: (score: 2)

One approach is to use a custom function with pd.DataFrame.apply:

import pandas as pd

df = pd.DataFrame({'text': ['nike', 'flour', 'rice', 'adidas'],
                   'category': ['shoes from nike bought by ankit', 'grocery', 'grocery',
                                'shoes from adidas are cool']})

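def converter(row):
    # If the brand in 'text' appears in 'category', keep the part before
    # ' from ' and replace everything after it with the generic 'brand'.
    if row['text'] in row['category']:
        return row['category'].split(' from ')[0] + ' from brand'
    # Otherwise leave the category unchanged.
    return row['category']

# axis=1 passes each row to converter as a Series.
df['category'] = df.apply(converter, axis=1)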

#            category    text
# 0  shoes from brand    nike
# 1           grocery   flour
# 2           grocery    rice
# 3  shoes from brand  adidas
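
The same rule can also be applied without a row-wise apply. The following is only a minimal sketch under the same assumptions as above (the brand name in text appears verbatim in category, and the category uses the phrase ' from '); the helper name generalize is hypothetical and not part of pandas:

```python
import pandas as pd

df = pd.DataFrame({
    'text': ['nike', 'flour', 'rice', 'adidas'],
    'category': ['shoes from nike bought by ankit', 'grocery', 'grocery',
                 'shoes from adidas are cool'],
})

def generalize(text, category):
    # Hypothetical helper: if the brand name appears in the category,
    # keep the part before ' from ' and append the generic ' from brand'.
    if text in category:
        return category.split(' from ')[0] + ' from brand'
    # No brand mentioned: return the category unchanged.
    return category

# Iterate over both columns in step instead of building a Series per row.
df['category'] = [generalize(t, c) for t, c in zip(df['text'], df['category'])]

print(df['category'].tolist())
```

On a small frame like this the two versions behave the same; the list comprehension may be worth considering on larger data, since apply with axis=1 constructs a Series for every row.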