Create a Pandas DataFrame column based on substring values of another column

Time: 2019-12-30 18:56:34

Tags: python pandas dataframe

I have to assign values to the "group" column of a Pandas DataFrame based on substrings of another column. Example DataFrame:

import pandas as pd

groups = ['customer', 'supplier', 'irrelevant', 'spam', 'invoice', 'shipping advice']

df = pd.DataFrame({
    'mailLabels': ['customers/AcmeBar', 'suppliers/AcmeBaz', 'irrelevant', 'spam', 'invoice', 'shipping advice' ],
    'group': ['na', 'na', 'na', 'na', 'na', 'na']})

My solution works, but it is very cumbersome, because the number of groups is much larger than in this example:

# pd.np was deprecated in pandas 1.0 and removed in 2.0, so NumPy is imported directly
import numpy as np

df['group'] = np.where(df.mailLabels.str.contains("customer"), "sales",
              np.where(df.mailLabels.str.contains("supplier"), "procurement",
              np.where(df.mailLabels.str.contains("irrelevant"), "not important",
              np.where(df.mailLabels.str.contains("spam"), "not important", "other"))))

print(df)

          mailLabels          group
0  customers/AcmeBar          sales
1  suppliers/AcmeBaz    procurement
2         irrelevant  not important
3               spam  not important
4            invoice          other
5    shipping advice          other

Is there a vectorized solution to this problem? This one does not work for me, because the data is messy and I cannot split the mailLabels column.
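For reference, one possible direction is to keep the substring-to-group pairs in a dictionary and hand the resulting boolean masks to `np.select`, which avoids nesting one `np.where` per group. This is only a sketch under assumptions: the name `substring_to_group` and its contents are hypothetical (taken from the four cases in the example above), and it assumes plain substring matching is sufficient for the messy labels.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'mailLabels': ['customers/AcmeBar', 'suppliers/AcmeBaz', 'irrelevant',
                   'spam', 'invoice', 'shipping advice'],
})

# Hypothetical lookup table (not from the original post): substring -> group.
# Insertion order acts as priority when several substrings match one label.
substring_to_group = {
    'customer': 'sales',
    'supplier': 'procurement',
    'irrelevant': 'not important',
    'spam': 'not important',
}

# One boolean mask per substring. regex=False treats each key as a literal
# string; na=False makes missing labels count as "no match".
conditions = [
    df['mailLabels'].str.contains(sub, regex=False, na=False)
    for sub in substring_to_group
]
choices = list(substring_to_group.values())

# np.select takes, for each row, the choice of the first condition that is
# True, and falls back to the default when none match.
df['group'] = np.select(conditions, choices, default='other')

print(df)
```

Adding a group then means adding one dictionary entry rather than another nested call. Because the first matching condition wins, more specific substrings should be listed before more general ones.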

0 answers:

No answers