I need to assign values to the "group" column of a pandas DataFrame based on substrings found in another column. Example DataFrame:
import pandas as pd
groups = ['customer', 'supplier', 'irrelevant', 'spam', 'invoice', 'shipping advice']
df = pd.DataFrame({
    'mailLabels': ['customers/AcmeBar', 'suppliers/AcmeBaz', 'irrelevant', 'spam', 'invoice', 'shipping advice'],
    'group': ['na', 'na', 'na', 'na', 'na', 'na']})
My solution works, but it is very cumbersome, because the real number of groups is much larger than in this example:
import numpy as np  # pd.np was deprecated in pandas 1.0 and removed in 2.0
df['group'] = np.where(df.mailLabels.str.contains("customer"), "sales",
              np.where(df.mailLabels.str.contains("supplier"), "procurement",
              np.where(df.mailLabels.str.contains("irrelevant"), "not important",
              np.where(df.mailLabels.str.contains("spam"), "not important", "other"))))
print(df)
          mailLabels          group
0  customers/AcmeBar          sales
1  suppliers/AcmeBaz    procurement
2         irrelevant  not important
3               spam  not important
4            invoice          other
5    shipping advice          other
Is there a vectorized solution to this problem? This one does not work, because the data is messy and I cannot split the mailLabels column.
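For illustration, here is a minimal sketch of one possible vectorized approach, not necessarily the best one. It assumes the substring-to-group rules can be kept in a plain dict (the name `mapping` is hypothetical and not part of the original code) and that any label matching none of the keys should fall back to "other". It builds a single regex alternation, pulls out the first matching substring with `Series.str.extract`, and translates it with `Series.map`:

```python
import re
import pandas as pd

df = pd.DataFrame({
    'mailLabels': ['customers/AcmeBar', 'suppliers/AcmeBaz', 'irrelevant',
                   'spam', 'invoice', 'shipping advice'],
})

# Hypothetical lookup table (an assumption for this sketch): substring -> group.
mapping = {
    'customer': 'sales',
    'supplier': 'procurement',
    'irrelevant': 'not important',
    'spam': 'not important',
}

# One alternation pattern with a single capture group;
# re.escape guards against regex metacharacters in the keys.
pattern = '(' + '|'.join(re.escape(key) for key in mapping) + ')'

# Extract the matched substring per row (NaN if none), map it to its group,
# and fall back to "other" where nothing matched.
df['group'] = (
    df['mailLabels']
    .str.extract(pattern, expand=False)
    .map(mapping)
    .fillna('other')
)
print(df)
```

Adding a group then only means adding a dict entry. One difference from the nested `where` version: if a label contains several keys, the regex picks the match that occurs earliest in the string rather than the first condition in the chain, so overlapping keys may need extra care.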