DataFrame: update a column to a category name based on a list of string values in that category

Time: 2019-07-01 19:10:35

Tags: python pandas numpy dataframe

I have lists of keywords grouped by category name, for example:

dining = ['CARLS', 'SUBWAY', 'PIZZA']
bank = ['TRANSFER', 'VENMO', 'SAVE AS YOU GO']

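If any of these strings is found in another column, I want to set a new column to the category name. Using the example from another question, I have the following dataset (a sample list of bank transactions):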

import pandas as pd
import numpy as np

dining = ['CARLS', 'SUBWAY', 'PIZZA']
bank = ['TRANSFER', 'VENMO', 'SAVE AS YOU GO']

data = [
    [-68.23 , 'PAYPAL TRANSFER'],
    [-12.46, 'RALPHS #0079'],
    [-8.51, 'SAVE AS YOU GO'],
    [25.34, 'VENMO CASHOUT'],
    [-2.23 , 'PAYPAL TRANSFER'],
    [-64.29 , 'PAYPAL TRANSFER'],
    [-7.06, 'SUBWAY'],
    [-7.03, 'CARLS JR'],
    [-2.35, 'SHELL OIL'],
    [-35.23, 'CHEVRON GAS']
]

df = pd.DataFrame(data, columns=['amount', 'details'])
df['category'] = np.nan
df

    amount  details             category
0   -68.23  PAYPAL TRANSFER     NaN
1   -12.46  RALPHS #0079        NaN
2   -8.51   SAVE AS YOU GO      NaN
3   25.34   VENMO CASHOUT       NaN
4   -2.23   PAYPAL TRANSFER     NaN
5   -64.29  PAYPAL TRANSFER     NaN
6   -7.06   SUBWAY              NaN
7   -7.03   CARLS JR            NaN
8   -2.35   SHELL OIL           NaN
9   -35.23  CHEVRON GAS         NaN

Is there an efficient way to set the category column to "dining" or "bank" depending on whether a string from the corresponding list is found in df['details']?

Desired output:
    amount  details             category
0   -68.23  PAYPAL TRANSFER     bank
1   -12.46  RALPHS #0079        NaN
2   -8.51   SAVE AS YOU GO      bank
3   25.34   VENMO CASHOUT       bank
4   -2.23   PAYPAL TRANSFER     bank
5   -64.29  PAYPAL TRANSFER     bank
6   -7.06   SUBWAY              dining
7   -7.03   CARLS JR            dining
8   -2.35   SHELL OIL           NaN
9   -35.23  CHEVRON GAS         NaN

So far, following on from my previous question, I assume I need to work with a new list created via str.extract.

2 answers:

Answer 0 (score: 3)

We can use np.select for this, since we have multiple conditions:

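# Build one regex alternation per category. New names are used so the
# original keyword lists are not overwritten.
dining_pattern = '|'.join(dining)
bank_pattern = '|'.join(bank)

# One boolean mask per category; when both match, the first one wins.
# No capture group is needed for str.contains (a group triggers a UserWarning).
conditions = [
    df['details'].str.contains(dining_pattern),
    df['details'].str.contains(bank_pattern)
]

choices = ['dining', 'bank']

# np.NaN was removed in NumPy 2.0, so np.nan is used. With string choices the
# default is coerced to the text 'nan'; newer NumPy releases may instead raise
# a TypeError about a missing common dtype.
df['category'] = np.select(conditions, choices, default=np.nan)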

   amount          details category
0  -68.23  PAYPAL TRANSFER     bank
1  -12.46     RALPHS #0079      nan
2   -8.51   SAVE AS YOU GO     bank
3   25.34    VENMO CASHOUT     bank
4   -2.23  PAYPAL TRANSFER     bank
5  -64.29  PAYPAL TRANSFER     bank
6   -7.06           SUBWAY   dining
7   -7.03         CARLS JR   dining
8   -2.35        SHELL OIL      nan
9  -35.23      CHEVRON GAS      nan
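
In the output above, unmatched rows hold the string 'nan' rather than a real missing value, because np.select has to fit the string choices and the float default into a single array. The following is a minimal sketch of one way around that, not part of the original answer: it uses an empty-string default and then masks it out, and it escapes the keywords with re.escape in case they contain regex metacharacters. The helper name to_pattern and the shortened sample data are assumptions made for illustration.

```python
import re

import numpy as np
import pandas as pd

# Keyword lists from the question.
dining = ['CARLS', 'SUBWAY', 'PIZZA']
bank = ['TRANSFER', 'VENMO', 'SAVE AS YOU GO']

# Shortened sample of the question's transactions (illustrative only).
df = pd.DataFrame({
    'details': ['PAYPAL TRANSFER', 'RALPHS #0079', 'SAVE AS YOU GO',
                'VENMO CASHOUT', 'SUBWAY', 'CARLS JR', 'SHELL OIL']
})


def to_pattern(keywords):
    """Hypothetical helper: escape each keyword and join them with '|'."""
    return '|'.join(re.escape(keyword) for keyword in keywords)


conditions = [
    df['details'].str.contains(to_pattern(dining)),
    df['details'].str.contains(to_pattern(bank)),
]
choices = ['dining', 'bank']

# A string default keeps every value in np.select the same dtype.
labels = pd.Series(np.select(conditions, choices, default=''), index=df.index)

# Series.where keeps values where the condition holds and puts a missing
# value everywhere else, so the '' placeholder becomes a real NaN.
df['category'] = labels.where(labels != '')
```

With this variant, df['category'].isna() can be used to find the uncategorized rows, which does not work when the column holds the text 'nan'.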

Answer 1 (score: 2)

You can use findall together with a dict map:

sub = {**dict.fromkeys(dining, 'dining'), **dict.fromkeys(bank, 'bank')}
df.details.str.findall('|'.join(sub)).str[0].map(sub)
Out[146]: 
0      bank
1       NaN
2      bank
3      bank
4      bank
5      bank
6    dining
7    dining
8       NaN
9       NaN
Name: details, dtype: object

#df['category'] = df.details.str.findall('|'.join(sub)).str[0].map(sub)
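
The same idea can be sketched with str.extract, which the question mentions. This is an illustrative variant rather than the answerer's code: the categories dict and the name keyword_to_category are assumptions, and the sample rows are a shortened version of the question's data.

```python
import re

import pandas as pd

# Hypothetical layout: category name -> keywords that identify it.
categories = {
    'dining': ['CARLS', 'SUBWAY', 'PIZZA'],
    'bank': ['TRANSFER', 'VENMO', 'SAVE AS YOU GO'],
}

# Invert the mapping so each keyword points at its category.
keyword_to_category = {
    keyword: category
    for category, keywords in categories.items()
    for keyword in keywords
}

# One capture group holding every keyword, escaped to match literally.
pattern = '(' + '|'.join(re.escape(k) for k in keyword_to_category) + ')'

# Shortened sample of the question's transactions (illustrative only).
df = pd.DataFrame({
    'details': ['PAYPAL TRANSFER', 'RALPHS #0079', 'SAVE AS YOU GO',
                'VENMO CASHOUT', 'SUBWAY', 'CARLS JR', 'SHELL OIL']
})

# expand=False returns a Series holding the first matched keyword per row,
# or NaN when nothing matches.
matched = df['details'].str.extract(pattern, expand=False)

# Translate each matched keyword into its category; NaN stays NaN.
df['category'] = matched.map(keyword_to_category)
```

If a description contains keywords from more than one category, the regex returns the leftmost match in the text, so only that keyword's category is assigned.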