How to update a DataFrame column based on many str values

Time: 2019-06-30 23:45:16

Tags: python pandas numpy dataframe

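I'm building a small finance management program that imports my transactions from a CSV into Python. I want to assign values to a new "category" column based on strings found in the "details" column. I can do this for a single string, but my question is: what should I do when there are many possible strings? For example, str.contains('RALPHS') should set the column value to 'groceries', and so on.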

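For example, given a list of strings like this:

dining = ['CARLS', 'SUBWAY', 'DOMINOS']

if any of these strings is found in my details Series, the corresponding value in the category Series should be set to 'dining'.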

Below is a small runnable example.

import pandas as pd
import numpy as np

data = [
    [-68.23 , 'PAYPAL TRANSFER'],
    [-12.46, 'RALPHS #0079'],
    [-8.51, 'SAVE AS YOU GO'],
    [25.34, 'VENMO CASHOUT'],
    [-2.23 , 'PAYPAL TRANSFER'],
    [-64.29 , 'PAYPAL TRANSFER'],
    [-7.06, 'SUBWAY'],
    [-7.03, 'CARLS JR'],
    [-2.35, 'SHELL OIL'],
    [-35.23, 'CHEVRON GAS']
]

df = pd.DataFrame(data, columns=['amount', 'details'])
df['category'] = np.nan
str_xfer = 'TRANSFER'
df['category'] = (df['details'].str.contains(str_xfer)).astype(int)
df['category'] = df['category'].replace(to_replace=1, value='transfer')

df

    amount  details             category
0   -68.23  PAYPAL TRANSFER     transfer
1   -12.46  RALPHS #0079        0
2   -8.51   SAVE AS YOU GO      0
3   25.34   VENMO CASHOUT       0
4   -2.23   PAYPAL TRANSFER     transfer
5   -64.29  PAYPAL TRANSFER     transfer
6   -7.06   SUBWAY              0
7   -7.03   CARLS JR            0
8   -2.35   SHELL OIL           0
9   -35.23  CHEVRON GAS         0

Thanks very much.

2 answers:

Answer 0 (score: 4)

If you have a single value, we can use str.extract

df['category'] = df['details'].str.extract(f'({str_xfer})')

   amount          details  category
0  -68.23  PAYPAL TRANSFER  TRANSFER
1  -12.46     RALPHS #0079       NaN
2   -8.51   SAVE AS YOU GO       NaN
3   25.34    VENMO CASHOUT       NaN
4   -2.23  PAYPAL TRANSFER  TRANSFER
5  -64.29  PAYPAL TRANSFER  TRANSFER

If you want to match multiple strings, we first have to join them with |, which is the OR operator in regular expressions.

str_xfer = ['TRANSFER', 'RALPHS', 'CASHOUT']
str_xfer = '|'.join(str_xfer)

df['category'] = df['details'].str.extract(f'({str_xfer})')
   amount          details  category
0  -68.23  PAYPAL TRANSFER  TRANSFER
1  -12.46     RALPHS #0079    RALPHS
2   -8.51   SAVE AS YOU GO       NaN
3   25.34    VENMO CASHOUT   CASHOUT
4   -2.23  PAYPAL TRANSFER  TRANSFER
5  -64.29  PAYPAL TRANSFER  TRANSFER
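
Note that str.extract returns the matched keyword itself (e.g. RALPHS), not a category name such as 'groceries'. One possible way to close that gap, sketched here under assumptions, is to map each extracted keyword to a category through a dict. The `keyword_to_category` table, the trimmed sample data, and the 'other' fallback label are all hypothetical and not part of the answer above; `re.escape` is added in case a keyword contains regex metacharacters.

```python
import re

import pandas as pd

# Trimmed sample based on the question's data.
df = pd.DataFrame(
    {
        "amount": [-68.23, -12.46, -8.51, 25.34, -7.06, -7.03],
        "details": [
            "PAYPAL TRANSFER",
            "RALPHS #0079",
            "SAVE AS YOU GO",
            "VENMO CASHOUT",
            "SUBWAY",
            "CARLS JR",
        ],
    }
)

# Hypothetical keyword -> category table; adjust to your own transactions.
keyword_to_category = {
    "TRANSFER": "transfer",
    "RALPHS": "groceries",
    "CARLS": "dining",
    "SUBWAY": "dining",
    "DOMINOS": "dining",
}

# Build one alternation pattern; re.escape protects keywords that
# contain characters with a special meaning in regular expressions.
pattern = "(" + "|".join(re.escape(k) for k in keyword_to_category) + ")"

# expand=False returns a Series with the first matching keyword (NaN if none).
matched = df["details"].str.extract(pattern, expand=False)

# Translate keywords to categories; rows without a match fall back to "other".
df["category"] = matched.map(keyword_to_category).fillna("other")
print(df)
```

With this layout, supporting a new merchant only requires adding one entry to the dict.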

Answer 1 (score: 1)

I think you need str.findall

df['category']=df.details.str.findall('TRANSFER').str[0].fillna(0)
df
   amount          details  category
0  -68.23  PAYPAL TRANSFER  TRANSFER
1  -12.46     RALPHS #0079         0
2   -8.51   SAVE AS YOU GO         0
3   25.34    VENMO CASHOUT         0
4   -2.23  PAYPAL TRANSFER  TRANSFER
5  -64.29  PAYPAL TRANSFER  TRANSFER

If you have multiple strings in str_xfer, join them with '|':

df.details.str.findall('TRANSFER|VENMO').str[0]
0    TRANSFER
1         NaN
2         NaN
3       VENMO
4    TRANSFER
5    TRANSFER
Name: details, dtype: object
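
For the question's original layout, where each category has its own list of strings (like `dining`), a rough sketch using str.contains together with numpy.select might look like the following. The `categories` dict, the extra 'gas' and 'groceries' lists, and the 'other' default are assumptions made for illustration; when a row matches more than one category, numpy.select keeps the first one in dict order.

```python
import numpy as np
import pandas as pd

details = pd.Series(
    [
        "PAYPAL TRANSFER",
        "RALPHS #0079",
        "SAVE AS YOU GO",
        "SUBWAY",
        "CARLS JR",
        "SHELL OIL",
    ]
)

# Hypothetical category -> keyword lists, mirroring the question's `dining` list.
categories = {
    "transfer": ["TRANSFER"],
    "groceries": ["RALPHS"],
    "dining": ["CARLS", "SUBWAY", "DOMINOS"],
    "gas": ["SHELL", "CHEVRON"],
}

# One boolean mask per category: True where any keyword of that category occurs.
# na=False treats missing details as "no match".
conditions = [
    details.str.contains("|".join(words), na=False).to_numpy()
    for words in categories.values()
]

# Pick the first category whose mask is True; otherwise use the default label.
category = np.select(conditions, list(categories), default="other")
print(pd.DataFrame({"details": details, "category": category}))
```

The keywords here are plain letters, so joining them with '|' is safe; keywords containing regex metacharacters would need escaping first.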