我有按名称分类的列表,例如:
dining = ['CARLS', 'SUBWAY', 'PIZZA']
bank = ['TRANSFER', 'VENMO', 'SAVE AS YOU GO']
,如果在另一列中找到任何这些字符串,我想将新列更新为类别名称。来自另一个问题here的示例,我具有以下数据集(示例银行交易清单):
import pandas as pd
import numpy as np
dining = ['CARLS', 'SUBWAY', 'PIZZA']
bank = ['TRANSFER', 'VENMO', 'SAVE AS YOU GO']
data = [
[-68.23 , 'PAYPAL TRANSFER'],
[-12.46, 'RALPHS #0079'],
[-8.51, 'SAVE AS YOU GO'],
[25.34, 'VENMO CASHOUT'],
[-2.23 , 'PAYPAL TRANSFER'],
[-64.29 , 'PAYPAL TRANSFER'],
[-7.06, 'SUBWAY'],
[-7.03, 'CARLS JR'],
[-2.35, 'SHELL OIL'],
[-35.23, 'CHEVRON GAS']
]
df = pd.DataFrame(data, columns=['amount', 'details'])
df['category'] = np.nan
df
amount details category
0 -68.23 PAYPAL TRANSFER NaN
1 -12.46 RALPHS #0079 NaN
2 -8.51 SAVE AS YOU GO NaN
3 25.34 VENMO CASHOUT NaN
4 -2.23 PAYPAL TRANSFER NaN
5 -64.29 PAYPAL TRANSFER NaN
6 -7.06 SUBWAY NaN
7 -7.03 CARLS JR NaN
8 -2.35 SHELL OIL NaN
9 -35.23 CHEVRON GAS NaN
根据列表中的字符串是否在data.details中找到,我是否有一种有效的方法将category列更新为“ dining”或“ bank”?
I.e. Desired Output:
amount details category
0 -68.23 PAYPAL TRANSFER bank
1 -12.46 RALPHS #0079 NaN
2 -8.51 SAVE AS YOU GO bank
3 25.34 VENMO CASHOUT bank
4 -2.23 PAYPAL TRANSFER bank
5 -64.29 PAYPAL TRANSFER bank
6 -7.06 SUBWAY dining
7 -7.03 CARLS JR dining
8 -2.35 SHELL OIL NaN
9 -35.23 CHEVRON GAS NaN
到目前为止,从我之前的问题开始,我假设我需要使用通过str.extract创建的新列表。
答案 0 :(得分:3)
我们可以使用np.select
进行此操作,因为我们有多个条件:
dining = '|'.join(dining)
bank = '|'.join(bank)
conditions = [
df['details'].str.contains(f'({dining})'),
df['details'].str.contains(f'({bank})')
]
choices = ['dining', 'bank']
df['category'] = np.select(conditions, choices, default=np.NaN)
amount details category
0 -68.23 PAYPAL TRANSFER bank
1 -12.46 RALPHS #0079 nan
2 -8.51 SAVE AS YOU GO bank
3 25.34 VENMO CASHOUT bank
4 -2.23 PAYPAL TRANSFER bank
5 -64.29 PAYPAL TRANSFER bank
6 -7.06 SUBWAY dining
7 -7.03 CARLS JR dining
8 -2.35 SHELL OIL nan
9 -35.23 CHEVRON GAS nan
答案 1 :(得分:2)
您可以使用findall
+ dict
map
sub = {**dict.fromkeys(dining, 'dining'), **dict.fromkeys(bank, 'bank')}
df.details.str.findall('|'.join(sub)).str[0].map(sub)
Out[146]:
0 bank
1 NaN
2 bank
3 bank
4 bank
5 bank
6 dining
7 dining
8 NaN
9 NaN
Name: details, dtype: object
#df['category'] = df.details.str.findall('|'.join(sub)).str[0].map(sub)