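I am trying to find instances of the words "gas", "diesel", or "ev" (case-insensitive) in any column of my dataframe. If any version of these words is found in those columns, I would like to enter an abbreviation for the fuel type in a new column titled "FUEL".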
excerpt of my dataframe
SUMN SOUN MATN
Light duty vehicle Diesel Tire wear Rubber
Heavy duty diesel Non-catalyst Diesel
Light duty truck catalyst Gasoline
Medium duty vehicle EV brake wear brakes
What I'm hoping to output
SUMN SOUN MATN FUEL
Light duty vehicle Diesel Tire wear Rubber DSL
Heavy duty diesel Non-catalyst Diesel DSL
Light duty truck catalyst Gasoline GAS
Medium duty vehicle EV brake wear brakes ELEC
How can I do this?
So far I can check a single column for one type of string, but I am stuck on how to go beyond that.
df['FUEL'] = df['SUMN'].str.contains('diesel', case=False)
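Answer 0 (score: 3)
Here is one way that combines apply with str.contains to check all columns for each word. Finally, we map each word to its abbreviation, for example ev -> ELEC.
Note that I used (?i) in the regex, which makes it case-insensitive: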
words = ['gas', 'diesel', 'ev']
mapping = {'gas':'GAS', 'diesel':'DSL', 'ev':'ELEC'}

for word in words:
    # True for rows where any column contains the word, ignoring case
    m = df.apply(lambda x: x.str.contains(f'(?i)({word})')).any(axis=1)
    df.loc[m, 'FUEL'] = mapping[word]
Output
SUMN SOUN MATN FUEL
0 Light duty vehicle Diesel Tire wear Rubber DSL
1 Heavy duty diesel Non-catalyst Diesel DSL
2 Light duty truck catalyst Gasoline GAS
3 Medium duty vehicle EV brake wear brakes ELEC
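One thing to watch with substring matching is that a short keyword like "ev" also matches inside longer words. The following is a minimal sketch, not the answerer's code, that anchors "ev" with word boundaries while leaving "gas" unanchored so it still matches "Gasoline". The column split of the sample rows, the extra "Seven axle trailer" row, and the names `patterns` and `text_cols` are assumptions made for illustration.

```python
import pandas as pd

# Sample data; the split into columns is assumed, and the last row is
# hypothetical (it contains "ev" only inside the word "Seven").
df = pd.DataFrame({
    "SUMN": ["Light duty vehicle", "Heavy duty diesel", "Light duty truck",
             "Medium duty vehicle", "Seven axle trailer"],
    "SOUN": ["Diesel Tire wear", "Non-catalyst", "catalyst",
             "EV brake wear", "Tire wear"],
    "MATN": ["Rubber", "Diesel", "Gasoline", "brakes", "Rubber"],
})

# Regex pattern -> abbreviation; only "ev" is required to be a whole word.
patterns = {r"gas": "GAS", r"diesel": "DSL", r"\bev\b": "ELEC"}
text_cols = ["SUMN", "SOUN", "MATN"]

# Start with an empty object column so unmatched rows stay missing.
df["FUEL"] = pd.Series(None, index=df.index, dtype=object)

for pattern, abbrev in patterns.items():
    # Row-wise mask: True if any text column matches, ignoring case.
    mask = df[text_cols].apply(
        lambda col: col.str.contains(pattern, case=False, regex=True)
    ).any(axis=1)
    df.loc[mask, "FUEL"] = abbrev

print(df)
```

With the unanchored pattern, the hypothetical last row would be labelled ELEC; with the word boundary it stays missing. If a row matches more than one pattern, the later pattern overwrites the earlier one, as in the loop above.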
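Answer 1 (score: 1)
There is surely a more optimized solution, but hopefully this gets you on the right track. Basically, iterate over each row, go through the columns and the candidate fuel strings, and determine which abbreviation to use: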
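d = {'diesel': 'DSL', 'gasoline': 'GAS', 'ev': 'ELEC'}
# Temporary column holding each row's text joined into one string
df['all'] = df.apply(''.join, axis=1)
for i, row in df.iterrows():
    # Use the first key found in the row's text; raises IndexError if none matches
    df.at[i, 'FUEL'] = d[[key for key in d.keys() if key in row['all'].lower()][0]]
del df['all']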
Output:
SUMN SOUN MATN FUEL
0 Light duty vehicle Diesel Tire wear Rubber DSL
1 Heavy duty diesel Non-catalyst Diesel DSL
2 Light duty truck catalyst Gasoline GAS
3 Medium duty vehicle EV brake wear brakes ELEC
This assumes that only one fuel type appears in each row.
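If that assumption does not hold, one option is to collect every fuel keyword found in a row instead of only the first. This is a rough sketch under that assumption; the two sample rows and the comma-separated output format are hypothetical choices, not part of the original answer.

```python
import pandas as pd

# Hypothetical rows; the second one mentions two fuel types.
df = pd.DataFrame({
    "SUMN": ["Heavy duty diesel", "Light duty hybrid"],
    "SOUN": ["Non-catalyst", "Gasoline EV"],
})
d = {"diesel": "DSL", "gasoline": "GAS", "ev": "ELEC"}

# Join each row's text with a space and lowercase it.
joined = df.apply(" ".join, axis=1).str.lower()

# findall returns a list of every keyword matched in the row.
found = joined.str.findall("|".join(d))

# Map keywords to abbreviations, de-duplicate, and sort for a stable order.
df["FUEL"] = found.apply(lambda keys: ",".join(sorted({d[k] for k in keys})))

print(df)
```

Rows with no keyword end up with an empty string here rather than a missing value.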
Edit: inspired by the other solution:
import re
d={'diesel':'DSL','gasoline':'GAS','ev':'ELEC'}
df['FUEL'] = df.apply(lambda x: d[re.search('gasoline|diesel|ev',''.join(x).lower()).group()], axis=1)
Same output :)
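Note that `re.search(...)` returns `None` when a row contains none of the keywords, so calling `.group()` on it would raise an error. A possible vectorized variant, sketched here with `str.extract` and `map`, leaves such rows as missing values instead. The column split of the sample data and the extra fuel-free last row are assumptions for illustration.

```python
import re

import pandas as pd

# Sample data with an assumed column split; the last row is hypothetical
# and contains no fuel keyword.
df = pd.DataFrame({
    "SUMN": ["Light duty vehicle", "Heavy duty diesel", "Light duty truck",
             "Medium duty vehicle", "Light duty vehicle"],
    "SOUN": ["Diesel Tire wear", "Non-catalyst", "catalyst",
             "EV brake wear", "Tire wear"],
    "MATN": ["Rubber", "Diesel", "Gasoline", "brakes", "Rubber"],
})
d = {"diesel": "DSL", "gasoline": "GAS", "ev": "ELEC"}

# Join with a space so a keyword cannot be formed across two columns.
joined = df[["SUMN", "SOUN", "MATN"]].apply(" ".join, axis=1).str.lower()

# One capture group listing all keywords, e.g. "(diesel|gasoline|ev)".
pattern = "(" + "|".join(re.escape(key) for key in d) + ")"

# extract returns the first keyword found (NaN if none); map converts it.
df["FUEL"] = joined.str.extract(pattern, expand=False).map(d)

print(df)
```

This keeps the first-match behaviour of the answer above, so it still relies on each row naming at most one fuel type.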