Search columns for a specific set of text and, if found, enter a new text string in a new column (pandas)

Asked: 2019-10-30 22:29:49

Tags: python python-3.x pandas

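I'm trying to find instances where the words "gas", "diesel", or "ev" appear (case-insensitively) in any column of a dataframe. If any version of these words is found in those columns, I want to enter an abbreviation for the fuel type in a new column titled "FUEL".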

excerpt of my dataframe

SUMN                 SOUN               MATN   
Light duty vehicle   Diesel Tire wear   Rubber
Heavy duty diesel    Non-catalyst       Diesel
Light duty truck     catalyst           Gasoline
Medium duty vehicle  EV brake wear      brakes

What I'm hoping to output
SUMN                 SOUN               MATN      FUEL
Light duty vehicle   Diesel Tire wear   Rubber    DSL
Heavy duty diesel    Non-catalyst       Diesel    DSL
Light duty truck     catalyst           Gasoline  GAS
Medium duty vehicle  EV brake wear      brakes    ELEC

How can I do this?

I've got as far as checking a single column for one type of string, but I'm stuck on how to go beyond that.

df['FUEL'] = df['SUMN'].str.contains('diesel', case=False)

2 Answers:

Answer 0 (score: 3)

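Here is one way, combining apply with str.contains to check every column for each word. Finally, we map each word to its abbreviation, e.g. ev -> ELEC.

Note that I use (?i) in the regular expression, which makes the match case-insensitive: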

words = ['gas', 'diesel', 'ev']
mapping = {'gas':'GAS', 'diesel':'DSL', 'ev':'ELEC'}

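for word in words:
    # Rows where any column contains the word; no capture group, so pandas does not warn about match groups
    m = df.apply(lambda x: x.str.contains(f'(?i){word}')).any(axis=1)
    df.loc[m, 'FUEL'] = mapping[word]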

Output

                  SUMN              SOUN      MATN  FUEL
0   Light duty vehicle  Diesel Tire wear    Rubber   DSL
1    Heavy duty diesel      Non-catalyst    Diesel   DSL
2     Light duty truck          catalyst  Gasoline   GAS
3  Medium duty vehicle     EV brake wear    brakes  ELEC
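
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the excerpt in the question, `text_cols` is a name I introduced so that the search never touches the FUEL column once it exists, and `case=False` stands in for the inline `(?i)` flag, which should be equivalent here. Treat it as an illustration under those assumptions, not as the answerer's exact code.

```python
import pandas as pd

# Sample data rebuilt by hand from the excerpt in the question.
df = pd.DataFrame({
    'SUMN': ['Light duty vehicle', 'Heavy duty diesel', 'Light duty truck', 'Medium duty vehicle'],
    'SOUN': ['Diesel Tire wear', 'Non-catalyst', 'catalyst', 'EV brake wear'],
    'MATN': ['Rubber', 'Diesel', 'Gasoline', 'brakes'],
})

mapping = {'gas': 'GAS', 'diesel': 'DSL', 'ev': 'ELEC'}
text_cols = ['SUMN', 'SOUN', 'MATN']  # assumption: search only the original text columns

for word, abbrev in mapping.items():
    # True for rows where any text column contains the word, ignoring case
    mask = df[text_cols].apply(lambda col: col.str.contains(word, case=False)).any(axis=1)
    df.loc[mask, 'FUEL'] = abbrev

print(df)
```

Because each pass overwrites earlier assignments, a row that mentions two fuels ends up with the abbreviation of whichever word comes last in `mapping`.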

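Answer 1 (score: 1)

There is surely a more optimized solution, but hopefully this puts you on the right track... Basically, iterate over each row, go through the columns and the candidate fuel strings, and decide which abbreviation to use: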

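d = {'diesel': 'DSL', 'gasoline': 'GAS', 'ev': 'ELEC'}
# Temporary column: all columns of each row joined into one string
df['all'] = df.apply(''.join, axis=1)
for i, row in df.iterrows():
    # Use the first key found in the row's text; raises IndexError if no key is found
    df.at[i, 'FUEL'] = d[[key for key in d.keys() if key in row['all'].lower()][0]]

del df['all']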

Output:

                  SUMN              SOUN      MATN  FUEL
0   Light duty vehicle  Diesel Tire wear    Rubber   DSL
1    Heavy duty diesel      Non-catalyst    Diesel   DSL
2     Light duty truck          catalyst  Gasoline   GAS
3  Medium duty vehicle     EV brake wear    brakes  ELEC

This assumes only one fuel type appears in each row.
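
That assumption matters in two ways: a row with no fuel word makes the `[0]` lookup raise an `IndexError`, and a row with several fuel words silently takes whichever key comes first in the dict. Below is a hedged sketch of a more forgiving helper. `fuel_abbrev` is a hypothetical name of mine, returning `None` for unmatched rows is my assumption about what would be wanted, and the second row of the sample frame is made up purely to show the unmatched case.

```python
import pandas as pd

d = {'diesel': 'DSL', 'gasoline': 'GAS', 'ev': 'ELEC'}

def fuel_abbrev(row):
    """Return the abbreviation for the first key of d found in the row, or None."""
    # Joining with a space keeps words from fusing across column boundaries
    text = ' '.join(row.astype(str)).lower()
    return next((abbrev for key, abbrev in d.items() if key in text), None)

df = pd.DataFrame({
    'SUMN': ['Heavy duty diesel', 'Trailer'],   # 'Trailer' row is invented for illustration
    'SOUN': ['Non-catalyst', 'Tire wear'],
    'MATN': ['Diesel', 'Rubber'],
})
df['FUEL'] = df.apply(fuel_abbrev, axis=1)
print(df)
```

The first row gets DSL and the invented second row is left without a fuel code instead of stopping the whole run.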

Edit: inspired by the other solution:

import re
d={'diesel':'DSL','gasoline':'GAS','ev':'ELEC'}
df['FUEL'] = df.apply(lambda x: d[re.search('gasoline|diesel|ev',''.join(x).lower()).group()], axis=1)

Same output :)
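
If the frame is large, the row-wise apply can probably be avoided. The sketch below is one possible vectorised variant and not something taken from either answer: it joins the columns with `Series.str.cat`, pulls out the first fuel word with `Series.str.extract`, and maps it to the abbreviation. The sample frame is rebuilt by hand from the question, and `combined`, `pattern` and `found` are names I chose for illustration.

```python
import re
import pandas as pd

# Sample data rebuilt by hand from the excerpt in the question.
df = pd.DataFrame({
    'SUMN': ['Light duty vehicle', 'Heavy duty diesel', 'Light duty truck', 'Medium duty vehicle'],
    'SOUN': ['Diesel Tire wear', 'Non-catalyst', 'catalyst', 'EV brake wear'],
    'MATN': ['Rubber', 'Diesel', 'Gasoline', 'brakes'],
})
d = {'diesel': 'DSL', 'gasoline': 'GAS', 'ev': 'ELEC'}

# One string per row, with spaces so words cannot fuse across columns
combined = df['SUMN'].str.cat([df['SOUN'], df['MATN']], sep=' ')

# Single capture group holding every fuel word: (diesel|gasoline|ev)
pattern = '(' + '|'.join(d) + ')'

# First fuel word per row, matched case-insensitively; missing where nothing matches
found = combined.str.extract(pattern, flags=re.IGNORECASE, expand=False)

# Normalise case, then translate the word into its abbreviation
df['FUEL'] = found.str.lower().map(d)
print(df)
```

Unlike the `re.search(...).group()` version, a row with no fuel word ends up with a missing value in FUEL instead of raising an `AttributeError`.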