Searching for a word in a DataFrame column

Time: 2020-02-02 22:02:48

Tags: python pandas dataframe

I have the following data:

    import re
    import pandas as pd

    df = pd.DataFrame({'Id_email': [1, 2, 3, 4],
                       'Word': ['_ SENSOR 12', 'new_SEN041', 'engine', 'sens 12'],
                       'Date': ['2018-01-05', '2018-01-06', '2017-01-06', '2018-01-05']})

    print(df)

I want to scan the "Word" column for variants of the word "sensor".

If a match is found, I want to fill a new "Type" column with Sensor_Type; if not, the corresponding row should be filled with "Other".

I tried the following (this code is wrong):

    df['Type'] = 'Other'

    for i in range(0, len(df)):
        if (re.search('\\SEN\\b', df['Word'].iloc[i], re.IGNORECASE) or
                re.search('\\sen\\b', df['Word'].iloc[i], re.IGNORECASE)):
            df['Type'].iloc[i] == 'Sensor_Type'
        else:
            df['Type'].iloc[i] == 'Other'

My (incorrect) output is:

Id_email        Word         Date_end   Type
     1      _ SENSOR 12     2018-01-05  Other
     2       new_SEN041     2018-01-06  Other
     3         engine       2017-01-06  Other
     4         sens 12      2018-01-05  Other

However, I want the output to be:

Id_email        Word         Date_end   Type
     1      _ SENSOR 12     2018-01-05  Sensor_Type
     2       new_SEN041     2018-01-06  Sensor_Type
     3         engine       2017-01-06  Other
     4         sens 12      2018-01-05  Sensor_Type

2 answers:

Answer 0: (score: 3)

Use pandas str.contains with case=False. This makes the match case-insensitive, so the pattern finds both sen and SEN.

    import numpy as np
    # case=False makes the match case-insensitive
    df.assign(Type=lambda x: np.where(x.Word.str.contains(r'SEN', case=False),
                                      'Sensor_Type', 'Other'))

       Id_email         Word        Date         Type
    0         1  _ SENSOR 12  2018-01-05  Sensor_Type
    1         2   new_SEN041  2018-01-06  Sensor_Type
    2         3       engine  2017-01-06        Other
    3         4      sens 12  2018-01-05  Sensor_Type
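If the Word column can contain missing values, str.contains returns a missing value for those rows rather than True or False. The sketch below passes na=False so that such rows count as non-matches. The extra row with None is a hypothetical addition for illustration and is not part of the question's data.

```python
import numpy as np
import pandas as pd

# Sample data from the question, plus one hypothetical row with a missing Word.
df = pd.DataFrame({'Id_email': [1, 2, 3, 4, 5],
                   'Word': ['_ SENSOR 12', 'new_SEN041', 'engine', 'sens 12', None]})

# case=False: case-insensitive match; na=False: missing values count as "no match".
mask = df['Word'].str.contains('sen', case=False, na=False)

# Assign the new column in place instead of returning a copy.
df['Type'] = np.where(mask, 'Sensor_Type', 'Other')
print(df)
```

Assigning with df['Type'] = ... modifies the frame in place, whereas df.assign returns a new frame that has to be stored in a variable to be kept.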

Answer 1: (score: 1)

    import re
    df['Type'] = df.apply(lambda x: 'Sensor_Type' if re.search(r'SEN|sen', x['Word']) else 'Other', axis=1)
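For reference, the loop in the question fails for two reasons. First, == compares values and does not assign, so the Type column is never changed. Second, the string '\\SEN\\b' is the regex \SEN\b: one non-whitespace character, then "EN", then a word boundary, and none of the sample values match it. A minimal sketch of a corrected loop follows, assuming a plain case-insensitive search for "sen" is what was intended. It writes through df.loc rather than chained indexing, which is not guaranteed to update the original frame.

```python
import re
import pandas as pd

df = pd.DataFrame({'Id_email': [1, 2, 3, 4],
                   'Word': ['_ SENSOR 12', 'new_SEN041', 'engine', 'sens 12'],
                   'Date': ['2018-01-05', '2018-01-06', '2017-01-06', '2018-01-05']})

df['Type'] = 'Other'  # default value for every row

# Assumed intent: match "sen" anywhere in the string, ignoring case.
pattern = re.compile(r'sen', re.IGNORECASE)

for i in range(len(df)):
    if pattern.search(df['Word'].iloc[i]):
        # Assignment (=), not comparison (==); df.loc writes to the original frame.
        df.loc[df.index[i], 'Type'] = 'Sensor_Type'

print(df)
```

The vectorized str.contains approach is usually preferable to a row-by-row loop, but the loop shows exactly which parts of the original attempt went wrong.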