I have the following data:
import pandas as pd
df = pd.DataFrame({'Id_email': [1, 2, 3, 4],
                   'Word': ['_ SENSOR 12', 'new_SEN041', 'engine', 'sens 12'],
                   'Date': ['2018-01-05', '2018-01-06', '2017-01-06', '2018-01-05']})
print(df)
I want to go through the 'Word' column row by row and look for variants of the word "sensor".
If a variant is found, I want to fill a new 'Type' column with 'Sensor_Type' in that row; if not, I want to fill it with 'Other'.
I tried to implement it as follows (this code is wrong):
df['Type'] = 'Other'
for i in range(0, len(df)):
    if(re.search('\\SEN\\b', df['Word'].iloc[i], re.IGNORECASE) or
       re.search('\\sen\\b', df['Word'].iloc[i], re.IGNORECASE)):
        df['Type'].iloc[i] == 'Sensor_Type'
    else:
        df['Type'].iloc[i] == 'Other'
My (wrong) output is:
Id_email Word Date Type
1 _ SENSOR 12 2018-01-05 Other
2 new_SEN041 2018-01-06 Other
3 engine 2017-01-06 Other
4 sens 12 2018-01-05 Other
However, I want the output to be:
Id_email Word Date Type
1 _ SENSOR 12 2018-01-05 Sensor_Type
2 new_SEN041 2018-01-06 Sensor_Type
3 engine 2017-01-06 Other
4 sens 12 2018-01-05 Sensor_Type
Answer 0 (score: 3)
Use pandas str.contains with case=False. This makes the match case-insensitive, so it finds both sen and SEN:
import numpy as np
# assign() returns a new DataFrame, so store the result back in df
df = df.assign(Type=lambda x: np.where(x.Word.str.contains(r'SEN', case=False),
                                       'Sensor_Type', 'Other'))
Id_email Word Date Type
0 1 _ SENSOR 12 2018-01-05 Sensor_Type
1 2 new_SEN041 2018-01-06 Sensor_Type
2 3 engine 2017-01-06 Other
3 4 sens 12 2018-01-05 Sensor_Type
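DataFrame.assign returns a copy and leaves the original df unchanged, which is easy to miss. If you would rather write the new column into df directly, here is a minimal sketch of the same str.contains/np.where idea, rebuilt on the question's sample data. The na=False argument is an addition not in the answer: it treats missing values in 'Word' as "no match" instead of producing NaN in the mask.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Id_email': [1, 2, 3, 4],
                   'Word': ['_ SENSOR 12', 'new_SEN041', 'engine', 'sens 12'],
                   'Date': ['2018-01-05', '2018-01-06', '2017-01-06', '2018-01-05']})

# Boolean mask: True where 'sen' appears anywhere in the string, ignoring case.
# na=False means a missing 'Word' value counts as "not a sensor".
is_sensor = df['Word'].str.contains('sen', case=False, na=False)

# Write the new column directly into df (no copy is returned).
df['Type'] = np.where(is_sensor, 'Sensor_Type', 'Other')
print(df)
```

Because both the mask and np.where operate on whole columns, no Python-level loop over the rows is needed.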
Answer 1 (score: 1)
import re
df['Type'] = df.apply(lambda x: 'Sensor_Type' if re.search(r'SEN|sen', x['Word']) else 'Other', axis=1)
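As for why the loop in the question leaves every row as 'Other', two problems stand out. First, `==` compares values instead of assigning, so Type is never changed. Second, in '\\SEN\\b' the `\S` is the regex escape for "any non-whitespace character", not a literal S, so the pattern does not look for "SEN" at all. Writing through chained indexing (df['Type'].iloc[i] = ...) is also unreliable in pandas. Here is a minimal sketch of that loop with these issues fixed, assuming the default integer index from the question; the vectorized answers are usually preferable.

```python
import re
import pandas as pd

df = pd.DataFrame({'Id_email': [1, 2, 3, 4],
                   'Word': ['_ SENSOR 12', 'new_SEN041', 'engine', 'sens 12'],
                   'Date': ['2018-01-05', '2018-01-06', '2017-01-06', '2018-01-05']})

df['Type'] = 'Other'  # default value for every row
for idx in df.index:
    word = df.at[idx, 'Word']
    # Literal 'sen' anywhere in the string; re.IGNORECASE also covers 'SEN' and 'Sen'.
    if re.search(r'sen', word, re.IGNORECASE):
        # Single label-based assignment (=, not ==) instead of chained indexing.
        df.at[idx, 'Type'] = 'Sensor_Type'
print(df)
```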