I want to add a new column to my DataFrame that holds the indicator 'LH' or 'RH'. So far I have tried the following code on the additional_info column.
LH = ['lhd','lh','lhd','left','le']
RH = ['rhd','rh','rhd','right','re']
lh_rh= match_id[['MATA_info','tech_info','additional_info']]
lh_rh['additional_info']= lh_rh['additional_info'].str.lower()
Right = lh_rh.loc[lh_rh['additional_info'].isin(RH)]
left = lh_rh.loc[lh_rh['additional_info'].isin(LH)]
If any keyword in LH or RH matches a value in the 'MATA_info', 'tech_info', or 'additional_info' columns, I want an additional column named 'Relation' to be created and set to LH or RH accordingly.
MATA_info tech_info additional_info Relation
3,50X085Right F85 NAN RH
3,50X085Left F85 lh LH
Answer 0 (score: 1)
Use str.contains together with apply, then use DataFrame.any to check whether each row has at least one True, and finally pass the resulting masks to numpy.select:
import numpy as np
LH = ['lhd','lh','lhd','left','le']
RH = [ 'rhd','rh','rhd','right','re']
lh_rh= match_id[['MATA_info','tech_info','additional_info']]
m1 = lh_rh.apply(lambda x: x.str.contains('|'.join(LH), na=False, case=False)).any(axis=1)
m2 = lh_rh.apply(lambda x: x.str.contains('|'.join(RH), na=False, case=False)).any(axis=1)
match_id['Relation'] = np.select([m1, m2], ['LH','RH'], default=np.nan)
print (match_id)
MATA_info tech_info additional_info Relation
0 3,50X085Right F85 NAN RH
1 3,50X085Left F85 lh LH
2 4,56 %T jj nan
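For anyone who wants to try this without the original data, here is a minimal self-contained sketch of the same masking idea. The sample frame is hypothetical (it only mirrors the three rows printed above), and `any_keyword` is a helper name made up for this illustration. Instead of numpy.select it assigns through the boolean masks. That is a swapped-in variation, not the code above: as far as I know, some newer NumPy versions refuse to combine string choices with a default of np.nan, and this form sidesteps the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for match_id (mirrors the printed rows).
match_id = pd.DataFrame({
    'MATA_info': ['3,50X085Right', '3,50X085Left', '4,56'],
    'tech_info': ['F85', 'F85', '%T'],
    'additional_info': [np.nan, 'lh', 'jj'],
})

LH = ['lhd', 'lh', 'left', 'le']
RH = ['rhd', 'rh', 'right', 're']
cols = ['MATA_info', 'tech_info', 'additional_info']


def any_keyword(frame, keywords):
    """Return a boolean Series: True where any column contains any keyword."""
    pattern = '|'.join(keywords)
    return frame.apply(
        lambda col: col.str.contains(pattern, na=False, case=False)
    ).any(axis=1)


m1 = any_keyword(match_id[cols], LH)  # rows mentioning a left-hand keyword
m2 = any_keyword(match_id[cols], RH)  # rows mentioning a right-hand keyword

# Start with missing values, then fill via the masks.
# LH is assigned last so it wins when both masks are True,
# which matches the priority order of np.select([m1, m2], ...).
relation = pd.Series(np.nan, index=match_id.index, dtype=object)
relation.loc[m2] = 'RH'
relation.loc[m1] = 'LH'
match_id['Relation'] = relation

print(match_id)
```

Unmatched rows stay as real missing values here, so `match_id['Relation'].isna()` can be used to find them.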
Edit: to match the keywords only as whole words, wrap each one in \b word boundaries:
pat1 = '|'.join(r"\b{}\b".format(x) for x in LH)
pat2 = '|'.join(r"\b{}\b".format(x) for x in RH)
m1 = lh_rh.apply(lambda x: x.str.contains(pat1, na=False, case=False)).any(axis=1)
m2 = lh_rh.apply(lambda x: x.str.contains(pat2, na=False, case=False)).any(axis=1)
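One thing worth checking before switching to the word-boundary patterns: \b only matches between a word character and a non-word character, and digits count as word characters. The sketch below compares the plain and the bounded pattern on a few hypothetical values. It suggests that a value like '3,50X085Left' no longer matches once \b is added, because 'Left' is glued to the preceding '5', while short keywords such as 'le' stop matching inside unrelated words.

```python
import pandas as pd

LH = ['lhd', 'lh', 'left', 'le']

plain = '|'.join(LH)                                  # substring match
bounded = '|'.join(r"\b{}\b".format(x) for x in LH)   # whole-word match

# Hypothetical sample values chosen to show where the two patterns differ.
s = pd.Series(['3,50X085Left', 'lh', 'sample', 'left side'])

plain_hits = s.str.contains(plain, case=False, na=False)
bounded_hits = s.str.contains(bounded, case=False, na=False)

print(pd.DataFrame({'value': s, 'plain': plain_hits, 'bounded': bounded_hits}))
```

So the plain pattern is the one that catches 'Left' fused onto a part number, at the cost of false positives like 'sample'. Which trade-off is right depends on what the real columns look like.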