My df has two columns, for example:
thePerson theText
"the abc" "this is about the abc"
"xyz" "this is about tyu"
"wxy" "this is about abc"
"wxy" "this is about WXY"
I want the resulting df to be:
thePerson theText
"the abc" "this is about <b>the abc</b>"
"xyz" "this is about tyu"
"wxy" "this is about abc"
"wxy" "this is about <b>WXY</b>"
Note that if the text in a row contains that row's thePerson, the name is made bold in the text.
One of the solutions I tried, without success, is:
df['theText']=df['theText'].replace(df.thePerson,'<b>'+df.thePerson+'</b>', regex=True)
I wonder whether I could use lapply or map.
My Python environment is version 2.7.
Answer 0 (score: 2)
Use re.sub and zip:
import re

# Pair each text with the name from the same row and wrap matches in <b> tags.
tt = df.theText.values.tolist()
tp = df.thePerson.str.strip('"').values.tolist()
df.assign(
    theText=[re.sub(r'({})'.format(p), r'<b>\1</b>', t, flags=re.I)
             for t, p in zip(tt, tp)]
)
thePerson theText
0 the abc this is about <b>the abc</b>
1 xyz this is about tyu
2 wxy this is about abc
3 wxy this is about <b>WXY</b>
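The pattern above is built directly from the name, so a name containing regex metacharacters (a dot, parentheses, and so on) would be interpreted as a regular expression. A minimal sketch of one way to guard against that, assuming plain lists instead of a DataFrame; `bold_person` is a hypothetical helper name, not part of the original answer:

```python
import re

def bold_person(text, person):
    """Wrap case-insensitive occurrences of `person` in `text` with <b> tags.

    Hypothetical helper for illustration only.
    """
    # re.escape makes characters such as '.' or '(' match literally.
    pattern = '({})'.format(re.escape(person))
    return re.sub(pattern, r'<b>\1</b>', text, flags=re.I)

persons = ['the abc', 'xyz', 'wxy', 'wxy']
texts = ['this is about the abc', 'this is about tyu',
         'this is about abc', 'this is about WXY']

# Pair each text with the name from the same position, as in the answer.
result = [bold_person(t, p) for t, p in zip(texts, persons)]
```

With a DataFrame, the same helper could presumably be dropped into the list comprehension passed to df.assign in place of the inline re.sub call.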
Copy/Paste
You should be able to run this exact code and get the desired result:
import re
# On Python 2.7, use `from StringIO import StringIO` instead.
from io import StringIO
import pandas as pd

# Columns are separated by two or more spaces.
txt = '''thePerson  theText
"the abc"  "this is about the abc"
"xyz"  "this is about tyu"
"wxy"  "this is about abc"
"wxy"  "this is about WXY"'''
df = pd.read_csv(StringIO(txt), sep=r'\s{2,}', engine='python')
tt = df.theText.values.tolist()
tp = df.thePerson.str.strip('"').values.tolist()
df.assign(
    theText=[re.sub(r'({})'.format(p), r'<b>\1</b>', t, flags=re.I)
             for t, p in zip(tt, tp)]
)
You should see this:
thePerson theText
0 "the abc" "this is about <b>the abc</b>"
1 "xyz" "this is about tyu"
2 "wxy" "this is about abc"
3 "wxy" "this is about <b>WXY</b>"
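On the question of lapply or map: lapply is an R function, and its closest Python counterparts are the built-in map and list comprehensions. A sketch of a map-based variant, assuming plain lists rather than DataFrame columns and a hypothetical helper named `bold_person`. map accepts several iterables and passes one element from each to the function, so it can stand in for the zip-based comprehension:

```python
import re

def bold_person(text, person):
    # Hypothetical helper: escape the name so it is matched literally,
    # then wrap every case-insensitive match in <b> tags.
    pattern = '({})'.format(re.escape(person))
    return re.sub(pattern, r'<b>\1</b>', text, flags=re.I)

texts = ['this is about the abc', 'this is about tyu',
         'this is about abc', 'this is about WXY']
persons = ['the abc', 'xyz', 'wxy', 'wxy']

# map walks both lists in parallel, calling bold_person(text, person).
# In Python 3 map returns an iterator, hence list(); in Python 2.7 it
# already returns a list and the extra list() call is harmless.
bolded = list(map(bold_person, texts, persons))
```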
Answer 1 (score: 1)
You can use apply:
import re

# axis=1 passes each row to the lambda, so the pattern comes from the same row.
df['theText'] = df.apply(lambda x: re.sub(r'(' + x.thePerson + ')',
                                          r'<b>\1</b>',
                                          x.theText,
                                          flags=re.IGNORECASE), axis=1)
print (df)
thePerson theText
0 the abc this is about <b>the abc</b>
1 xyz this is about tyu
2 wxy this is about abc
3 wxy this is about <b>WXY</b>
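Both answers wrap the name in a capturing group and refer to it as \1 in the replacement. A small sketch of why that matters, using only the re module and values taken from the last row of the example: with a case-insensitive match, the matched text can differ in case from thePerson, and \1 re-inserts what was actually matched rather than the search string.

```python
import re

text = 'this is about WXY'
person = 'wxy'

# Backreference: \1 is the text that was matched, so its case is preserved.
with_group = re.sub('(' + person + ')', r'<b>\1</b>', text,
                    flags=re.IGNORECASE)

# Literal replacement: the match is overwritten by the lower-case search string.
literal = re.sub(person, '<b>' + person + '</b>', text,
                 flags=re.IGNORECASE)

print(with_group)  # this is about <b>WXY</b>
print(literal)     # this is about <b>wxy</b>
```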