Question

My df has two columns, for example:

thePerson  theText
"the abc" "this is about the abc"
"xyz" "this is about tyu"
"wxy" "this is about abc"
"wxy" "this is about WXY"

I want the resulting df to be:

thePerson  theText
"the abc" "this is about <b>the abc</b>"
"xyz" "this is about tyu"
"wxy" "this is about abc"
"wxy" "this is about <b>WXY</b>"

Note that if a row's theText contains that same row's thePerson, the matching part of the text is made bold.

One of the solutions I tried, which did not work:

df['theText']=df['theText'].replace(df.thePerson,'<b>'+df.thePerson+'</b>', regex=True)

I'm wondering whether I can do this with lapply or map.
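Python has no lapply, but the built-in `map` accepts several iterables and calls the function with one item from each. That gives exactly the row-by-row pairing of person and text needed here. A minimal sketch in plain Python, assuming the two columns are available as lists; the helper name `bold_person` is hypothetical:

```python
import re

def bold_person(person, text):
    """Wrap each case-insensitive occurrence of `person` in `text` in <b>...</b>."""
    # re.escape makes the name match literally, so '.', '+', '(' etc. are not regex syntax
    pattern = '(' + re.escape(person) + ')'
    # \1 re-inserts the matched text, keeping its original case (e.g. 'WXY')
    return re.sub(pattern, r'<b>\1</b>', text, flags=re.IGNORECASE)

persons = ['the abc', 'xyz', 'wxy', 'wxy']
texts = ['this is about the abc', 'this is about tyu',
         'this is about abc', 'this is about WXY']

# map calls bold_person(persons[i], texts[i]) for each position i
result = list(map(bold_person, persons, texts))
```

With a DataFrame the same idea would be `df['theText'] = list(map(bold_person, df['thePerson'], df['theText']))`, since iterating a column yields its values.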

My Python environment is version 2.7.

Answer 1

Use re.sub and zip:

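import re

tt = df.theText.values.tolist()
tp = df.thePerson.str.strip('"').values.tolist()
# pair each row's text with its own person; re.escape makes the name match literally.
# assign returns a new DataFrame, so use df = df.assign(...) to keep the result.
df.assign(
    theText=[re.sub(r'({})'.format(re.escape(p)), r'<b>\1</b>', t, flags=re.I)
             for t, p in zip(tt, tp)]
)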

  thePerson                       theText
0   the abc  this is about <b>the abc</b>
1       xyz             this is about tyu
2       wxy             this is about abc
3       wxy      this is about <b>WXY</b>
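Two details of the `re.sub` call carry the logic: the capture group plus the `\1` backreference puts back exactly the text that matched, so the original case survives, and `re.I` lets the lowercase name find `WXY`. A small plain-Python check of both, independent of pandas; the sample text comes from the question, while the name `'a+b'` is a made-up case showing why escaping the name helps:

```python
import re

text = 'this is about WXY'

# the pattern is lowercase, but re.I lets it match 'WXY';
# \1 inserts the matched text itself, so the output keeps 'WXY' rather than 'wxy'
kept_case = re.sub(r'(wxy)', r'<b>\1</b>', text, flags=re.I)

# without re.escape, '+' in the (hypothetical) name 'a+b' is a quantifier:
# the pattern matches 'aab' instead of the literal 'a+b'
unescaped = re.sub('(' + 'a+b' + ')', r'<b>\1</b>', 'aab a+b', flags=re.I)
escaped = re.sub('(' + re.escape('a+b') + ')', r'<b>\1</b>', 'aab a+b', flags=re.I)
```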

Copy/paste
You should be able to run this exact code and get the desired result.

import re
from io import StringIO
import pandas as pd

# u'' prefix: io.StringIO needs unicode text on Python 2.7
txt = u'''thePerson  theText
"the abc"  "this is about the abc"
"xyz"  "this is about tyu"
"wxy"  "this is about abc"
"wxy"  "this is about WXY"'''

# two or more spaces separate the columns; a regex separator needs engine='python'
df = pd.read_csv(StringIO(txt), sep=r'\s{2,}', engine='python')

tt = df.theText.values.tolist()
tp = df.thePerson.str.strip('"').values.tolist()
df = df.assign(
    theText=[re.sub(r'({})'.format(re.escape(p)), r'<b>\1</b>', t, flags=re.I)
             for t, p in zip(tt, tp)]
)
print(df)

You should see this:

   thePerson                         theText
0  "the abc"  "this is about <b>the abc</b>"
1      "xyz"             "this is about tyu"
2      "wxy"             "this is about abc"
3      "wxy"      "this is about <b>WXY</b>"

Answer 2

You can use apply:

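import re

# build a case-insensitive pattern from each row's own thePerson (escaped so it matches literally)
df['theText'] = df.apply(lambda x: re.sub(r'(' + re.escape(x.thePerson) + ')',
                                          r'<b>\1</b>',
                                          x.theText,
                                          flags=re.IGNORECASE), axis=1)
print(df)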
  thePerson                       theText
0   the abc  this is about <b>the abc</b>
1       xyz             this is about tyu
2       wxy             this is about abc
3       wxy      this is about <b>WXY</b>
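Both answers match the name anywhere in the text, including inside a longer word: a person named `abc` would also be bolded inside `abcd`. If that is not wanted, one option (an assumption about the desired behavior, not part of either answer) is to wrap the escaped name in `\b` word boundaries. This only behaves as expected when the name starts and ends with a word character; `bold_whole_word` is a hypothetical helper name:

```python
import re

def bold_whole_word(person, text):
    # \b requires a word/non-word transition on both sides of the name,
    # so 'abc' no longer matches inside 'abcd'
    pattern = r'\b(' + re.escape(person) + r')\b'
    return re.sub(pattern, r'<b>\1</b>', text, flags=re.IGNORECASE)

rows = [('abc', 'abcd and abc'), ('the abc', 'this is about the abc')]
result = [bold_whole_word(p, t) for p, t in rows]
```

It can be applied row-wise the same way as the lambda, e.g. `df.apply(lambda x: bold_whole_word(x.thePerson, x.theText), axis=1)`.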

How to iterate over a Pandas DataFrame and replace strings when they match an item from another column
