Consider the following HTML snippet:
html = '''
<p>
The chairman of European Union leaders, Donald Tusk, will meet May in London on Thursday, a day after the bloc’s Brexit negotiator weakened sterling by issuing another warning to Britain, which is due to leave the bloc in March 2019.
</p>
'''
Turn it into a BeautifulSoup object:
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
I want to transform the soup object so that its HTML output is:
'''
<p>
The chairman of European Union leaders, <span style="color : red"> Donald Tusk </span>, will meet May in London on Thursday, a day after the bloc’s Brexit negotiator weakened sterling by issuing another warning to Britain, which is due to leave the bloc in March 2019.
</p>
'''
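On the BeautifulSoup doc page I found several examples showing how to replace a string, create a new tag, or even insert a new tag at a specific position in the tree, but none of them covers my use case: adding a new tag in the middle of a string.
Any help is very welcome.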
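Answer 0 (score: 1)
First, thank you for posting this question; it is a very interesting coding problem.
I spent some time looking into it before deciding to post an answer.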
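I tried using insert_before() and insert_after() from BeautifulSoup to modify the <p> tag in the sample HTML. I also looked at extend() and append() from BeautifulSoup. After dozens of attempts, I simply could not get the result you asked for.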
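The following code seems to perform the requested HTML modification based on a keyword (for example, Donald Tusk). I used replace_with() from BeautifulSoup to replace the original tag in the HTML with a new one created by new_tag() from BeautifulSoup. Note that the <span> is stored as plain text inside the new <p> tag rather than as a real tag in the tree, which is why the output step needs formatter=None.
The code works, but I'm sure it can be improved.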
import re

from bs4 import BeautifulSoup

raw_html = """
<p> This is a test. </p>
<p>The chairman of European Union leaders, Donald Tusk, will meet May in London on Thursday, a day after the bloc’s Brexit negotiator weakened sterling by issuing another warning to Britain, which is due to leave the bloc in March 2019.</p>
<p> This is also a test. </p>
"""

soup = BeautifulSoup(raw_html, 'lxml')

# find the tag that contains the keyword Donald Tusk
# (string= was named text= before Beautiful Soup 4.4.0)
original_tag = soup.find('p', string=re.compile(r'Donald Tusk'))

if original_tag:
    # modify text in the tag that was found in the HTML
    tag_to_modify = str(original_tag.get_text()).replace('Donald Tusk,', '<span style="color:red">Donald Tusk</span>,')
    print(tag_to_modify)
    # outputs
    # The chairman of European Union leaders, <span style="color:red">Donald Tusk</span>, will meet May in London on Thursday, a day after the bloc’s Brexit negotiator weakened sterling by issuing another warning to Britain, which is due to leave the bloc in March 2019.

    # create a new <p> tag in the soup
    new_tag = soup.new_tag('p')

    # add the modified text to the new tag
    # setting a tag’s .string attribute replaces the contents with the new string
    new_tag.string = tag_to_modify

    # replace the original tag with the new tag
    old_tag = original_tag.replace_with(new_tag)

    # formatter=None, BeautifulSoup will not modify strings on output
    # without this the angle brackets will get turned into &lt; and &gt;
    print(soup.prettify(formatter=None))
    # outputs
    # <html>
    #  <body>
    #   <p>
    #    This is a test.
    #   </p>
    #   <p>
    #    The chairman of European Union leaders, <span style="color:red">Donald Tusk</span>, will meet May in London on Thursday, a day after the bloc’s Brexit negotiator weakened sterling by issuing another warning to Britain, which is due to leave the bloc in March 2019.
    #   </p>
    #   <p>
    #    This is also a test.
    #   </p>
    #  </body>
    # </html>
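Answer 1 (score: 0)
Try using a loop: go through each word in the string, find the string you are looking for (by whatever method works; a regular expression would be useful), and then use Tag.insert(position, "found_word").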
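A minimal sketch of that idea, under a few assumptions: wrap_keyword is a hypothetical helper name, the keyword is matched as a literal substring rather than word by word, and only the first occurrence in each text node is wrapped. It splits the text node around the keyword and uses Tag.insert() to put the pieces and a real <span> tag back at the same position in the parent.

```python
from bs4 import BeautifulSoup


def wrap_keyword(soup, keyword, style="color : red"):
    """Hypothetical helper: wrap the first occurrence of keyword in each text node in a <span>."""
    # Collect the matching text nodes first, because the tree is modified in the loop.
    matches = [s for s in soup.find_all(string=True) if keyword in s]
    for text_node in matches:
        before, _, after = str(text_node).partition(keyword)

        # Build a real <span> tag that holds the keyword.
        span = soup.new_tag("span", attrs={"style": style})
        span.string = keyword

        # Remember where the text node sits in its parent, then remove it.
        parent = text_node.parent
        position = parent.index(text_node)
        text_node.extract()

        # Re-insert the three pieces in order at the original position.
        parent.insert(position, before)
        parent.insert(position + 1, span)
        parent.insert(position + 2, after)


html = "<p>The chairman of European Union leaders, Donald Tusk, will meet May in London.</p>"
soup = BeautifulSoup(html, "html.parser")
wrap_keyword(soup, "Donald Tusk")
print(soup)
```

Because the <span> is a real tag in the tree here, the result can be printed without formatter=None.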
Answer 2 (score: 0)
You need to use a regular expression. I hope this code helps.
import re

def highlight_matches(query, text):
    def span_matches(match):
        html = '<span style="color : red">{0}</span>'
        return html.format(match.group(0))
    return re.sub(query, span_matches, text, flags=re.I)
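As a usage sketch (the sample sentence and the re.escape() call are my additions, not part of the answer), the helper can be called on a plain string like this. The function is repeated so the example runs on its own. Note that it returns a string of markup, so the result would still have to be parsed or written back into the soup.

```python
import re


def highlight_matches(query, text):
    # Same helper as in the answer, repeated so this sketch is self-contained.
    def span_matches(match):
        html = '<span style="color : red">{0}</span>'
        return html.format(match.group(0))
    return re.sub(query, span_matches, text, flags=re.I)


sentence = "The chairman of European Union leaders, Donald Tusk, will meet May in London."

# re.escape() makes the keyword match as literal text rather than as a pattern.
# flags=re.I means the lowercase query still matches, and the original casing is kept.
highlighted = highlight_matches(re.escape("donald tusk"), sentence)
print(highlighted)
```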