我正在尝试为网页中的每个单词添加href,然后使用添加的href再次保存。为此,我使用的是BeautifulSoup,并且此代码运行正常:
wordToSearch = "war"
for text in soup2.find_all(text=True):
if re.search(r'(\w*)%s\b' %wordToSearch, text):
text.replaceWith(BeautifulSoup(re.sub(r'(\w*)%s\b' % wordToSearch, r'<a href="http://example.com/%s">%s</a>' %(wordToSearch, wordToSearch), text, re.UNICODE), 'html.parser'))
然后我用以下代码编写新文件:
with open("output1.html", "w") as file:
file.write(str(soup))
仅当我需要在单个特定单词上添加href时,此方法才能正常工作,但是如果我想为单词列表添加href,我不知道该怎么做:
listOfWords = ["war", "love"]
for text in soup2.find_all(text=True):
for a in listOfWords:
if re.search(r'(\w*)%s\b' %a, text):
text.replaceWith(BeautifulSoup(re.sub(r'(\w*)%s\b' %a, r'<a href="https://it.wiktionary.org/wiki/%s">%s</a>' %(a, a), text, re.UNICODE), 'html.parser'))
这是我运行它时得到的:
Traceback (most recent call last):
File "./test.py", line 110, in <module>
text.replaceWith(BeautifulSoup(re.sub(r'(\w*)%s\b' % wordToSearch, r'<a href="http://example.com/%s">%s</a>' %(wordToSearch, wordToSearch), text, re.UNICODE), 'html.parser'))
File "/Library/Python/2.7/site-packages/bs4/element.py", line 235, in replace_with
"Cannot replace one element with another when the"
ValueError: Cannot replace one element with another when theelement to be replaced is not part of a tree
答案 0 :(得分:0)
最简单的方法是在每次通过后重建汤。
from bs4 import BeautifulSoup
import re
html = """
<html>
<p>love blah blah war blah blah love blah blah war</p>
<p>love blah blah blah blah love blah blah </p>
<p>blah blah love blah blah war blah blah love blah blah war blah blah</p>
</html>
"""
listOfWords = ["war", "love"]
for a in listOfWords:
soup = BeautifulSoup(html, 'html.parser')
for text in soup.find_all(text=True):
if re.search(r'(\w*)%s\b' %a, text):
text.replaceWith(BeautifulSoup(re.sub(r'(\w*)%s\b' %a, r'<a href="https://it.wiktionary.org/wiki/%s">%s</a>' %(a, a), text, re.UNICODE), 'html.parser'))
html = str(soup)
print (soup)
输出:
<html>
<p><a href="https://it.wiktionary.org/wiki/love">love</a> blah blah <a href="https://it.wiktionary.org/wiki/war">war</a> blah blah <a href="https://it.wiktionary.org/wiki/love">love</a> blah blah <a href="https://it.wiktionary.org/wiki/war">war</a></p>
<p><a href="https://it.wiktionary.org/wiki/love">love</a> blah blah blah blah <a href="https://it.wiktionary.org/wiki/love">love</a> blah blah </p>
<p>blah blah <a href="https://it.wiktionary.org/wiki/love">love</a> blah blah <a href="https://it.wiktionary.org/wiki/war">war</a> blah blah <a href="https://it.wiktionary.org/wiki/love">love</a> blah blah <a href="https://it.wiktionary.org/wiki/war">war</a> blah blah</p>
</html>