Question

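I am trying to add an href to every word in a web page and then save the page again with the added hrefs. I am using BeautifulSoup for this, and the following code works fine: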

wordToSearch = "war"
for text in soup2.find_all(text=True):
    if re.search(r'(\w*)%s\b' %wordToSearch, text):
        text.replaceWith(BeautifulSoup(re.sub(r'(\w*)%s\b' % wordToSearch, r'<a href="http://example.com/%s">%s</a>' %(wordToSearch, wordToSearch), text, re.UNICODE), 'html.parser'))

Then I write the new file with the following code:

with open("output1.html", "w") as file:
    file.write(str(soup))

This works fine only when I need to add an href to a single specific word. If I want to add hrefs for a list of words, I don't know how to do it:

listOfWords = ["war", "love"]

for text in soup2.find_all(text=True):
    for a in listOfWords:
        if re.search(r'(\w*)%s\b' %a, text):
            text.replaceWith(BeautifulSoup(re.sub(r'(\w*)%s\b' %a, r'<a href="https://it.wiktionary.org/wiki/%s">%s</a>' %(a, a), text, re.UNICODE), 'html.parser'))

This is what I get when I run it:

Traceback (most recent call last):
  File "./test.py", line 110, in <module>
    text.replaceWith(BeautifulSoup(re.sub(r'(\w*)%s\b' % wordToSearch, r'<a href="http://example.com/%s">%s</a>' %(wordToSearch, wordToSearch), text, re.UNICODE), 'html.parser'))
  File "/Library/Python/2.7/site-packages/bs4/element.py", line 235, in replace_with
    "Cannot replace one element with another when the"
ValueError: Cannot replace one element with another when theelement to be replaced is not part of a tree

Answer 1

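The simplest approach is to rebuild the soup after each pass. Once replaceWith has run for the first word, the original text node is no longer part of the tree, so a second replacement on that same node raises the ValueError.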

from bs4 import BeautifulSoup
import re

html = """
<html>
<p>love blah blah war blah blah  love blah blah war</p>
<p>love blah blah  blah blah  love blah blah </p>
<p>blah blah love blah blah war blah blah  love blah blah war blah blah</p>
</html>

"""
listOfWords = ["war", "love"]
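for a in listOfWords:
    # Re-parse on every pass so each text node belongs to the current tree
    soup = BeautifulSoup(html, 'html.parser')
    for text in soup.find_all(text=True):
        if re.search(r'(\w*)%s\b' % a, text):
            # flags must be passed by keyword: the fourth positional argument of re.sub is count
            text.replaceWith(BeautifulSoup(re.sub(r'(\w*)%s\b' % a, r'<a href="https://it.wiktionary.org/wiki/%s">%s</a>' % (a, a), text, flags=re.UNICODE), 'html.parser'))
    # Serialize so the next pass starts from HTML that already contains this word's links
    html = str(soup)
print(soup)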

Output:

<html>
<p><a href="https://it.wiktionary.org/wiki/love">love</a> blah blah <a href="https://it.wiktionary.org/wiki/war">war</a> blah blah  <a href="https://it.wiktionary.org/wiki/love">love</a> blah blah <a href="https://it.wiktionary.org/wiki/war">war</a></p>
<p><a href="https://it.wiktionary.org/wiki/love">love</a> blah blah  blah blah  <a href="https://it.wiktionary.org/wiki/love">love</a> blah blah </p>
<p>blah blah <a href="https://it.wiktionary.org/wiki/love">love</a> blah blah <a href="https://it.wiktionary.org/wiki/war">war</a> blah blah  <a href="https://it.wiktionary.org/wiki/love">love</a> blah blah <a href="https://it.wiktionary.org/wiki/war">war</a> blah blah</p>
</html>
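
A possible alternative to re-parsing once per word is to combine the whole list into a single alternation pattern, so that each text node needs only one substitution. The sketch below shows just that substitution step using the standard library. The helper name `link_words` and its `base_url` parameter are hypothetical and not part of the answer above. Note also that it matches whole words with `\b...\b`, which differs from the original `(\w*)%s\b` pattern (that one also swallows any prefix, so "glove" would be linked as "love").

```python
import re


def link_words(text, words, base_url="https://it.wiktionary.org/wiki/"):
    """Wrap each whole-word occurrence of any entry in `words` in a link.

    Hypothetical helper for illustration; base_url is an assumed default
    taken from the URL used in the question.
    """
    # One alternation pattern covers the whole list; re.escape guards
    # against regex metacharacters inside the words themselves.
    pattern = re.compile(
        r"\b(%s)\b" % "|".join(re.escape(w) for w in words),
        flags=re.UNICODE,
    )
    # A function replacement lets every match build its own URL.
    return pattern.sub(
        lambda m: '<a href="%s%s">%s</a>' % (base_url, m.group(1), m.group(1)),
        text,
    )


print(link_words("love blah blah war", ["war", "love"]))
```

The returned string could then be parsed with BeautifulSoup(..., 'html.parser') and passed to replaceWith in the same way as in the loop above. Since each text node would be replaced only once, the outer loop over the words and the repeated re-parsing should no longer be necessary.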
