Question

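I'm trying to extract the category names and question/answer text from pages on this site and insert them into my own HTML document using Python. I've been able to extract the clue text with soup.find_all("td", class_="clue_text"), and in theory I know how to extract the other data, but I don't know how to insert that data into my own HTML document, especially since BeautifulSoup returns a list and my markup is formatted differently from the source. For example, I want the clue text to replace "Category 2 Question 5" in the following HTML: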

<table id="4_1" cellpadding="0" cellspacing="0" width="100%" 
class="hiddenDiv" onclick="hidequestion(this.id);" border="0"><tr><td 
valign="middle" align="center">
Category 2 Question 5
</td></tr></table>

How can I use BeautifulSoup to write output into my document? Is there a better way to do this?

Answer 1

You can change the text of any tag by assigning to its .string attribute.

>>> from bs4 import BeautifulSoup
>>> html = '''<table id="4_1" cellpadding="0" cellspacing="0" width="100%"
... class="hiddenDiv" onclick="hidequestion(this.id);" border="0"><tr><td
... valign="middle" align="center">
... Category 2 Question 5
... </td></tr></table>'''
>>> soup = BeautifulSoup(html, 'lxml')
>>> clue = 'this is my clue text'
>>> first_rowcol = soup.find('table').find('td')
>>> first_rowcol
<td align="center" valign="middle">
Category 2 Question 5
</td>
>>> first_rowcol.string = clue
>>> first_rowcol
<td align="center" valign="middle">this is my clue text</td>
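Since find_all() returns a list, the same .string assignment can be applied in a loop. The following is a minimal sketch, not a definitive implementation: the two HTML snippets, the ids 4_1 and 4_2, and the assumption that clues and target cells appear in the same order are all hypothetical stand-ins for the real source page and your own document.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the scraped source page.
source_html = """
<table>
  <tr><td class="clue_text">First clue</td></tr>
  <tr><td class="clue_text">Second clue</td></tr>
</table>
"""

# Hypothetical stand-in for your own document; the ids are assumptions.
target_html = """
<table id="4_1" class="hiddenDiv"><tr><td>Category 2 Question 5</td></tr></table>
<table id="4_2" class="hiddenDiv"><tr><td>Category 2 Question 6</td></tr></table>
"""

source = BeautifulSoup(source_html, "html.parser")
target = BeautifulSoup(target_html, "html.parser")

# find_all() returns a list of tags; reduce each one to its plain text.
clues = [td.get_text(strip=True)
         for td in source.find_all("td", class_="clue_text")]

# Collect the first <td> of every placeholder table in the target document.
cells = [table.find("td")
         for table in target.find_all("table", class_="hiddenDiv")]

# Pair cells with clues in document order and overwrite the placeholder text.
for cell, clue in zip(cells, clues):
    cell.string = clue

# Serialize the modified document back to an HTML string.
result = str(target)
```

The string in result can then be written to a file with an ordinary open(..., "w") call. If the order of the clues does not match the order of your cells, look up each cell explicitly instead, for example with target.find("table", id="4_1").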

Alternatively, if you want to replace the whole td tag with a td tag you found using BeautifulSoup, you can use the replace_with() method.

>>> first_row = soup.find('table').tr
>>> first_row
<tr><td align="center" valign="middle">
Category 2 Question 5
</td></tr>
>>> clue_tag = BeautifulSoup('<td>this is my clue tag</td>', 'html.parser')
>>> first_row.td.replace_with(clue_tag)
>>> first_row
<tr><td>this is my clue tag</td></tr>
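When the replacement tag comes from another parsed document, replace_with() can take that tag directly, which keeps any inner markup such as italics. The sketch below assumes two small hypothetical documents. It copies the found tag with copy.copy() first, because a tag can only belong to one tree at a time and inserting the original would remove it from the source.

```python
import copy

from bs4 import BeautifulSoup

# Hypothetical source page containing a clue with inline markup.
source = BeautifulSoup(
    '<table><tr><td class="clue_text">A clue with <i>markup</i></td></tr></table>',
    "html.parser",
)

# Hypothetical target document with a placeholder cell.
target = BeautifulSoup(
    '<table id="4_1"><tr><td valign="middle" align="center">'
    "Category 2 Question 5</td></tr></table>",
    "html.parser",
)

found = source.find("td", class_="clue_text")

# Copy the tag (children included) so the source tree is left untouched.
new_cell = copy.copy(found)

# Swap the placeholder cell for the copied cell.
old_cell = target.find("table", id="4_1").find("td")
old_cell.replace_with(new_cell)
```

Note that the inserted cell keeps the attributes it had in the source (here class="clue_text") and not those of the placeholder. If your page depends on attributes such as align, copy them over from old_cell before the swap.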

Using BeautifulSoup to transfer text from one HTML document to another
