我正在使用BeautifulSoup解析HTML文件并且遇到了< BR>标签。 我想追加< BR>插入列表元素后标记,但它没有工作。 最简单的方法是什么?
soup = BeautifulSoup(open("test.html"))
mylist = [Item_1,Item_2]
for i in range(len(mylist)):
#insert Items to the 4. column
这是默认的HTML:
<html>
<body>
<table>
<tr>
<th>
1. Column
</th>
<th>
2. Column
</th>
<th>
3. Column
</th>
<th>
4. Column
</th>
<th>
5. Column
</th>
<th>
6. Column
</th>
<th>
7. Column
</th>
<th>
8. Column
</th>
</tr>
<tr class="a">
<td class="h">
Text in first column
</td>
<td>
<br/>
</td>
<td>
<br/>
</td>
<td>
<!--I want to insert items here-->
</td>
<td>
1
</td>
<td>
37
</td>
<td>
38
</td>
<td>
38
</td>
</tr>
</table>
</body>
</html>
这是我要制作的HTML
<html>
<body>
<table>
<tr>
<th>
1. Column
</th>
<th>
2. Column
</th>
<th>
3. Column
</th>
<th>
4. Column
</th>
<th>
5. Column
</th>
<th>
6. Column
</th>
<th>
7. Column
</th>
<th>
8. Column
</th>
</tr>
<tr class="a">
<td class="h">
Text in first column
</td>
<td>
<br/>
</td>
<td>
<br/>
</td>
<td>
Item_1 <br>
Item_2
</td>
<td>
1
</td>
<td>
37
</td>
<td>
38
</td>
<td>
38
</td>
</tr>
</table>
</body>
</html>
答案 0 :(得分:3)
要附加标记,首先使用new_tag()
工厂函数创建它,如下所示:
soup.td.append(soup.new_tag('br'))
考虑以下计划。对于html中的每个表格单元格(即每td
个),它会向该单元格附加<br/>
标记和一些文本。
from bs4 import BeautifulSoup
html_doc = '''
<html>
<body>
<table>
<tr>
<td>
data1
</td>
<td>
data2
</td>
</tr>
</table>
</body>
</html>
'''
soup = BeautifulSoup(html_doc)
mylist = ['addendum 1', 'addendum 2']
for td,item in zip(soup.find_all('td'), mylist):
td.append(soup.new_tag('br'))
td.append(item)
print soup.prettify()
结果:
<html>
<body>
<table>
<tr>
<td>
data1
<br/>
addendum 1
</td>
<td>
data2
<br/>
addendum 2
</td>
</tr>
</table>
</body>
</html>