I have an HTML file containing some tags, and I need to add an ID to each tag in the format id="rule_1", id="rule_1.1", id="rule_1.2", id="rule_1.2.1", and so on. For example, the current HTML is:
<div style="styles">
    <p class="classname">TEXT</p>
    <p class="classname">TEXT</p>
    <ul style="styles">
        <li>
            <p class="classname">TEXT</p>
        </li>
        <li>
            <p class="classname">TEXT</p>
        </li>
    </ul>
</div>
I need the HTML to look like this:
<div style="styles" id="rule_1">
    <p class="classname" id="rule_1.1">TEXT</p>
    <p class="classname" id="rule_1.2">TEXT</p>
    <ul style="styles" id="rule_1.3">
        <li id="rule_1.3.1">
            <p class="classname" id="rule_1.3.1.1">TEXT</p>
        </li>
        <li id="rule_1.3.2">
            <p class="classname" id="rule_1.3.2.1">TEXT</p>
        </li>
    </ul>
</div>
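I could write this by hand, but I would rather use an existing HTML parser library. Can this be done with BeautifulSoup or another module?
I tried something like this: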
from bs4 import BeautifulSoup as html_parser

# `deal` is not defined in this snippet; it is assumed to be set earlier in the script.
with open('outputs/HTML/{}.html'.format(deal), 'r') as read_file:
    html_source = read_file.read()

soup = html_parser(html_source, 'html.parser')
html_tags = soup.find_all(['div', 'p', 'span', 'ul', 'li'])

# Assigns each tag its position in the flat result list, so the IDs carry no nesting.
for each_tag in html_tags:
    each_tag.attrs['id'] = html_tags.index(each_tag)

with open('outputs/HTML/{}-id.html'.format(deal), 'w') as save_file:
    save_file.write(str(soup))
But this only adds id="1", id="2", and so on. How can I make the IDs hierarchical, like 1, 1.1, 1.1.1, etc.?
Answer 0: (score: 0)
Never mind, I figured it out:
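# Number of children already numbered under each parent ID.
curr_tags = {}
for position, each_tag in enumerate(html_tags):
    if position == 0:
        # The first matched tag is the root of the numbering.
        each_tag.attrs['id'] = 'rule_1'
    else:
        # find_all returns tags in document order, so a matched parent already has
        # its ID. This assumes every later tag's parent is itself a matched tag;
        # otherwise the lookup raises KeyError.
        parent_id = each_tag.parent.attrs['id']
        curr_tags[parent_id] = curr_tags.get(parent_id, 0) + 1
        each_tag.attrs['id'] = '{0}.{1}'.format(parent_id, curr_tags[parent_id])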
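For anyone who wants the same numbering without relying on the flat find_all list, here is a minimal recursive sketch. The function name add_rule_ids, the TAG_NAMES constant, and the inline html_source string are my own placeholders, not from the original post. It assumes that unlisted tags, along with everything inside them, should simply be left unnumbered, and that several top-level tags should become rule_1, rule_2, and so on.

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical constant: the same tag names the question passes to find_all.
TAG_NAMES = {'div', 'p', 'span', 'ul', 'li'}


def add_rule_ids(node, prefix, tag_names):
    """Give each matching child of `node` the ID prefix + its 1-based position, recursively."""
    counter = 0
    for child in node.children:
        # Skip text nodes and comments, plus tags that are not in the list.
        # Assumption: the subtree of an unlisted tag is left unnumbered.
        if not isinstance(child, Tag) or child.name not in tag_names:
            continue
        counter += 1
        child_id = '{}{}'.format(prefix, counter)
        child['id'] = child_id
        # Children of this tag are numbered under "<child_id>."
        add_rule_ids(child, child_id + '.', tag_names)


# The sample markup from the question, inlined so the sketch runs on its own.
html_source = '''<div style="styles">
<p class="classname">TEXT</p>
<p class="classname">TEXT</p>
<ul style="styles">
<li>
<p class="classname">TEXT</p>
</li>
<li>
<p class="classname">TEXT</p>
</li>
</ul>
</div>'''

soup = BeautifulSoup(html_source, 'html.parser')
add_rule_ids(soup, 'rule_', TAG_NAMES)
print(soup)
```

On the sample above this should produce the IDs shown in the desired output in the question. Because the counter is local to each call, sibling numbering restarts under every parent without needing a shared dictionary.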