Here is my code:
from lxml import etree
from io import StringIO
html = """"Hello, world!"<span class="black">
<div class="c1">division
<p>"Hello - this is me.
(c) passage in division"
<b>"bold in passage "</b>
</p>
My phone:
(+7) 999-999-99-99
</div>
<!-- Comment -->
<pre>It's a pre.</pre>
"""
def parse_HTML(html):
    parser = etree.HTMLParser()
    root = etree.parse(StringIO(html), parser)
    for elem in root.getiterator():
        # skip comments, their type == class 'cython_function_or_method'
        if type(elem.tag) is not str:
            continue
        if elem.text is None:
            text = ''
        else:
            text = elem.text
        print(str(elem.tag) + " => " + text)

if __name__ == "__main__":
    parse_HTML(html)
Output:
html =>
body =>
p => "Hello, world!"
span =>
div => division
p => "Hello - this is me.
(c) passage in division"
b => "bold in passage "
<class 'cython_function_or_method'>
pre => It's a pre.
Question: Why is "My phone: (+7) 999-999-99-99" missing from the output?
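A likely explanation, offered as a sketch rather than a verified diagnosis of the exact document above: in lxml, `elem.text` only holds the text between an element's opening tag and its first child. Text that follows a child's closing tag is stored on that child as `elem.tail`. The phone number comes after `</p>` inside the `<div>`, so it would be the tail of the `<p>` element, and the loop above never prints `tail`. The snippet below uses a shortened, hypothetical sample (`SAMPLE`) and a hypothetical helper (`collect_text_and_tail`) to show both attributes side by side.

```python
from io import StringIO

from lxml import etree

# Hypothetical, shortened sample; not the full document from the question.
SAMPLE = '<div class="c1">division<p>passage</p>My phone:\n(+7) 999-999-99-99</div>'


def collect_text_and_tail(html):
    """Return (tag, text, tail) for every real element, skipping comments."""
    parser = etree.HTMLParser()
    tree = etree.parse(StringIO(html), parser)
    rows = []
    for elem in tree.iter():
        # Comments and processing instructions have a non-string tag.
        if not isinstance(elem.tag, str):
            continue
        # text: content before the first child; tail: content after the closing tag.
        rows.append((elem.tag, elem.text or "", elem.tail or ""))
    return rows


if __name__ == "__main__":
    for tag, text, tail in collect_text_and_tail(SAMPLE):
        print(f"{tag} => text={text!r} tail={tail!r}")
```

If only the concatenated text is needed, `elem.itertext()` walks both text and tail content in document order, which may be simpler than handling `tail` by hand.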