I have some HTML code:
<li><h3>Number Theory - Even Factors</h3>
<p lang="title">Number N = 2<sup>6</sup> * 5<sup>5</sup> * 7<sup>6</sup> * 10<sup>7</sup>; how many factors of N are even numbers?</p>
<ol class="xyz">
<li>1183</li>
<li>1200</li>
<li>1050</li>
<li>840</li>
</ol>
<ul class="exp">
<li class="grey fleft">
<span class="qlabs_tooltip_bottom qlabs_tooltip_style_33" style="cursor:pointer;">
<span>
<strong>Correct Answer</strong>
Choice (A).</br>1183
</span>
Correct answer
</span>
</li>
<li class="primary fleft">
<a href="factors_6.shtml">Explanatory Answer</a>
</li>
<li class="grey1 fleft">Factors - Even numbers</li>
<li class="orange flrt">Medium</li>
</ul>
</li>
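In the HTML snippet above, I am trying to extract the content of <p lang="title">. Notice that it has <sup></sup> and <sub></sub> tags inside it.
My XPath expression .//p[@lang="title"]/text() does not retrieve the sub and sup content. How can I get the desired output below?
Desired output: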
Number N = 2<sup>6</sup>*5<sup>5</sup> * 7<sup>6</sup> * 10<sup>7</sup>; how many factors of N are even numbers?
Answer 0 (score: 0)
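XPath
You can get the innerHTML with node(), as shown below:
//p[@lang="title"]/node()
Note that it returns an array of nodes (the text nodes and the <sup> elements), not a single string, so the nodes still have to be serialized and joined.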
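As a rough illustration of joining those nodes back into one string, here is a minimal sketch using only the standard library's xml.etree.ElementTree. It assumes the fragment is well-formed XML, which real-world HTML often is not; the helper name inner_markup and the shortened fragment are made up for this example and are not part of the original answer.

```python
import xml.etree.ElementTree as ET


def inner_markup(element):
    """Return the element's text and child markup, without its own tags."""
    parts = [element.text or ""]
    for child in element:
        # tostring() serializes the child tag plus its trailing text (tail).
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


# Hypothetical, well-formed fragment modelled on the question's markup.
fragment = (
    '<li><p lang="title">Number N = 2<sup>6</sup> * 5<sup>5</sup>; '
    'how many factors of N are even numbers?</p></li>'
)
root = ET.fromstring(fragment)
paragraph = root.find(".//p[@lang='title']")
result = inner_markup(paragraph)
print(result)
```

ElementTree only supports a subset of XPath (there is no node() or text() step), so the helper walks the children directly instead.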
Python
You can use the following innerHTML function:
from bs4 import BeautifulSoup  # BeautifulSoup 4 (the bs4 package)

def innerHTML(element):
    """Receive an element and return its innerHTML as a string."""
    return element.decode_contents(formatter="html")

html = """<html>
<head>...
<body>...
Your HTML source code
..."""
# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(html, "html.parser")
paragraph = soup.find('p', {"lang": "title"})
print(innerHTML(paragraph))
Output:
Number N = 2<sup>6</sup> * 5<sup>5</sup> * 7<sup>6</sup> * 10<sup>7</sup>; how many factors of N are even numbers?
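For context on why the question's text() expression loses the exponents: text() selects only the text nodes that are direct children of <p>, while the digits sit inside the <sup> children. The sketch below imitates that with the standard library's xml.etree.ElementTree; the shortened fragment is hypothetical, and the list built from .text and .tail is only an approximation of what text() selects.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened fragment in the same shape as the question's <p>.
fragment = '<p lang="title">N = 2<sup>6</sup> * 5<sup>5</sup>; even factors?</p>'
paragraph = ET.fromstring(fragment)

# Direct text nodes only: roughly what text() selects. The exponents are missing.
direct_text = [paragraph.text] + [child.tail for child in paragraph]

# All descendant text: the exponents are back, but the <sup> tags are gone.
all_text = "".join(paragraph.itertext())

print(direct_text)
print(all_text)
```

Neither form keeps the <sup> markup, which is why the answer serializes the child nodes instead of reading text alone.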