I am trying to extract the question from the HTML shown below:
<li >
<h3> Number Theory - Factors </h3>
<p lang="title"> How many factors of 2 <sup> 5 </sup> * 3 <sup> 6 </sup> * 5 <sup> 2 </sup> are perfect squares?</p>
<ol class="xyz">
<li> 18 </li>
<li> 24 </li>
<li> 36 </li>
<li> 8 </li>
</ol>
<ul class="exp">
<li class="grey fleft">
<span class="qlabs_tooltip_bottom qlabs_tooltip_style_33" style="cursor:pointer;">
<span>
<strong>Correct Answer</strong>Choice (B).</br>24
</span> Correct answer
</span>
</li>
<li class="primary fleft">
<a href="factors_3.shtml">Explanatory Answer</a>
</li>
<li class="grey1 fleft">Factors - Perfect Squares</li>
<li class="orange flrt">Medium</li>
</ul>
</li>
I can extract my question using the XPath expression normalize-space(//p[@class="soln"])
The XPath expression extracts the text and gives me this: How many factors of 24 * 53 * 74 are odd numbers?
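How can I get the question with the sub and sup content intact? Possibility 1: I get the question as "How many factors of 2<sup>4</sup> * 5<sup>3</sup> * 7<sup>4</sup> are odd numbers?", without losing the sub or sup tags.
Possibility 2: I get the question as "How many factors of 2 ^ 4 * 5 ^ 3 * 7 ^ 4 are odd numbers?" Basically, I do not want to change the meaning of the question.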
Answer 0 (score: 1)
It's not pretty, but we can replace <sup> with ^ beforehand and strip the leftover </sup>:
In [1]: response = response.replace(body=response.body.replace(b"<sup>", b"^").replace(b"</sup>", b""))
In [2]: response.xpath('normalize-space(//p[@lang="title"])').extract_first()
Out[2]: u'How many factors of 2 ^ 5 * 3 ^ 6 * 5 ^ 2 are perfect squares?'
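The same idea can be extended to cover <sub> as well and to tolerate tags written with stray whitespace or attributes. Below is a minimal sketch using only the standard library; the function name flatten_scripts is hypothetical, and rendering subscripts with _ is an assumed convention, not something the question specifies. It works on the raw HTML string of the element rather than on the Scrapy response.

```python
import re
from html import unescape


def flatten_scripts(fragment):
    """Rewrite <sup>x</sup> as ^x and <sub>x</sub> as _x, then strip other tags.

    Hypothetical helper: the "_" marker for subscripts is an assumption.
    """
    # Opening tags become a marker; allow attributes and stray whitespace.
    text = re.sub(r"<\s*sup\b[^>]*>", "^", fragment, flags=re.IGNORECASE)
    text = re.sub(r"<\s*sub\b[^>]*>", "_", text, flags=re.IGNORECASE)
    # Closing </sup> and </sub> are dropped without leaving a gap.
    text = re.sub(r"<\s*/\s*su[pb]\s*>", "", text, flags=re.IGNORECASE)
    # Any other tag is replaced by a space so neighbouring words stay apart.
    text = re.sub(r"<[^>]+>", " ", text)
    # Decode entities only after the tags are gone.
    text = unescape(text)
    # Tighten "2 ^ 5" to "2^5", then collapse whitespace like normalize-space().
    text = re.sub(r"\s*([\^_])\s*", r"\1", text)
    return " ".join(text.split())


html = ('<p lang="title">How many factors of 2<sup>5</sup> * 3<sup>6</sup>'
        ' * 5<sup>2</sup> are perfect squares?</p>')
print(flatten_scripts(html))
```

In a Scrapy callback, the input string could come from response.xpath('//p[@lang="title"]').extract_first(), which returns the element's outer HTML. Note that the tightening step also removes spaces around any literal ^ or _ in the text, so this is only suitable when those characters are not otherwise meaningful in the question.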
Answer 1 (score: 0)
I'm not familiar with Scrapy, but I can add a code sample written in Java that may help you:
// get inner html of your question with `sup` or `sub` tags
String question = driver.findElement(By.xpath("//p[@lang = 'title'] ")).getAttribute("innerHTML");
// Replace the tags with symbols
String newQuestion = question.replace("<sup>", "^").replace("</sup>", "");
System.out.println(newQuestion);