I am using Scrapy selectors and trying to extract the text "1" from the following HTML snippet:
<li aria-label="Pagina" class="page active"><a href="#">1</a></li>
The full HTML source contains two identical copies of this pagination block:
<div class="row paging-bar">
<ul class="sync-pagination pagination pull-right">
<li aria-label="Pagina" class="prev"><a href="#"><</a></li>
<li aria-label="Pagina" class="page active"><a href="#">1</a></li>
<li aria-label="Pagina" class="page"><a href="#">2</a></li>
<li aria-label="Pagina" class="page"><a href="#">3</a></li>
<li aria-label="Pagina" class="page"><a href="#">4</a></li>
<li aria-label="Pagina" class="page"><a href="#">5</a></li>
<li aria-label="Pagina" class="page"><a href="#">6</a></li>
<li><span>...</span></li>
<li aria-label="Pagina" class="page"><a href="#">1405</a></li>
<li aria-label="Pagina" class="next"><a href="#">></a></li>
</ul>
</div>
<div class="row paging-bar">
<ul class="sync-pagination pagination pull-right">
<li aria-label="Pagina" class="prev"><a href="#"><</a></li>
<li aria-label="Pagina" class="page active"><a href="#">1</a></li>
<li aria-label="Pagina" class="page"><a href="#">2</a></li>
<li aria-label="Pagina" class="page"><a href="#">3</a></li>
<li aria-label="Pagina" class="page"><a href="#">4</a></li>
<li aria-label="Pagina" class="page"><a href="#">5</a></li>
<li aria-label="Pagina" class="page"><a href="#">6</a></li>
<li><span>...</span></li>
<li aria-label="Pagina" class="page"><a href="#">1405</a></li>
<li aria-label="Pagina" class="next"><a href="#">></a></li>
</ul>
</div></div>
I tried the following command:
response.xpath("normalize-space(//li[@class='page active']/a[@href]/text())").extract_first()
But it returns an empty string.
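Answer 0 (score: 0)
The expression works. Both parsel sessions below return '1', with and without the stray closing </div>: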
>>> html = """
... <div class="row paging-bar">
... <ul class="sync-pagination pagination pull-right">
... <li aria-label="Pagina" class="prev"><a href="#"><</a></li>
... <li aria-label="Pagina" class="page active"><a href="#">1</a></li>
... <li aria-label="Pagina" class="page"><a href="#">2</a></li>
... <li aria-label="Pagina" class="page"><a href="#">3</a></li>
... <li aria-label="Pagina" class="page"><a href="#">4</a></li>
... <li aria-label="Pagina" class="page"><a href="#">5</a></li>
... <li aria-label="Pagina" class="page"><a href="#">6</a></li>
... <li><span>...</span></li>
... <li aria-label="Pagina" class="page"><a href="#">1405</a></li>
... <li aria-label="Pagina" class="next"><a href="#">></a></li>
... </ul>
... </div>
... """
>>> from parsel import Selector
>>> selector = Selector(text=html)
>>> selector.xpath("normalize-space(//li[@class='page active']/a[@href]/text())").extract_first()
'1'
>>> html = """
... <div class="row paging-bar">
... <ul class="sync-pagination pagination pull-right">
... <li aria-label="Pagina" class="prev"><a href="#"><</a></li>
... <li aria-label="Pagina" class="page active"><a href="#">1</a></li>
... <li aria-label="Pagina" class="page"><a href="#">2</a></li>
... <li aria-label="Pagina" class="page"><a href="#">3</a></li>
... <li aria-label="Pagina" class="page"><a href="#">4</a></li>
... <li aria-label="Pagina" class="page"><a href="#">5</a></li>
... <li aria-label="Pagina" class="page"><a href="#">6</a></li>
... <li><span>...</span></li>
... <li aria-label="Pagina" class="page"><a href="#">1405</a></li>
... <li aria-label="Pagina" class="next"><a href="#">></a></li>
... </ul>
... </div></div>
... """
>>> selector = Selector(text=html)
>>> selector.xpath("normalize-space(//li[@class='page active']/a[@href]/text())").extract_first()
'1'