Question

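I am trying to select elements from HTML with CSS selectors in Scrapy. However, I am stuck on one field that I want to extract with the last-child selector.

Here is the HTML: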

<td class="Table-Standard-AwardName Table-Scholarship-AwardName">

<a id="ctl00_ContentPlaceHolder1_ScholarshipDataControl_grvScholarshipSearch_ctl02_hylScholarshipName" class="bold" href="/Scholarships/14123/Family-Bursary,-The">Family Bursary, The</a>   

<br>

<span>Field of Study:</span> 

EcologyEnvironmental Science

</td>

The text "EcologyEnvironmental Science" is what I need to match.

When I use the last-child selector, the output is 'Field of Study:':

In [3]: response.css('td.Table-Standard-AwardName.Table-Scholarship-AwardName > *:last-child::text').extract_first()
Out[3]: 'Field of Study:'

I have looked at other questions and tried several approaches, such as nth-last-child() and sibling combinators, but to no avail. Help!

Answer 1

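"EcologyEnvironmental Science" is not an element (like a span, a div, or anything else); it is just a text node inside the td. So it does not match ... > * ..., which means "any direct child element of the td with that class". The last child element of that td is the span, which is why you get 'Field of Study:'.

You would have to wrap it in a span to be able to select only that part of the content with CSS alone, for example: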

...
  <span>Field of Study:</span> 
  <span>EcologyEnvironmental Science</span>
</td>
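
To see the difference between child elements and bare text nodes, here is a small sketch. It deliberately uses Python's standard-library html.parser instead of Scrapy so it runs anywhere; the class name TdChildren and the VOID_TAGS set are made up for illustration, and the anchor's long id attribute is left out for brevity.

```python
from html.parser import HTMLParser

# The <td> from the question (the anchor's id attribute is omitted here).
HTML = """<td class="Table-Standard-AwardName Table-Scholarship-AwardName">
<a class="bold" href="/Scholarships/14123/Family-Bursary,-The">Family Bursary, The</a>
<br>
<span>Field of Study:</span>
EcologyEnvironmental Science
</td>"""

# Tags that never get a closing tag, so they must not change the depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class TdChildren(HTMLParser):
    """Record the child elements and the direct text nodes of the <td>."""

    def __init__(self):
        super().__init__()
        self.depth = 0            # 0 = outside the <td>, 1 = directly inside it
        self.child_elements = []  # tag names of direct child elements
        self.direct_text = []     # non-blank text nodes that sit directly in the <td>

    def handle_starttag(self, tag, attrs):
        if self.depth == 1:
            self.child_elements.append(tag)
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 1 and data.strip():
            self.direct_text.append(data.strip())


parser = TdChildren()
parser.feed(HTML)
parser.close()

print(parser.child_elements)  # the elements that "> *" can match
print(parser.direct_text)     # text that belongs to the <td> itself
```

The child elements are a, br and span, so *:last-child can only ever land on the span. The wanted string shows up only among the td's own text nodes.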

Answer 2

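As already explained, the EcologyEnvironmental Science text is part of the td element itself, so you only need to extract the td's own text nodes. Try something like this: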

from operator import methodcaller

values = response.css('.Table-Standard-AwardName.Table-Scholarship-AwardName::text').extract()
# strip every text node and take the first one that is not empty
out = next(filter(None, map(methodcaller('strip'), values)))
# you can assign 'EcologyEnvironmental Science' to your item
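
One caveat with the snippet above: next(...) without a default raises StopIteration when every text node in the cell is blank. A slightly more defensive variant could look like the sketch below. The helper name first_nonblank is hypothetical, and the values list is only an assumed shape of what .extract() might return for this td (whitespace-only nodes around the child elements, then the wanted text), not captured output.

```python
def first_nonblank(texts):
    """Return the first string that is non-empty after stripping, or None."""
    return next((t.strip() for t in texts if t.strip()), None)


# Assumed shape of the extracted text nodes for the <td> in the question.
values = ["\n", "   \n", " \n", " \nEcologyEnvironmental Science\n"]

field_of_study = first_nonblank(values)
print(field_of_study)
```

Returning None for an empty cell lets the spider decide whether to skip the field or log a warning, instead of failing in the middle of a parse callback.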

Scrapy CSS last-child selector cannot select text
