I have a website, and I want to save the values of two span elements.
This is the relevant part of my HTML code:
<div class="box-search-product-filter-row">
    <span class="result-numbers" sth-bind="model.navigationSettings.showFilter">
        <span class="number" sth-bind="span1"></span>
        <span class="result" sth-bind="span2"></span>
    </span>
</div>
I created a spider:
from scrapy.spiders import Spider
from scrapy.selector import Selector


class MySpdier(Spider):
    name = "list"
    allowed_domains = ["example.com"]
    start_urls = [
        "https://www.example.com"]

    def parse(self, response):
        sel = Selector(response)
        divs = sel.xpath("//div[@class='box-search-product-filter-row']")
        for div in divs:
            sth = div.xpath("/span[class='result']/text()").extract()
            print(sth)
When I run the spider, it only prints:
[]
Can someone help me get the values from my two span elements (class "number" and class "result")?
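Answer 0 (score: 1)
You forgot the @ in your XPath "/span[class='result']/text()". Without it, class is read as the name of a child element rather than as an attribute. Also, the span you are looking for is not a direct child of the div, and a leading / makes the path absolute from the document root, so you need to use .// instead of /. See the XPath syntax reference.
Source: http://www.w3schools.com/xsl/xpath_syntax.asp
The complete and correct XPath would be ".//span[@class='result']", plus "/text()" if you only want to select the text. The nodes in your example contain no text, though, so that will not return anything here.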
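To see the two fixes in isolation, here is a minimal sketch that runs three variants of the expression against the markup from the question. It uses Python's standard-library xml.etree.ElementTree instead of Scrapy so that it runs without a crawl; that swap is my assumption and not part of the original spider. ElementTree supports only a subset of XPath (no /text() step, and no absolute paths on an element), so the child-only case is written as span[...] and the text is read from the .text attribute.

```python
import xml.etree.ElementTree as ET

# Markup copied from the question; it happens to be well-formed XML.
HTML = """
<div class="box-search-product-filter-row">
  <span class="result-numbers" sth-bind="model.navigationSettings.showFilter">
    <span class="number" sth-bind="span1"></span>
    <span class="result" sth-bind="span2"></span>
  </span>
</div>
"""

div = ET.fromstring(HTML)

# No "@": [class='result'] looks for a child element named <class>,
# not for the class attribute, so nothing matches.
missing_at = div.findall(".//span[class='result']")

# Children only: the single direct <span> child has class "result-numbers",
# so the nested span is not reached.
direct_child = div.findall("span[@class='result']")

# Descendant search plus attribute test: this reaches the nested span.
matches = div.findall(".//span[@class='result']")

# The span is empty in this markup, so there is no text to read.
texts = [span.text for span in matches]

print(len(missing_at), len(direct_child), len(matches), texts)
```

In Scrapy, the third expression is the one to pass to div.xpath(...), with /text() appended if only the text is wanted.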
Answer 1 (score: 0)
This will work for you.
Edit:
from scrapy.spiders import Spider
from scrapy.selector import Selector


class MySpdier(Spider):
    name = "list"
    allowed_domains = ["example.com"]
    start_urls = ["https://www.example.com"]

    def parse(self, response):
        sel = Selector(response)
        divs = sel.xpath("//div[@class='box-search-product-filter-row']")
        for div in divs:
            # ".//" searches all descendants of this div; "@class" tests the attribute.
            sth = div.xpath(".//span[@class='result']/text()").extract()
            print(sth)
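The question asked for both spans, so here is a rough sketch of collecting the "number" and "result" values together. It again uses the standard-library xml.etree.ElementTree so it can run on its own, which is my substitution for Scrapy's selectors. The helper name span_texts and the sample values "128" and "results" are hypothetical; the markup in the question has empty spans.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: same structure as the question, but with invented
# text values so there is something to extract.
SAMPLE = """
<div class="box-search-product-filter-row">
  <span class="result-numbers">
    <span class="number">128</span>
    <span class="result"> results </span>
  </span>
</div>
"""


def span_texts(div, class_names=("number", "result")):
    """Map each class name to the text of the first matching descendant span.

    Hypothetical helper: returns None for a class that is not found and an
    empty string for a span that exists but has no text.
    """
    values = {}
    for name in class_names:
        span = div.find(".//span[@class='%s']" % name)
        if span is None:
            values[name] = None
        else:
            # .text is None for an empty element, so fall back to "".
            values[name] = (span.text or "").strip()
    return values


values = span_texts(ET.fromstring(SAMPLE))
print(values)
```

With Scrapy, the same relative expressions could be passed to div.xpath(...) inside parse. If the spans are still empty in the HTML that Scrapy downloads, there is no text for any XPath to return.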