Need the numeric part when using response.css in Scrapy

Time: 2016-12-15 07:31:35

Tags: python scrapy

I need to get the product name and price from this page: "http://www.fabfurnish.com/Koryo-KLE40DLBH1-39-inches-HD-Ready-LED-TV-Black-294567.html". I get the product name, but not the price.

item["Product_Name"] = response.css("#product_name::text").extract()[0]
item["Price"] = response.xpath("#price_box::text").extract()[0]

So the output should be:
Product name: Koryo KLE40DLBH1 39 inches HD Ready LED TV Black (this I get)
Price: 22,990 (this I don't get)

1 answer:

Answer 0: (score: 1)

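For the price, you are passing a CSS selector to a .xpath() call, which expects an XPath expression. Running it raises an exception, which should show up in your logs.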

So change .xpath() to .css() to get the price value:

$ scrapy shell http://www.fabfurnish.com/Koryo-KLE40DLBH1-39-inches-HD-Ready-LED-TV-Black-294567.html
2016-12-15 11:25:01 [scrapy] INFO: Scrapy 1.2.2 started (bot: scrapybot)

>>> response.css("#product_name::text").extract()
[u'Koryo KLE40DLBH1 39 inches HD Ready LED TV Black']
>>> response.css("#product_name::text").extract_first()
u'Koryo KLE40DLBH1 39 inches HD Ready LED TV Black'


>>> response.xpath("#price_box::text").extract()[0]
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/home/paul/.virtualenvs/scrapy12/local/lib/python2.7/site-packages/scrapy/http/response/text.py", line 115, in xpath
    return self.selector.xpath(query)
  File "/home/paul/.virtualenvs/scrapy12/local/lib/python2.7/site-packages/parsel/selector.py", line 207, in xpath
    six.reraise(ValueError, ValueError(msg), sys.exc_info()[2])
  File "/home/paul/.virtualenvs/scrapy12/local/lib/python2.7/site-packages/parsel/selector.py", line 203, in xpath
    **kwargs)
  File "src/lxml/lxml.etree.pyx", line 1587, in lxml.etree._Element.xpath (src/lxml/lxml.etree.c:57924)
  File "src/lxml/xpath.pxi", line 307, in lxml.etree.XPathElementEvaluator.__call__ (src/lxml/lxml.etree.c:167085)
  File "src/lxml/xpath.pxi", line 227, in lxml.etree._XPathEvaluatorBase._handle_result (src/lxml/lxml.etree.c:166044)
ValueError: XPath error: Invalid expression in #price_box::text
>>> response.css("#price_box::text").extract()[0]
u'26,990'
>>> response.css("#price_box::text").extract_first()
u'26,990'

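Note that .extract_first() is generally safer than .extract()[0]: the latter raises an IndexError when the selector matches nothing, while .extract_first() returns None.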
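
Since the question asks for the numeric part, here is a minimal sketch of turning the extracted string into a number. The helper name parse_price is hypothetical (it is not part of Scrapy), and it assumes whole-number prices with comma thousands separators, as in the shell session above; a price with decimals would need different handling. It takes the list returned by .extract() and, like .extract_first(), does not fail when that list is empty.

```python
import re


def parse_price(values, default=None):
    """Return the first extracted price as an int, or `default` if there is none.

    `values` is assumed to be the list of strings returned by
    response.css("#price_box::text").extract().
    Hypothetical helper for illustration; not a Scrapy API.
    """
    # Take the first element if there is one, without raising IndexError.
    first = next(iter(values), None)
    if first is None:
        return default
    # Keep digits only, so "26,990" becomes "26990".
    # Assumption: no decimal part in the price text.
    digits = re.sub(r"\D", "", first)
    return int(digits) if digits else default


print(parse_price([u"26,990"]))  # price found
print(parse_price([]))           # selector matched nothing
```

In a spider this could be used as item["Price"] = parse_price(response.css("#price_box::text").extract()), so that a missing price yields None instead of an exception.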