I am trying to scrape a catalog number from this page:
from scrapy.selector import Selector
from scrapy.http import HtmlResponse
url = 'http://www.enciclovida.mx/busquedas/resultados?utf8=%E2%9C%93&busqueda=basica&id=&nombre=astomiopsis+exserta&button='
response = HtmlResponse(url=url)
using this CSS selector (which works in R with rvest::html_nodes):
".result-nombre-container > h5:nth-child(2) > a:nth-child(1)"
I want to retrieve the catalog ID, which in this case should be:
6011038
If this is easier to do with XPath, that works for me too.
Answer 0: (score: 1)
I'm no expert here, but I tested this XPath and it will get you the href:
//div[contains(@class, 'result-nombre-container')]/h5[2]/a/@href
If you run into too much trouble with Scrapy and the CSS selector syntax, I'd also suggest trying the BeautifulSoup Python package. With BeautifulSoup, you can do something like
link.get('href')
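A slightly fuller sketch of the BeautifulSoup route, assuming the `bs4` package is installed. The HTML fragment, the href value, and the exact selector are assumptions for illustration, not taken from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment imitating the results page; the real markup may differ.
SAMPLE_HTML = """
<div class="result-nombre-container">
  <h5>Musgo</h5>
  <h5><a href="/especies/6011038">Astomiopsis exserta</a></h5>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# select_one returns the first element matching the CSS selector, or None.
link = soup.select_one(".result-nombre-container > h5:nth-of-type(2) > a")

# Guard against a missing link so the lookup does not raise on None.
href = link.get("href") if link is not None else None

print(href)
```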
Answer 1: (score: 1)
If you need to parse the id out of the href:
catalog_id = response.xpath("//div[contains(@class, 'result-nombre-container')]/h5[2]/a/@href").re_first(r'(\d+)$')
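To see what the pattern `(\d+)$` does on its own, here is a small sketch using the standard `re` module. The helper name `extract_catalog_id` and the sample href are hypothetical.

```python
import re


def extract_catalog_id(href):
    """Return the run of digits at the very end of href, or None if there is none."""
    match = re.search(r"(\d+)$", href)
    return match.group(1) if match else None


# Hypothetical href of the kind the XPath is expected to return.
print(extract_catalog_id("/especies/6011038"))
```

The result is a string, not an integer. An href that ends in anything other than digits, such as a trailing slash, yields `None`.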
Answer 2: (score: 0)
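There seems to be only one link inside the h5 element, so the `a:nth-child(1)` part of the selector is probably unnecessary.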