Why do I get an empty array (scrapy shell response.xpath())?

Time: 2016-06-17 09:17:27

Tags: python xpath scrapy

I'd like to know why response.xpath() returns an empty array [] on this page, even for response.xpath('//div').extract()! For example:

$ scrapy shell https://www.amazon.cn/b/2127529051
...
>>> response.xpath('//div').extract()
[]

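I can get some results from the home page, but I can't get any results from many other pages.

BTW, I'm not trying to scrape Amazon or anything; this is purely for learning.

I've tried other sites as well and didn't run into this problem, so I'm wondering what causes it.

Any ideas?

Thanks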

2 answers:

Answer 0 (score: 0)

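This line is wrong: response.xpath('\\div').execute(). First, you are using backslashes here instead of forward slashes. Second, execute() is not a method of Selector objects, nor of SelectorList objects (which is what response.xpath() returns).

Try: response.xpath('//div').extract()

Apart from the broken code, it's good practice to disable JavaScript in your browser and run view(response) to see exactly what your spider sees. In some cases your spider may not see the content at all, because it is loaded with JavaScript.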

Answer 1 (score: 0)

  • response.xpath('\\div').execute()

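  • There are only a few possible reasons:

    1. User agent: some sites serve different content to Scrapy's default user agent, so try a browser one: scrapy shell <site URL> -s USER_AGENT='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36'

    2. The response may be empty: type response in the shell to see its status code; if it is in the 200-300 range, the request itself is fine.

    3. The XPath is wrong for that site's markup.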
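The checks in the list above can be folded into one helper that you call from scrapy shell. This is a hypothetical diagnose() function, not part of Scrapy; it only looks at values the shell already gives you (response.status, response.body and the list returned by .extract()). The "<script" check for the JavaScript-loaded-content case is a rough heuristic of my own, not proof that the content is rendered client-side.

```python
def diagnose(status: int, body: bytes, matches: list) -> list[str]:
    """Hypothetical helper (not part of Scrapy): turn what the shell shows
    into likely causes. Pass response.status, response.body and the list
    returned by response.xpath(...).extract()."""
    hints = []
    if not 200 <= status < 300:
        # The request failed; one possible cause is the site rejecting
        # Scrapy's default user agent.
        hints.append(f"status {status}: request was not successful; "
                     "retry with -s USER_AGENT='<browser UA>'")
    elif not body.strip():
        # A successful status, but nothing to select from.
        hints.append("empty body: nothing to select from; "
                     "retry with a browser-like USER_AGENT")
    elif not matches:
        # The page arrived, so the expression probably does not fit its markup.
        hints.append("no matches: compare the XPath with view(response)")
        if b"<script" in body.lower():
            # Heuristic only: the wanted content may be inserted by JavaScript.
            hints.append("page contains <script>: "
                         "the content may be loaded by JavaScript")
    return hints
```

Inside scrapy shell you would call it as diagnose(response.status, response.body, response.xpath('//div').extract()); an empty list means none of these checks found a problem.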

This should solve your problem.