I can't use scrapy to crawl this URL "http://list.tmall.com/search_shopitem.htm?user_id=753523703&n=60&s=0"

Date: 2014-05-09 10:15:50

Tags: python scrapy

I am using scrapy to extract certain elements from the HTML of this page: "http://list.tmall.com/search_shopitem.htm?user_id=753523703&n=60&s=0"

But I ran into a problem: the spider never actually fetches the page. I'd like to know why this happens and how to fix it. Please help me.

The log is as follows:

2014-05-09 18:08:46+0800 [crawlitemfromshop] DEBUG: Crawled (200) <GET http://s.taobao.com/search?q=%E6%AF%94%E6%9C%88%E6%97%97%E8%88%B0%E5%BA%97&app=shopsearch> (referer: http://www.taobao.com)
2014-05-09 18:08:46+0800 [crawlitemfromshop] DEBUG: Redirecting (302) to <GET http://jump.taobao.com/jump?target=http%3A%2F%2Flist.tmall.com%2Fsearch_shopitem.htm%3Ftbpm%3D1%26user_id%3D753523703%26n%3D60%26s%3D0> from <GET http://list.tmall.com/search_shopitem.htm?user_id=753523703&n=60&s=0>
2014-05-09 18:08:58+0800 [crawlitemfromshop] DEBUG: Redirecting (302) to <GET http://pass.tmall.com/add?_tb_token_=C78bVoU0RJwQ&cookie2=94d965b75c3fba2ce3164b5f0477a021&t=c72fdb50b3ed76e6f300eedc81959d7e&target=http%3A%2F%2Flist.tmall.com%2Fsearch_shopitem.htm%3Ftbpm%3D1%26user_id%3D753523703%26n%3D60%26s%3D0&pacc=u-98EKfnz4MXPDJOhj0Zfg==&opi=222.128.8.99&tmsc=1399630137646263> from <GET http://jump.taobao.com/jump?target=http%3A%2F%2Flist.tmall.com%2Fsearch_shopitem.htm%3Ftbpm%3D1%26user_id%3D753523703%26n%3D60%26s%3D0>
2014-05-09 18:09:11+0800 [crawlitemfromshop] DEBUG: Redirecting (302) to <GET http://list.tmall.com/search_shopitem.htm?tbpm=1&user_id=753523703&n=60&s=0> from <GET http://pass.tmall.com/add?_tb_token_=C78bVoU0RJwQ&cookie2=94d965b75c3fba2ce3164b5f0477a021&t=c72fdb50b3ed76e6f300eedc81959d7e&target=http%3A%2F%2Flist.tmall.com%2Fsearch_shopitem.htm%3Ftbpm%3D1%26user_id%3D753523703%26n%3D60%26s%3D0&pacc=u-98EKfnz4MXPDJOhj0Zfg==&opi=222.128.8.99&tmsc=1399630137646263>
2014-05-09 18:09:21+0800 [crawlitemfromshop] DEBUG: Redirecting (302) to <GET http://list.tmall.com/search_shopitem.htm?user_id=753523703&n=60&s=0> from <GET http://list.tmall.com/search_shopitem.htm?tbpm=1&user_id=753523703&n=60&s=0>
2014-05-09 18:09:21+0800 [crawlitemfromshop] DEBUG: Filtered duplicate request: <GET http://list.tmall.com/search_shopitem.htm?user_id=753523703&n=60&s=0> - no more duplicates will be shown (see DUPEFILTER_CLASS)
2014-05-09 18:09:21+0800 [crawlitemfromshop] INFO: Closing spider (finished)
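Read end to end, the log shows why the spider stops: the request is 302-redirected through jump.taobao.com and pass.tmall.com (a cookie handshake), and the final redirect lands back on the original URL, which Scrapy's duplicate filter has already seen, so it is dropped and the spider closes. A minimal stdlib sketch of that dedup behavior (the real filter hashes a request fingerprint rather than comparing raw URLs, and the intermediate URLs are abbreviated here):

```python
# The redirect chain from the log above, reduced to its target URLs.
chain = [
    "http://list.tmall.com/search_shopitem.htm?user_id=753523703&n=60&s=0",  # original request
    "http://jump.taobao.com/jump?target=...",                                # anti-bot jump page
    "http://pass.tmall.com/add?_tb_token_=...",                              # cookie handshake
    "http://list.tmall.com/search_shopitem.htm?tbpm=1&user_id=753523703&n=60&s=0",
    "http://list.tmall.com/search_shopitem.htm?user_id=753523703&n=60&s=0",  # back to the start
]

# A seen-set, as the duplicate filter keeps one: the last hop collides
# with the very first URL, so the request is filtered and nothing remains
# in the scheduler, hence "Closing spider (finished)".
seen = set()
for url in chain:
    if url in seen:
        print("Filtered duplicate request:", url)
        break
    seen.add(url)
```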

1 Answer:

Answer 0 (score: 1)

Can you disable the DUPEFILTER_CLASS in settings.py and test again? The Scrapy shell command works fine on my end; I can get all the item and price information:

scrapy shell 'http://list.tmall.com/search_shopitem.htm?user_id=753523703&n=60&s=0'
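For reference, disabling the duplicate filter as suggested is a settings change. A sketch assuming a Scrapy release of that era (in newer releases the module path is scrapy.dupefilters.BaseDupeFilter):

```python
# settings.py -- sketch: turn off URL deduplication entirely
DUPEFILTER_CLASS = 'scrapy.dupefilter.BaseDupeFilter'  # BaseDupeFilter filters nothing
COOKIES_ENABLED = True  # the pass.tmall.com hop sets cookies; keep them enabled
```

A narrower alternative is to leave the filter on and pass dont_filter=True on just this scrapy.Request, so the redirect is allowed to land back on the already-seen start URL.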