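I am using Scrapy to scrape some elements from the HTML of this page: "http://list.tmall.com/search_shopitem.htm?user_id=753523703&n=60&s=0"
However, I ran into a problem, and I would like to know why it happens and how to fix it. Please help me.
The log is as follows: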
2014-05-09 18:08:46+0800 [crawlitemfromshop] DEBUG: Crawled (200) <GET http://s.taobao.com/search?q=%E6%AF%94%E6%9C%88%E6%97%97%E8%88%B0%E5%BA%97&app=shopsearch> (referer: http://www.taobao.com)
2014-05-09 18:08:46+0800 [crawlitemfromshop] DEBUG: Redirecting (302) to <GET http://jump.taobao.com/jump?target=http%3A%2F%2Flist.tmall.com%2Fsearch_shopitem.htm%3Ftbpm%3D1%26user_id%3D753523703%26n%3D60%26s%3D0> from <GET http://list.tmall.com/search_shopitem.htm?user_id=753523703&n=60&s=0>
2014-05-09 18:08:58+0800 [crawlitemfromshop] DEBUG: Redirecting (302) to <GET http://pass.tmall.com/add?_tb_token_=C78bVoU0RJwQ&cookie2=94d965b75c3fba2ce3164b5f0477a021&t=c72fdb50b3ed76e6f300eedc81959d7e&target=http%3A%2F%2Flist.tmall.com%2Fsearch_shopitem.htm%3Ftbpm%3D1%26user_id%3D753523703%26n%3D60%26s%3D0&pacc=u-98EKfnz4MXPDJOhj0Zfg==&opi=222.128.8.99&tmsc=1399630137646263> from <GET http://jump.taobao.com/jump?target=http%3A%2F%2Flist.tmall.com%2Fsearch_shopitem.htm%3Ftbpm%3D1%26user_id%3D753523703%26n%3D60%26s%3D0>
2014-05-09 18:09:11+0800 [crawlitemfromshop] DEBUG: Redirecting (302) to <GET http://list.tmall.com/search_shopitem.htm?tbpm=1&user_id=753523703&n=60&s=0> from <GET http://pass.tmall.com/add?_tb_token_=C78bVoU0RJwQ&cookie2=94d965b75c3fba2ce3164b5f0477a021&t=c72fdb50b3ed76e6f300eedc81959d7e&target=http%3A%2F%2Flist.tmall.com%2Fsearch_shopitem.htm%3Ftbpm%3D1%26user_id%3D753523703%26n%3D60%26s%3D0&pacc=u-98EKfnz4MXPDJOhj0Zfg==&opi=222.128.8.99&tmsc=1399630137646263>
2014-05-09 18:09:21+0800 [crawlitemfromshop] DEBUG: Redirecting (302) to <GET http://list.tmall.com/search_shopitem.htm?user_id=753523703&n=60&s=0> from <GET http://list.tmall.com/search_shopitem.htm?tbpm=1&user_id=753523703&n=60&s=0>
2014-05-09 18:09:21+0800 [crawlitemfromshop] DEBUG: Filtered duplicate request: <GET http://list.tmall.com/search_shopitem.htm?user_id=753523703&n=60&s=0> - no more duplicates will be shown (see DUPEFILTER_CLASS)
2014-05-09 18:09:21+0800 [crawlitemfromshop] INFO: Closing spider (finished)
Answer 0 (score: 1)
Could you disable the duplicate filter (the DUPEFILTER_CLASS setting in settings.py)
and test again?
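As a rough sketch of what that might look like, the fragment below points DUPEFILTER_CLASS at Scrapy's non-filtering base class. It assumes a Scrapy release where the module is named `scrapy.dupefilters` (1.0 and later); releases from around the time of this log used `scrapy.dupefilter` instead, so adjust the path to match your version.

```python
# settings.py (fragment)
# Assumption: Scrapy 1.0+ module path. On older releases the module
# was named scrapy.dupefilter (no trailing "s").
# BaseDupeFilter does not filter anything, so every request is scheduled,
# including a redirect that points back at an already-requested URL.
DUPEFILTER_CLASS = "scrapy.dupefilters.BaseDupeFilter"
```

Note that this turns off duplicate filtering for the whole project. A narrower alternative is to pass `dont_filter=True` when building only the request for this page.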
The scrapy shell command works fine on my end; I can get all the item and price information:
scrapy shell 'http://list.tmall.com/search_shopitem.htm?user_id=753523703&n=60&s=0'
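
As for why the spider stops, the log in the question suggests that the redirect chain ends at the same URL it started from, and the duplicate filter drops that last request because the URL has already been requested. The toy simulation below illustrates the idea in plain Python, without Scrapy. The short labels stand in for the four URLs in the log, and the assumption that the site serves the page once a session cookie is set is a guess, not something the log confirms.

```python
# Stand-in labels for the four URLs seen in the log (not the real URLs).
LIST, JUMP, PASS, LIST_TBPM = "list", "jump", "pass", "list?tbpm=1"


def next_hop(url, cookies):
    """Toy server: return the redirect target, or None for a 200 page.

    Assumption: the listing page is served once the session cookie exists.
    """
    if url == LIST:
        return None if "session" in cookies else JUMP
    if url == JUMP:
        return PASS
    if url == PASS:
        cookies.add("session")  # assumed: this hop sets the cookie
        return LIST_TBPM
    if url == LIST_TBPM:
        return LIST
    return None


def crawl(start, dedupe, max_hops=20):
    """Follow redirects, optionally dropping URLs already requested."""
    seen, cookies, events = set(), set(), []
    url = start
    for _ in range(max_hops):
        if dedupe and url in seen:
            events.append(("filtered", url))
            return events
        seen.add(url)
        target = next_hop(url, cookies)
        if target is None:
            events.append(("crawled", url))
            return events
        events.append(("redirect", target))
        url = target
    events.append(("gave up", url))
    return events


with_filter = crawl(LIST, dedupe=True)
without_filter = crawl(LIST, dedupe=False)
print(with_filter[-1])     # ('filtered', 'list')
print(without_filter[-1])  # ('crawled', 'list')
```

With the filter on, the run ends in a "filtered" event for the starting URL, which matches the "Filtered duplicate request" line in the log. With it off, the final request goes through.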