How to crawl a specific website with scrapy shell

Time: 2019-12-04 16:05:14

Tags: python scrapy

I want to crawl this site with scrapy shell. I tried:

$ scrapy shell 'https://aaav2.hinet.net/A1/AuthScreen.jsp'

and also with a user agent:

$ scrapy shell -s USER_AGENT='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9' 'https://aaav2.hinet.net/A1/AuthScreen.jsp'

Then I ran view(response), but got nothing.

Can someone help me get the correct view(response), just as if I had opened this URL directly in a browser?

1 answer:

Answer 0: (score: 0)

You are being redirected to another URL. Run:

$ scrapy shell "https://aaav2.hinet.net/A1/error.jsp?aa-eurl=edc68fe62571d6617ef5f42113d9068aa9f6600e320d55084d75fbf2cd244155e02b9b684284ed94c52ee591d2edde9a&mesg=aa-version+parameter+is+required%21%3Cbr+%2F%3Eaa-productid+parameter+is+required%21%3Cbr+%2F%3Eaa-curl+parameter+is+required%21%3Cbr+%2F%3Eaa-eurl+parameter+is+required%21%3Cbr+%2F%3Eaa-fee+parameter+is+required%21%3Cbr+%2F%3E&aa-eurlDesc=&aa-device=pc&aa-usage=&aa-propertiesKey=&aa-language="

view(response) will then show you a page, just as your browser does.
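
The error URL above also carries the reason for the redirect in its `mesg` query parameter. As a rough sketch, that parameter can be decoded with nothing but the Python standard library. The variable names below are illustrative and not part of the original answer, and reading the first word of each message as the missing parameter name is an assumption based on how the messages are worded:

```python
from urllib.parse import urlsplit, parse_qs

# Error URL copied from the answer above, split across string
# literals only for readability.
error_url = (
    "https://aaav2.hinet.net/A1/error.jsp"
    "?aa-eurl=edc68fe62571d6617ef5f42113d9068aa9f6600e320d55084d75fbf2cd244155e02b9b684284ed94c52ee591d2edde9a"
    "&mesg=aa-version+parameter+is+required%21%3Cbr+%2F%3E"
    "aa-productid+parameter+is+required%21%3Cbr+%2F%3E"
    "aa-curl+parameter+is+required%21%3Cbr+%2F%3E"
    "aa-eurl+parameter+is+required%21%3Cbr+%2F%3E"
    "aa-fee+parameter+is+required%21%3Cbr+%2F%3E"
    "&aa-eurlDesc=&aa-device=pc&aa-usage=&aa-propertiesKey=&aa-language="
)

# parse_qs URL-decodes the values ('+' becomes a space, '%21' becomes '!').
# Parameters with empty values are dropped by default.
query = parse_qs(urlsplit(error_url).query)

# The message is a series of sentences separated by HTML line breaks.
mesg = query["mesg"][0]
messages = [m.strip() for m in mesg.split("<br />") if m.strip()]

# Assumption: the first word of each sentence is the missing parameter name.
missing = [m.split()[0] for m in messages]

for m in messages:
    print(m)
```

The decoded messages suggest that AuthScreen.jsp expects parameters such as aa-version and aa-productid in the request, which would explain why opening the bare URL lands on the error page.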