Question

I'm using Scrapy to scrape ntry.com. The home page URL is ntry.com/#/main.php.

The specific page I want to scrape is http://ntry.com/#/scores/named_ladder/main.php

But for a reason I don't understand, I keep getting the wrong page. Here is my code:

import scrapy


class NtrySpider(scrapy.Spider):
    name = "ntry"
    allowed_domains = ["ntry.com"]
    start_urls = [
        "http://ntry.com/#/scores/named_ladder/main.php",
    ]

    def parse(self, response):
        # Save the raw response body to a file for inspection
        filename = "ntryex1"
        with open(filename, "wb") as f:
            f.write(response.body)

DEBUG: Crawled (200) <GET http://ntry.com/#/scores/named_ladder/main.html> (referer: None)

With this code I always get the content of ntry.com/#/main.php, even though my start_urls is http://ntry.com/#/scores/named_ladder/main.php.

Can you tell me what the problem is?

Answer 1

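Scrapy does not execute JavaScript by default. Try disabling JavaScript in your browser and then opening the URL you want: the page you see is essentially what Scrapy receives. Note also that everything after the "#" is a URL fragment, which is never sent to the server, so http://ntry.com/#/main.php and http://ntry.com/#/scores/named_ladder/main.php both request the same http://ntry.com/. It is presumably the site's JavaScript that reads the fragment and loads the scores page.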
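A small standard-library sketch of why the part after "#" makes no difference to the server: HTTP clients (Scrapy's downloader included) split the fragment off and send only the rest. urllib.parse.urldefrag performs the same split; the variable names here are only for illustration.

```python
from urllib.parse import urldefrag, urlsplit

start_url = "http://ntry.com/#/scores/named_ladder/main.php"

# Split off the fragment (everything after '#'); HTTP clients keep it
# locally and never put it in the request line sent to the server.
sent_url, fragment = urldefrag(start_url)

print(sent_url)                  # → http://ntry.com/
print(fragment)                  # → /scores/named_ladder/main.php
print(urlsplit(start_url).path)  # → /
```

Running the same split on the home page URL http://ntry.com/#/main.php also yields http://ntry.com/, which is why both start URLs return identical content.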

If you want to handle JavaScript with Scrapy, take a look at Splash.
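One way to try Splash before wiring it into the spider is its HTTP API: the render.html endpoint takes the target URL (fragment included) as a query parameter and returns the HTML after JavaScript has run. This sketch assumes a Splash instance running locally on its default port 8050 (for example via Docker) and a 2-second wait; both are assumptions, and whether the ntry.com scores table finishes loading within that time is untested. The helper names are hypothetical. Note that the target URL must be percent-encoded inside the query string: a raw "#" would be stripped again before reaching Splash.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed address of a locally running Splash instance (default port).
SPLASH = "http://localhost:8050"


def splash_render_url(target, wait=2.0, splash=SPLASH):
    """Build a Splash render.html URL for `target` (hypothetical helper).

    urlencode() percent-encodes '#' as %23, so the fragment travels
    to Splash inside the query string instead of being dropped.
    """
    query = urlencode({"url": target, "wait": wait})
    return f"{splash}/render.html?{query}"


def fetch_rendered(target, wait=2.0):
    """Fetch the JavaScript-rendered HTML (requires Splash to be running)."""
    with urlopen(splash_render_url(target, wait), timeout=60) as resp:
        return resp.read()


if __name__ == "__main__":
    print(splash_render_url("http://ntry.com/#/scores/named_ladder/main.php"))
```

Inside a spider, the scrapy-splash plugin talks to the same service through SplashRequest; its README lists the settings and middlewares it requires.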
