Question

I have a basic Python script that is supposed to look up phones on phonearena. I initialize it like this:

class PASpider(scrapy.Spider):
    name = "pabot"
    allowed_domains = ["http://www.phonearena.com/"]
    start_urls = ["http://www.phonearena.com/phones"]

    # Initialize the bot, takes a device name
    def __init__(self):
        device = "Nexus 6"
        words = nltk.word_tokenize(device)
        query = "http://www.phonearena.com/phones/word/"

        for word in words:
            query += word.lower()+"%20"

        query = query[0:len(query)-3]
        self.start_urls = [query]

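So far so good, but when I try to follow the link to the phone's page, I get a "Filtered offsite request to X" message. That normally means the request falls outside the allowed domains, but I can't figure out why that applies here. Here is the code that extracts the link, along with the console output: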

def parse_search(self,response):
        self.log(Fore.RED + Style.BRIGHT + "Web-spider started." + Fore.RESET + Style.RESET_ALL, level=log.INFO)
        self.log(Fore.GREEN + Style.BRIGHT + "type: " + str(type(response)) + Fore.RESET + Style.RESET_ALL, level=log.INFO)

        device = Device()

        target = Selector(response=response).xpath('//a[re:test(@class, "s_thumb")]//@href').extract()
        self.log(Fore.WHITE + Style.BRIGHT + "Target link: " + target[0] + Fore.RESET + Style.RESET_ALL, level=log.INFO)

        return scrapy.Request('http://www.phonearena.com'+target[0], callback=self.parse_item)

http://i.imgur.com/hoWUaxT.png (not enough reputation to post images)

Any idea what might be causing this?

Edit: Thanks @alecxe, I had to use allowed_domains = ["phonearena.com"].
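
For anyone hitting the same message, here is a minimal sketch of why the original value fails. It is a simplified approximation of a hostname-based offsite check, not Scrapy's actual code: the helper name host_allowed and the example path are hypothetical, and the assumption is that the filter compares only the request's hostname against the entries in allowed_domains.

```python
import re
from urllib.parse import urlparse


def host_allowed(url, allowed_domains):
    """Return True if the URL's hostname equals, or is a subdomain of,
    one of the allowed domains (hypothetical helper, for illustration)."""
    host = urlparse(url).hostname or ""
    # One pattern covering every allowed entry, with an optional subdomain prefix.
    pattern = r"^(.*\.)?(%s)$" % "|".join(re.escape(d) for d in allowed_domains)
    return re.search(pattern, host) is not None


# Hypothetical target URL; only its hostname matters for the check.
target = "http://www.phonearena.com/phones/some-device"

# A full URL as the "domain" can never equal a bare hostname.
print(host_allowed(target, ["http://www.phonearena.com/"]))  # False

# A bare domain name matches the hostname and its subdomains.
print(host_allowed(target, ["phonearena.com"]))  # True
```

Under that assumption, the hostname www.phonearena.com never matches an entry that includes the scheme and a trailing slash, so every follow-up request is treated as offsite, even though start_urls are still fetched.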

Scrapy gives me an offsite request error even though my link is within the allowed domains

0 answers: