Question

I'm new to this, so this may be trivial. In any case, I am getting the following error:

INFO: Ignoring response <404 http://www.geographie.uni-muenchen.de/department/fiona/studium/fuer_studierende/termine/index.html/>: HTTP status code is not handled or not allowed

I tried changing the user agent in settings.py, but that did not help. Does anyone have other ideas? Thanks.

My code:

import scrapy

class DepartmentSpider(scrapy.Spider):
    name = 'department'
    start_urls = ['http://www.geographie.uni-muenchen.de/department/fiona/studium/fuer_studierende/termine/index.html/']

    def parse(self, response):
        for row in response.xpath('//table[2]/tbody'):
            yield {
                'Art' : row.xpath('td[1]//text()').extract_first(),
                'Belegfrist': row.xpath('td[2]//text()').extract_first(),
                'Klausur' : row.xpath('td[3]//text()').extract_first(),
            }

Answer 1

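The URL in your start_urls ends with a trailing slash (index.html/). The server answers that address with a 404, and Scrapy by default drops non-2xx responses, which is what the log message reports. Remove the slash and everything should work.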
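
Deleting the slash by hand in start_urls is all that is needed here. If you build start URLs programmatically, a small helper can guard against the same mistake. The following is only a sketch using the standard library: the function name drop_trailing_slash is made up for illustration, and it assumes that a last path segment containing a dot (such as index.html) is a file name.

```python
from urllib.parse import urlsplit, urlunsplit


def drop_trailing_slash(url):
    """Remove a trailing slash from the path when the last segment looks like a file.

    Hypothetical helper, not part of Scrapy. Assumption: a last path
    segment that contains a dot (e.g. "index.html") names a file.
    """
    parts = urlsplit(url)
    path = parts.path
    # Only touch paths such as ".../index.html/"; leave directory-style URLs alone.
    if path.endswith('/') and '.' in path.rstrip('/').rsplit('/', 1)[-1]:
        path = path.rstrip('/')
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


# URL from the question, with the stray slash after index.html
broken = 'http://www.geographie.uni-muenchen.de/department/fiona/studium/fuer_studierende/termine/index.html/'
fixed = drop_trailing_slash(broken)
```

In the spider, you would then set start_urls = [fixed], or simply write the URL without the final slash.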
