Scrapy returns 204 and receives no data

Time: 2019-05-01 07:49:25

Tags: python scrapy

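When I crawl web data with Scrapy, every response comes back with status 204 and the debug log says "no data received!". I have used this same code to crawl plenty of data before, but this time it does not work, and I don't know why.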

Here are the code and the log output.

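import re

import scrapy
from bs4 import BeautifulSoup


class SoccerSpider(scrapy.Spider):
    name = 'soccer'
    start_urls = ['https://www.transfermarkt.com/wettbewerbe/europa']

    def parse(self, response):
        # Parse the page body and collect links to competition overview pages.
        soup = BeautifulSoup(response.body, 'html.parser')
        tags = soup.find_all('a', href=re.compile(r'/startseite/wettbewerb/'))
        print(tags)
        for tag in tags[:14]:
            # Make the href absolute and keep it only if it points to transfermarkt.com.
            url = re.findall(r'https://www\.transfermarkt\.com/.+', response.urljoin(tag.get('href')))
            if url:
                yield scrapy.Request(url[0], callback=self.parse1, dont_filter=True)

    def parse1(self, response):
        # Placeholder: the parse1 callback was not included in the question.
        pass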
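The link filtering in parse() can be checked offline, without hitting the site. The sketch below mirrors it with the standard library only: the function name competition_urls and its limit parameter are hypothetical, and urllib.parse.urljoin stands in for response.urljoin (Scrapy's version joins against the response URL in the same way).

```python
import re
from urllib.parse import urljoin

# Same patterns as the spider; the dots in the domain are escaped so they match literally.
COMPETITION_HREF = re.compile(r'/startseite/wettbewerb/')
SITE_PREFIX = re.compile(r'https://www\.transfermarkt\.com/.+')


def competition_urls(base_url, hrefs, limit=14):
    """Mimic parse(): take the first `limit` matching hrefs, make them absolute,
    and keep only those that point to www.transfermarkt.com."""
    matching = [h for h in hrefs if h and COMPETITION_HREF.search(h)]
    urls = []
    for href in matching[:limit]:
        absolute = urljoin(base_url, href)
        match = SITE_PREFIX.search(absolute)  # same test as re.findall(...)[0]
        if match:
            urls.append(match.group(0))
    return urls
```

If this returns the expected URLs for hrefs copied from the page source, the parsing logic is fine and the problem lies in what the server sends back.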

Log output:

2019-05-01 17:16:38 [scrapy.core.engine] DEBUG: Crawled (204) <GET https://www.transfermarkt.com/robots.txt> (referer: http://www.web.cn/)
2019-05-01 17:16:38 [scrapy.core.engine] DEBUG: Crawled (204) <GET https://www.transfermarkt.com/wettbewerbe/europa> (referer: http://www.web.cn/)
2019-05-01 17:16:38 [chardet.universaldetector] DEBUG: no data received!
2019-05-01 17:16:38 [chardet.universaldetector] DEBUG: no data received!
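HTTP 204 means "No Content": the server deliberately sends an empty body. Because 204 is in the 2xx range, Scrapy's HttpErrorMiddleware still passes the response to parse(), so BeautifulSoup receives empty bytes, and the chardet encoding detector it uses logs "no data received!". A minimal sketch of a guard for the top of a callback, assuming a response object with status, body and url attributes; FakeResponse and has_parseable_body are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from http import HTTPStatus


@dataclass
class FakeResponse:
    """Hypothetical stand-in for scrapy.http.Response with only the fields used here."""
    url: str
    status: int
    body: bytes


def has_parseable_body(response) -> bool:
    """Return False when there is nothing for BeautifulSoup to parse."""
    # 204 No Content: the server sent an empty body on purpose.
    if response.status == HTTPStatus.NO_CONTENT:
        return False
    # Any other status with an empty body is equally useless to the parser.
    return len(response.body) > 0
```

Inside SoccerSpider.parse one could start with `if not has_parseable_body(response): self.logger.warning('Empty response from %s (HTTP %s)', response.url, response.status); return`, which makes the empty responses visible instead of silently yielding nothing.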

The request headers are configured as follows:

DEFAULT_REQUEST_HEADERS = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en',
    'Accept-Encoding': 'gzip, deflate',
    'Connection': 'keep-alive',
    'host': 'www.web.cn',
    'Referer': 'http://www.web.cn/',
    'Cookie': 'is cookis'
}
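These headers send `host: www.web.cn` and `Referer: http://www.web.cn/` with every request, including those to www.transfermarkt.com; the log shows `referer: http://www.web.cn/` even for robots.txt. A Host header naming a different site is a plausible reason for the server to answer with an empty 204, but that is an unconfirmed assumption, not a verified fix. A minimal settings.py sketch that drops the hard-coded Host and Referer (when no Host is set, the request URL's host is used) and sets a browser-like User-Agent; the exact User-Agent string is illustrative only.

```python
# settings.py (sketch): hypothetical revision of the headers above.
# Assumption: the hard-coded Host/Referer for www.web.cn cause the empty 204s.
DEFAULT_REQUEST_HEADERS = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en',
    # No 'host' entry: the Host header is then taken from each request's URL.
    # No site-wide 'Referer' pointing at an unrelated domain.
    # Cookie omitted; add one only if the target site actually requires it.
}

# Assumption: an illustrative browser-like User-Agent string.
USER_AGENT = (
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
    'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0 Safari/537.36'
)
```

Accept-Encoding can also be left out, because Scrapy's HttpCompressionMiddleware adds it when it is enabled.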
