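When I crawl web data with Scrapy, the log shows "DEBUG: Crawled (204)" followed by "DEBUG: no data received!". I have used this code to crawl a lot of data before, but this time it does not work, and I don't know why.
The code and log output are below.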
import re

import scrapy
from bs4 import BeautifulSoup


class SoccerSpider(scrapy.Spider):
    name = 'soccer'
    start_urls = [
        'https://www.transfermarkt.com/wettbewerbe/europa']

    def parse(self, response):
        soup = BeautifulSoup(response.body, 'html.parser')
        # Collect links that point to competition start pages.
        tags = soup.find_all('a', href=re.compile(r'.*/startseite/wettbewerb/.*'))
        print(tags)
        for tag in tags[:14]:
            url = re.findall(r'https://www.transfermarkt.com/.+', response.urljoin(tag.get('href')))
            if len(url) == 0:
                continue
            else:
                # parse1 is another callback of this spider (not shown here).
                yield scrapy.Request(url[0], callback=self.parse1, dont_filter=True)
The log output is as follows:
2019-05-01 17:16:38 [scrapy.core.engine] DEBUG: Crawled (204) <GET https://www.transfermarkt.com/robots.txt> (referer: http://www.web.cn/)
2019-05-01 17:16:38 [scrapy.core.engine] DEBUG: Crawled (204) <GET https://www.transfermarkt.com/wettbewerbe/europa> (referer: http://www.web.cn/)
2019-05-01 17:16:38 [chardet.universaldetector] DEBUG: no data received!
2019-05-01 17:16:38 [chardet.universaldetector] DEBUG: no data received!
The default request headers are set as follows:
DEFAULT_REQUEST_HEADERS = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en',
    'Accept-Encoding': 'gzip, deflate',
    'Connection': 'keep-alive',
    'host': 'www.web.cn',
    'Referer': 'http://www.web.cn/',
    'Cookie': 'is cookis'
}
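
One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: the log lines show "referer: http://www.web.cn/" on the transfermarkt.com requests, so these default headers are being sent with every request, including the 'host' entry for www.web.cn. HTTP/1.1 expects the Host header to name the server actually being requested. The sketch below uses only the standard library to compare a configured Host header with the host of the target URL. The helper name host_header_matches is hypothetical and is not part of Scrapy.

```python
from urllib.parse import urlparse


def host_header_matches(headers, url):
    """Return True if no Host header is set, or if it equals the URL's host.

    Hypothetical helper for illustration only; it is not a Scrapy API.
    Header names are compared case-insensitively.
    """
    target_host = urlparse(url).hostname
    for name, value in headers.items():
        if name.lower() == 'host':
            return value.strip().lower() == target_host
    # No explicit Host header: the HTTP client derives it from the URL.
    return True


# Headers copied from the settings shown above.
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en',
    'Accept-Encoding': 'gzip, deflate',
    'Connection': 'keep-alive',
    'host': 'www.web.cn',
    'Referer': 'http://www.web.cn/',
    'Cookie': 'is cookis',
}

start_url = 'https://www.transfermarkt.com/wettbewerbe/europa'
print(host_header_matches(headers, start_url))  # prints False
```

If the check reports a mismatch, one option to try is removing the 'host' entry from the default headers so that it is filled in from each request's URL. Whether that alone explains the 204 responses is not confirmed by anything in this post.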