I'm currently trying to scrape one item per game URL from the listing page ("https://sportschatplace.com/nba-picks"), then follow each game's link and collect more information from its page.
When I run the spider, it just returns without scraping any pages. Any help would be appreciated. Here is my code snippet:
import scrapy
from urllib.parse import urljoin

class GameSpider(scrapy.Spider):
    name = 'games'
    allowed_domains = ['sportschatplace.com']
    start_urls = [
        'https://sportschatplace.com/nba-picks'
    ]

    def parse(self, response):
        games = response.css("div.home-a").extract_first()
        for g in games:
            url = urljoin(response.url, g)
            yield scrapy.Request(url, callback=self.parse_game)

    def parse_game(self, response):
        for info in response.css('div.gutter'):
            yield {
                'game_teams': info.css('p.heading-sub').extract_first(),  # check if these are correct before running
                'game_datetime': info.css('h2.heading-sub').extract_first(),
                'game_line': info.css('h3.heading-sub').extract_first(),
                # 'game_text': info.css(' ').extract(),
                'game_pick': info.css('h3.block mt1 dark-gray').extract(),
            }
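A likely reason the spider returns nothing: `extract_first()` returns a single string (the HTML of the first matching `div.home-a`), so `for g in games` iterates over that string character by character, and each `urljoin` call builds a nonsense URL that falls outside `allowed_domains` or leads nowhere. A minimal stdlib-only sketch of that failure mode (the placeholder HTML string stands in for whatever `extract_first()` would return):

```python
from urllib.parse import urljoin

# What extract_first() hands back: one string, not a list of links.
games = '<div class="home-a">...</div>'

# Iterating over a string yields single characters, so every "URL"
# built here is garbage such as https://sportschatplace.com/<
urls = [urljoin('https://sportschatplace.com/nba-picks', g) for g in games]
```

This is why Scrapy schedules no useful follow-up requests: the loop runs, but over characters instead of game URLs.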
Answer 0 (score: 0)

`div.home-a` contains multiple divs, and `extract_first()` pulls out only the first of those divs and converts it to a string, so iterating over it does not give you game links. What I gather from the link is that your CSS is not selecting what you want. Try this instead:

css = '[itemprop="url"]::attr(href)'
games = response.css(css).extract()  # list of game urls
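The answer's selector logic can be checked without running a full crawl. Below is a stdlib-only sketch of what `response.css('[itemprop="url"]::attr(href)').extract()` does, using `html.parser` in place of Scrapy's selectors; the sample markup and game paths are invented for illustration, assuming the listing page tags each game link with `itemprop="url"`:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class GameLinkParser(HTMLParser):
    """Collects href values from tags marked itemprop="url"."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if a.get('itemprop') == 'url' and 'href' in a:
            self.hrefs.append(a['href'])

# Hypothetical listing markup, just to exercise the selector logic.
html = '''
<div class="home-a">
  <a itemprop="url" href="/nba-picks/lakers-vs-celtics">Lakers vs Celtics</a>
  <a itemprop="url" href="/nba-picks/heat-vs-knicks">Heat vs Knicks</a>
</div>
'''

p = GameLinkParser()
p.feed(html)

# Resolve each relative href against the listing URL, as parse() intends.
urls = [urljoin('https://sportschatplace.com/nba-picks', h) for h in p.hrefs]
```

Each absolute URL in `urls` is what the question's `parse` method should be yielding via `scrapy.Request(url, callback=self.parse_game)`: a list of real game-page links, not the characters of a single HTML string.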