Python Scrapy: return a list of URLs and scrape content inside each URL

Time: 2018-12-28 17:08:51

Tags: python scrapy-spider

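I'm currently trying to have the spider scrape an item for each game URL (page: "https://sportschatplace.com/nba-picks"), then go into each game's page and get more information there.

When I run it, it just returns without scraping any pages. Any help would be appreciated. Here is my code snippet: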

class GameSpider(scrapy.Spider):
    name = 'games'
    allowed_domains = ['sportschatplace.com']
    start_urls = [
        'https://sportschatplace.com/nba-picks'
    ]
    def parse(self, response):
        games = response.css("div.home-a").extract_first()
        for g in games:
            url = urljoin(response.url, g)
            yield scrapy.Request(url, callback = self.parse_game)

    def parse_game(self, response):
        for info in response.css('div.gutter'):
            yield {
                'game_teams': info.css('p.heading-sub').extract_first(), #check if these are correct before running
                'game_datetime': info.css('h2.heading-sub').extract_first(),
                'game_line': info.css('h3.heading-sub').extract_first(),
                # 'game_text': info.css('   ').extract(),
                'game_pick': info.css('h3.block mt1 dark-gray').extract(),
            }

1 Answer:

Answer 0 (score: 0)

The page has multiple div.home-a divs, but you are extracting only the first div with extract_first(), which converts it to a string. Looping over that string yields single characters rather than URLs, so the requests never point at a game page.

From what I can see at the link, your CSS also does not give you what you want.

Try this:

css = '[itemprop="url"]::attr(href)'
games = response.css(css).extract()  # list of game URLs
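To see why the original parse loop never reached a game page, the stdlib-only sketch below contrasts looping over one string (what extract_first() returns) with looping over a list of hrefs (what extract() returns). The HTML fragment, the href values, and the helper name build_urls are made up for illustration; they are not taken from the site or from Scrapy.

```python
from urllib.parse import urljoin

BASE_URL = "https://sportschatplace.com/nba-picks"

# Hypothetical stand-in for what extract_first() returns: a single HTML string.
first_div_html = '<div class="home-a"><a href="/nba-picks/game-1">Game 1</a></div>'

# Hypothetical stand-in for what extract() returns: a list of href strings.
game_hrefs = ["/nba-picks/game-1", "/nba-picks/game-2"]


def build_urls(base_url, items):
    """Join every element produced by iterating over `items` onto base_url."""
    return [urljoin(base_url, item) for item in items]


# Iterating over a string yields one character at a time,
# so this builds one bogus URL per character of the HTML.
broken_urls = build_urls(BASE_URL, first_div_html)

# Iterating over a list yields whole hrefs,
# so this builds one absolute URL per game page.
game_urls = build_urls(BASE_URL, game_hrefs)

print(len(broken_urls), "bogus URLs, e.g.", broken_urls[:3])
print(game_urls)
```

In the spider itself, the same reasoning means the existing for loop over games should work unchanged once games is the list returned by extract().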