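I have a scraper that works fine, and I can easily export its output to a CSV file, but it always returns the values in a strange order.
I checked that the fields in items.py are in the right order and tried moving the fields around in the spider, but I can't figure out why it yields them in this odd order.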
import scrapy
from scrapy.spiders import CrawlSpider
from scrapy import Selector
from scrapy.loader import ItemLoader
from scrapy.spiders import Rule
from scrapy.linkextractors import LinkExtractor
from sofifa_scraper.items import Player


class FifaInfoScraper(scrapy.Spider):
    name = "player2_scraper"
    start_urls = ["https://www.futhead.com/19/players/?level=all_nif&bin_platform=ps"]

    def parse(self, response):
        # Follow the link to each player's detail page
        for href in response.css("li.list-group-item > div.content > a::attr(href)"):
            yield response.follow(href, callback=self.parse_name)

    def parse_name(self, response):
        item = Player()
        # Get player name
        item['name'] = response.css("div[itemprop = 'child'] > span[itemprop = 'title']::text").get()
        # Club, league and nation are all stored under the same selector, so pull them all at once
        club_league_nation = response.css("div.col-xs-5 > a::text").getall()
        # Split the selected info from club_league_nation into 3 separate fields
        item['club'], item['league'], item['nation'] = club_league_nation
        yield item
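I want the scraper to return the player name in the first column; I don't much care about the order after that. However, the player name always ends up in some other column, and this happens even when I extract only the name and one other value.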
Answer 0: (score: 1)
Just add FEED_EXPORT_FIELDS (documentation) to your settings.py:
FEED_EXPORT_FIELDS = ["name", "club", "league", "nation"]
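To illustrate why an explicit field list fixes the column order, here is a minimal standard-library sketch. It is only an analogy and not Scrapy's actual exporter code: the `items` rows, their placeholder values, and the `to_csv` helper are hypothetical and exist just for this example. The idea is that the columns follow the list you pass in, not the order in which keys happen to appear in each item.

```python
import csv
import io

# Hypothetical scraped items with placeholder values; the key order
# deliberately differs from row to row and from the desired column order.
items = [
    {"club": "Club A", "nation": "Nation A", "name": "Player A", "league": "League A"},
    {"league": "League B", "name": "Player B", "club": "Club B", "nation": "Nation B"},
]

# Same list as in the FEED_EXPORT_FIELDS setting above.
EXPORT_FIELDS = ["name", "club", "league", "nation"]


def to_csv(rows, fields):
    """Write rows to CSV text with columns in exactly the order given by `fields`."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(items, EXPORT_FIELDS)
print(csv_text)
```

With the explicit list, `name` lands in the first column for every row, whatever order the keys were assigned in the spider.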