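I'm still new to this and I'm wondering whether there is an easier way to split this text. Right now I'm working in Excel and have several values packed into a single cell, and separating them by hand is tedious. My data is actually a class made of three field()s and looks like this (each A can have multiple Bs; each B has 7 Cs):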
A,"B1,B2","C1,C2,C3,…,C14"
I'd like to populate/save it like this:
A,B1,C1
A,B1,C2
...
A,B1,C7
A,B2,...
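Independent of Scrapy, the target reshaping can be sketched in plain Python. This is a minimal sketch, assuming the Cs are listed build by build in blocks of 7 (as in the example row above); `explode_row` is a hypothetical helper name, and the standard `csv` module writes the result in the A,B,C layout shown above.

```python
import csv
import io


def explode_row(name, builds_field, skills_field, skills_per_build=7):
    """Expand one (A, "B1,B2", "C1,...,C14") row into (A, B, C) rows.

    Assumption: the skills are listed build by build, in consecutive
    blocks of `skills_per_build` entries.
    """
    builds = [b.strip() for b in builds_field.split(",")]
    skills = [c.strip() for c in skills_field.split(",")]
    rows = []
    for i, build in enumerate(builds):
        # Slice out the block of skills that belongs to this build.
        block = skills[i * skills_per_build:(i + 1) * skills_per_build]
        for skill in block:
            rows.append((name, build, skill))
    return rows


# Placeholder data matching the example row: two builds, 14 skills.
skills = ",".join(f"C{i}" for i in range(1, 15))
rows = explode_row("A", "B1,B2", skills)

# Write one CSV line per (A, B, C) triple.
out = io.StringIO()
csv.writer(out).writerows(rows)
print(out.getvalue())
```

With this data the first 7 rows pair B1 with C1..C7 and the last 7 pair B2 with C8..C14.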
Here is my code:
import scrapy

class Heroes1Item(scrapy.Item):
    hero_name = scrapy.Field()
    hero_builds = scrapy.Field()
    hero_buildskills = scrapy.Field()
and
import scrapy
from heroes1.items import Heroes1Item
from scrapy import Request, Item, Field

class Heroes1JobSpider(scrapy.Spider):
    name = 'heroes1_job'
    allowed_domains = ['icy-veins.com']
    start_urls = ['https://www.icy-veins.com/heroes/assassin-hero-guides']

    def parse(self, response):
        heroes_xpath = '//div[@class="nav_content_block_entry_heroes_hero"]/a/@href'
        for link in response.xpath(heroes_xpath).extract():
            yield Request(response.urljoin(link), self.parse_hero)

    def parse_hero(self, response):
        hero_names = response.xpath('//span[@class="page_breadcrumbs_item"]/text()').extract()
        hero_buildss = response.xpath('//h3[@class="toc_no_parsing"]/text()').extract()
        hero_buildskillss = response.xpath('//span[@class="heroes_build_talent_tier_visual"]').extract()
        for item in zip(hero_names, hero_buildss, hero_buildskillss):
            new_item = Heroes1Item()
            new_item['hero_name'] = item[0]
            #new_item['hero_builds'] = item[1] DATALOSS
            #new_item['hero_buildskills'] = item[2] DATALOSS
            new_item['hero_builds'] = response.xpath('//h3[@class="toc_no_parsing"]/text()').extract()
            new_item['hero_buildskills'] = response.xpath('//span[@class="heroes_build_talent_tier_visual"]').extract()
            yield new_item
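The DATALOSS comments are consistent with how `zip()` behaves: it stops as soon as the shortest of its inputs is exhausted. A small illustration, using hypothetical placeholder values in place of the three `.extract()` results:

```python
# Hypothetical stand-ins for the three .extract() results.
hero_names = ["Hero"]                        # one breadcrumb text
hero_buildss = ["Build 1", "Build 2"]        # two builds
hero_buildskillss = [f"skill {i}" for i in range(1, 15)]  # 14 skills

# zip() pairs elements position by position and stops at the shortest
# list, so only one tuple survives here; the other builds and skills are dropped.
zipped = list(zip(hero_names, hero_buildss, hero_buildskillss))
print(zipped)  # [('Hero', 'Build 1', 'skill 1')]
```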
Thanks for any help and ideas!
Answer 0 (score: 0)
I think the problem is this part: zip(hero_names, hero_buildss, hero_buildskillss). If I understand correctly, you want the Cartesian product of the three lists, which you can get like this:
import itertools

hero_lists = [hero_names, hero_buildss, hero_buildskillss]
for item in itertools.product(*hero_lists):
    new_item = Heroes1Item()
    new_item['hero_name'] = item[0]
    new_item['hero_builds'] = item[1]
    new_item['hero_buildskills'] = item[2]
    yield new_item
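`itertools.product()` avoids the truncation, but it has no notion of which skills belong to which build: every skill is combined with every build. A small illustration with hypothetical placeholder values (one name, two builds, 14 skills):

```python
import itertools

# Hypothetical stand-ins for the three .extract() results.
hero_names = ["Hero"]
hero_buildss = ["Build 1", "Build 2"]
hero_buildskillss = [f"skill {i}" for i in range(1, 15)]

# product() yields every combination; the last list varies fastest.
combos = list(itertools.product(hero_names, hero_buildss, hero_buildskillss))
print(len(combos))  # 28 = 1 * 2 * 14

# Skills are paired with both builds, including skills from the other build.
print(combos[0])    # ('Hero', 'Build 1', 'skill 1')
print(combos[14])   # ('Hero', 'Build 2', 'skill 1')
```

So with 7 skills per build you get twice as many rows as intended, which is the case the dependency-aware version that follows is meant to handle.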
If there is a dependency between hero_buildss and hero_buildskillss, the following approach may work better:
hero_names = response.xpath('//span[@class="page_breadcrumbs_item"]/text()').extract()
hero_builds_xpath = response.xpath('//*[@class="heroes_build"]')
for hero_build_xpath in hero_builds_xpath:
    hero_buildss = hero_build_xpath.xpath('.//h3[@class="toc_no_parsing"]/text()').extract()
    hero_buildskillss = hero_build_xpath.xpath('.//span[@class="heroes_build_talent_tier_visual"]').extract()
    new_item = Heroes1Item()
    new_item['hero_name'] = hero_names
    new_item['hero_builds'] = hero_buildss
    new_item['hero_buildskills'] = hero_buildskillss
    yield new_item
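The key idea in this version is that the inner XPaths start with `.`, so each query is scoped to one `heroes_build` container and a build only sees its own talents. The sketch below applies the same idea but emits one (name, build, talent) row per talent, matching the A,B,C layout from the question. Assumptions: the markup is a hypothetical, simplified stand-in (the real page is not reproduced, and real HTML is not guaranteed to parse as XML), and the standard library's `xml.etree.ElementTree` replaces Scrapy selectors only so the sketch runs on its own.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup reusing the class names from the answer.
PAGE = """
<div>
  <span class="page_breadcrumbs_item">Hero</span>
  <div class="heroes_build">
    <h3 class="toc_no_parsing">Build 1</h3>
    <span class="heroes_build_talent_tier_visual">t1</span>
    <span class="heroes_build_talent_tier_visual">t2</span>
  </div>
  <div class="heroes_build">
    <h3 class="toc_no_parsing">Build 2</h3>
    <span class="heroes_build_talent_tier_visual">t3</span>
  </div>
</div>
"""

root = ET.fromstring(PAGE.strip())
hero_name = root.find(".//span[@class='page_breadcrumbs_item']").text

rows = []
for build in root.findall(".//*[@class='heroes_build']"):
    # Paths starting with "." are evaluated relative to this build element.
    build_name = build.find(".//h3[@class='toc_no_parsing']").text
    for talent in build.findall(".//span[@class='heroes_build_talent_tier_visual']"):
        rows.append((hero_name, build_name, talent.text))

for row in rows:
    print(row)
```

In the spider, the same loop would use `hero_build_xpath.xpath('.//...')` as in the answer and yield one `Heroes1Item` per talent instead of appending a tuple.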
Answer 1 (score: 0)
You can use a function that splits the build skills into chunks (for example chunks() from here) and do something like this:
for item in zip(hero_names, hero_buildss, hero_buildskillss):
    builds = response.xpath('//h3[@class="toc_no_parsing"]/text()').extract()
    skills = response.xpath('//span[@class="heroes_build_talent_tier_visual"]').extract()
    skill_chunks = chunks(skills, 7)  # 7 skills per build
    for build, skill_chunk in zip(builds, skill_chunks):
        for skill in skill_chunk:
            new_item = Heroes1Item()
            new_item['hero_name'] = item[0]
            # use the field names declared in Heroes1Item; undeclared keys raise KeyError
            new_item['hero_builds'] = build
            new_item['hero_buildskills'] = skill
            yield new_item
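The linked chunks() helper is not reproduced in this post. A common generator definition that fits the call chunks(skills, 7) is sketched below (an assumption; the linked version may differ), together with the same build/skill pairing on hypothetical placeholder data:

```python
def chunks(lst, n):
    """Yield successive n-sized slices of lst; the last slice may be shorter."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]


# Hypothetical placeholder data: two builds, 7 skills each, in page order.
builds = ["Build 1", "Build 2"]
skills = [f"skill {i}" for i in range(1, 15)]

# Pair each build with its own block of 7 skills, one row per skill.
rows = [
    (build, skill)
    for build, skill_chunk in zip(builds, chunks(skills, 7))
    for skill in skill_chunk
]
print(rows[6])  # ('Build 1', 'skill 7')
print(rows[7])  # ('Build 2', 'skill 8')
```

This pairing relies on every build listing exactly 7 skills in page order; if one build has a different count, the later blocks shift onto the wrong builds.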