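The function parse scrapes the links on the first page. The function parse_product scrapes the details on the next page and checks whether there is a third page to scrape. The function parse_finalreport scrapes the details on the third page. The problem I am facing is that the output of the third function, parse_finalreport, is all printed together at the end. I want a result like this: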
dfgcontactperson
year
abstract
dfgcontactperson
empty
empty
dfgcontactperson
year
abstract
But the result I actually get looks like this:
dfgcontactperson
dfgcontactperson
empty
empty
dfgcontactperson
year
abstract
year
abstract
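This ordering is consistent with follow-up requests being queued and their callbacks running later, rather than inline at the point where they are yielded. The following is a toy model in plain Python, not Scrapy itself: the `run` function, the `projects` list, and the `*_carry` callbacks are all hypothetical, and a real crawler handles responses concurrently instead of in strict FIFO order. It only illustrates why print statements from different callbacks interleave, and how carrying the first callback's value forward keeps each project's fields together.

```python
from collections import deque


def run(start):
    """Toy engine: run queued (callback, kwargs) pairs in FIFO order."""
    queue = deque(start)
    while queue:
        callback, kwargs = queue.popleft()
        # Each callback returns the follow-up requests it wants scheduled.
        queue.extend(callback(**kwargs))


printed = []   # what the print-based callbacks would emit, in order
records = []   # one complete record per project


def parse_product(project):
    printed.append("dfgcontactperson")
    if not project["has_report"]:
        printed.extend(["empty", "empty"])
        return []
    # The follow-up is only queued here; it runs after earlier queue entries.
    return [(parse_finalreport, {})]


def parse_finalreport():
    printed.extend(["year", "abstract"])
    return []


def parse_product_carry(project):
    contact = "contact-%d" % project["id"]
    if not project["has_report"]:
        records.append((contact, "empty", "empty"))
        return []
    # Carry the already-scraped value forward to the next callback.
    return [(parse_finalreport_carry, {"contact": contact, "project": project})]


def parse_finalreport_carry(contact, project):
    pid = project["id"]
    records.append((contact, "year-%d" % pid, "abstract-%d" % pid))
    return []


projects = [
    {"id": 1, "has_report": True},
    {"id": 2, "has_report": False},
    {"id": 3, "has_report": True},
]
run([(parse_product, {"project": p}) for p in projects])
run([(parse_product_carry, {"project": p}) for p in projects])
print(printed)
print(records)
```

In this model the print-based version reproduces the interleaved listing above, while the carry-forward version emits one complete record per project. The records themselves may still arrive in a different order than the input pages.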
My code:
# Methods of the spider class (the class definition is not shown in this post).
def parse(self, response):
    # First page: collect the project links.
    for row in response.xpath('//div[contains(@class,"eintrag")]'):
        link = row.xpath('.//h2/a/@href').extract()
        link = ['https://gepris.dfg.de' + item + '?language=en' for item in link]
        for p in link:
            yield scrapy.Request(p, callback=self.parse_product)

def parse_product(self, response):
    # Second page: contact person, plus the link to the final report tab if present.
    dfgcontactperson = response.xpath('//div[@class="dfg_contact"]/span[@class="value"]/span/a/text()').extract()
    print(dfgcontactperson)
    finalreport = response.xpath('//ul[@class="tab1"]/li[@id="tabbutton2"]/a/@href').extract()
    finalreport = ['https://gepris.dfg.de' + item + '?language=en' for item in finalreport]
    if not finalreport:
        print('empty')
        print('empty')
    for x in finalreport:
        yield scrapy.Request(x, callback=self.parse_finalreport)

def parse_finalreport(self, response):
    # Third page: final report year and abstract.
    year = response.xpath('//div[@id="projektbeschreibung"]//span[contains(text(),"Final Report Year")]/following-sibling::span//text()').extract()
    abstract = response.xpath('//div[@id="projektbeschreibung"]/h4[contains(text(),"Abstract")]/following-sibling::p/text()').extract()
    print(year)
    print(abstract)