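I should say up front that I have seen similar questions on Stack Overflow and tried the suggested solutions, but every attempt reproduces the same problematic behavior.
I am trying to extract data from https://www.marketwatch.com/investing/stock/aapl/financials for some financial analysis, but the CSV file I dump the output to is always empty.
I tried to track down the problem in the Scrapy shell, and it seems that the membership check inside my `for i in values` loop never evaluates to true. I am not sure why, because the initial response.xpath does print the table values.
The code is below. Thanks in advance for any help!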

    values = ["Sales/Revenue", "Cost of Goods Sold (COGS) incl. D&A", "Depreciation & Amortization Expense", "Gross Income", "SG&A Expense", "Research & Development", "EBIT after Unusual Expense", "Pretax Income", "Income Tax", "Net Income", "EBITDA"]
    for row in response.xpath('//table[@class="crDataTable"]/tbody/tr[not(contains(@class,"thead"))]'):
        test = row.xpath('/td[1]//text()').extract()
        for i in values:
            if i in test:
                item['rowTitle'] = row.xpath('/td[1]//text()').extract()
                item['year1'] = row.xpath('/td[2]//text()').extract()
                item['year2'] = row.xpath('/td[3]//text()').extract()
                item['year3'] = row.xpath('/td[4]//text()').extract()
                item['year4'] = row.xpath('/td[5]//text()').extract()
                item['present'] = row.xpath('/td[6]//text()').extract()
                yield item
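
One likely cause, offered as a sketch rather than a confirmed diagnosis: in XPath, an expression that starts with `/` is absolute, so `row.xpath('/td[1]//text()')` is evaluated from the document root instead of from the current `<tr>`. The root element is not a `td`, so the call returns an empty list and `i in test` can never be true. Writing `./td[1]` (or `td[1]`) keeps the query relative to the row. The extracted text nodes may also carry surrounding whitespace, so the version below normalizes them before comparing. `clean_cell`, `parse_row`, and the shortened `WANTED` set are hypothetical names introduced for illustration, and `row` is assumed to be a Scrapy selector for one `<tr>`.

```python
# Hypothetical subset of the row titles from the question.
WANTED = {"Sales/Revenue", "Gross Income", "Net Income", "EBITDA"}


def clean_cell(texts):
    """Join a cell's text nodes and collapse runs of whitespace."""
    return " ".join(" ".join(texts).split())


def parse_row(row, wanted=WANTED):
    """Return a dict for a wanted row, or None for any other row.

    `row` is assumed to be a Scrapy selector for one <tr>; any object
    with a compatible .xpath(...).extract() works as well.
    """
    # './td[n]' is relative to this <tr>; '/td[n]' would start at the
    # document root and match nothing.
    cells = [clean_cell(row.xpath('./td[%d]//text()' % n).extract())
             for n in range(1, 7)]
    if cells[0] not in wanted:
        return None
    keys = ["rowTitle", "year1", "year2", "year3", "year4", "present"]
    return dict(zip(keys, cells))
```

Inside the spider callback, each `row` from the outer `response.xpath(...)` loop would be passed to `parse_row`, and the result yielded only when it is not `None`. Returning a fresh dict per row also avoids reusing one `item` object across several yields.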