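I'm trying to scrape data from the interactive chart at the bottom of this page: https://www.marsh.com/us/insights/research/global-insurance-market-index-q4-2020.html
I used the developer tools in Chrome, but I can't find the data points in the Elements tab.
If anyone could take a look and tell me whether the data points are stored somewhere on the page, I'd appreciate it.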
Answer 0: (score: 1)
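The site draws the chart from an Excel file, so you don't need to look for the data points in the chart's rendered output. The path to that file is stored in a data-csv-url attribute in the page source. I wrote a scraping script for you.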
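import os

import scrapy
import wget
from xlrd import open_workbook


class MarshSpider(scrapy.Spider):
    name = 'marsh'
    allowed_domains = ['www.marsh.com']
    start_urls = ['https://www.marsh.com/us/insights/research/global-insurance-market-index-q4-2020.html']

    def parse(self, response):
        # The chart container carries the path to the Excel file in its data-csv-url attribute.
        xlsx_url = response.xpath('//div[contains(@class,"htmleditor")]//@data-csv-url').get()
        if xlsx_url is None:
            self.logger.warning("No data-csv-url attribute found on %s", response.url)
            return
        # The attribute holds a site-relative path, so resolve it against the page URL before downloading.
        file = wget.download(response.urljoin(xlsx_url))
        try:
            # Note: xlrd 2.0 and later read only .xls files; for an .xlsx file use xlrd < 2.0 or openpyxl.
            data = open_workbook(file)
            worksheet = data.sheet_by_index(0)
            # Row 0 is assumed to be the header row, so start from row 1.
            for row in range(1, worksheet.nrows):
                yield {
                    "Global Insurance Composite Renewal Rate": worksheet.cell(row, 1).value,
                    "Global Casualty Insurance Renewal Rate": worksheet.cell(row, 2).value,
                }
        finally:
            # Delete the downloaded file even if reading it fails.
            os.remove(file)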
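If you only want to confirm where the data lives before setting up a Scrapy project, the lookup step can be sketched with the standard library alone. This is a minimal sketch, not the page's actual markup: `DataUrlFinder`, `find_data_urls`, and the sample HTML with its `/content/dam/example/chart-data.xlsx` path are hypothetical names and values made up for illustration. It only assumes, as the spider does, that the file path sits in a `data-csv-url` attribute and is relative to the site root.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class DataUrlFinder(HTMLParser):
    """Collect the values of one attribute from every tag that carries it."""

    def __init__(self, attribute="data-csv-url"):
        super().__init__()
        self.attribute = attribute
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        for name, value in attrs:
            if name == self.attribute and value:
                self.urls.append(value)


def find_data_urls(html, base_url, attribute="data-csv-url"):
    """Return absolute URLs for every `attribute` value found in `html`."""
    finder = DataUrlFinder(attribute)
    finder.feed(html)
    return [urljoin(base_url, url) for url in finder.urls]


# Hypothetical markup standing in for the real page source.
SAMPLE_HTML = (
    '<div class="htmleditor">'
    '<div data-csv-url="/content/dam/example/chart-data.xlsx"></div>'
    '</div>'
)
PAGE_URL = "https://www.marsh.com/us/insights/research/global-insurance-market-index-q4-2020.html"

if __name__ == "__main__":
    print(find_data_urls(SAMPLE_HTML, PAGE_URL))
```

Feeding it the real page source (for example, saved from the browser's "View page source") instead of `SAMPLE_HTML` should list any file the chart loads, provided the attribute name matches.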