Python + Scrapy + JSON + XPath: How to scrape JSON data with Scrapy

Time: 2018-10-12 12:04:04

Tags: python json xpath scrapy

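I know how to write an XPath for HTML data points with Scrapy. However, I need to scrape all the URLs listed on this page of the site (the start URL), and they are embedded in JSON format: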

https://highape.com/bangalore/all-events

View source: https://highape.com/bangalore/all-events

I usually write it in this format:

def parse(self, response):
    events = response.xpath('**What To Write Here?**').extract()

    for event in events:
        absolute_url = response.urljoin(event)
        yield Request(absolute_url, callback=self.parse_event)

Please tell me what I should write in the "What To Write Here?" part.

2 answers:

Answer 0: (score: 1)

View the page source of the URL, copy lines 76-9045, save them as data.json on your local drive, and then use this code...

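import json

import requests
from bs4 import BeautifulSoup

# Fetch the listing page and parse its HTML.
req = requests.get('https://highape.com/bangalore/all-events')
soup = BeautifulSoup(req.content, 'html.parser')
# This answer assumes the sixth <script> tag holds the JSON event data;
# the hard-coded index will break if the page layout changes.
js = soup.find_all('script')[5].text
# strict=False allows control characters inside JSON strings.
data = json.loads(js, strict=False)
for i in data:
    url = i['url']
    print(url)
    # Hand each URL to a Scrapy callback here.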
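
The answer mentions saving the JSON to data.json, while its code reads the live page instead. For the saved-file route, a minimal sketch might look like the following. It assumes the file holds a JSON array of objects that each carry a "url" key, as the loop above implies; the helper name load_event_urls is hypothetical.

```python
import json


def load_event_urls(path="data.json"):
    """Read a saved JSON array of events and return their "url" values."""
    with open(path, encoding="utf-8") as fh:
        # strict=False mirrors the answer's json.loads call and tolerates
        # raw control characters inside JSON strings.
        events = json.load(fh, strict=False)
    # Assumption: each event is a dict; entries without "url" are skipped.
    return [event["url"] for event in events if "url" in event]
```

Each returned URL could then be printed or passed on to a Scrapy request, as in the loop above.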

Answer 1: (score: 0)

What To Write Here?

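import json
# text() selects only the script body; the bare element would include the <script> tags and fail to parse.
events = response.xpath("//script[@type='application/ld+json']/text()").extract()
events = json.loads(events[0])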
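
To connect this back to the loop in the question, here is a rough sketch of turning the extracted script text into absolute URLs. It assumes each JSON-LD block is an array of objects (or a single object) with a "url" key, which the thread suggests but does not confirm; the helper name extract_event_urls is hypothetical, and it uses only the standard library so it can be tried without Scrapy.

```python
import json
from urllib.parse import urljoin


def extract_event_urls(json_ld_texts, base_url):
    """Return absolute URLs found in a list of JSON-LD script bodies."""
    urls = []
    for text in json_ld_texts:
        try:
            # strict=False tolerates raw control characters inside strings.
            data = json.loads(text, strict=False)
        except json.JSONDecodeError:
            continue  # skip script blocks that are not valid JSON
        # Assumption: a block is either a list of events or one event object.
        items = data if isinstance(data, list) else [data]
        for item in items:
            if isinstance(item, dict) and item.get("url"):
                # Resolve relative links against the page URL.
                urls.append(urljoin(base_url, item["url"]))
    return urls
```

Inside parse, this could be called with the list returned by the XPath query's .extract() and response.url as the base, yielding Request(url, callback=self.parse_event) for each result.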