Extracting AJAX content and JavaScript content with Scrapy

Date: 2016-09-30 08:18:10

Tags: python scrapy web-crawler

I am trying to scrape this site and want to extract the contact number that sits behind the call button.

How can I implement this in code?

1 answer:

Answer 0 (score: 1)

It appears to be a simple AJAX request that retrieves an HTML string containing the phone number:

import re
import scrapy

class MySpider(scrapy.Spider):
    name = 'sophone'
    start_urls = [
        'http://www.freeindex.co.uk/profile(the-main-event-management-company)_266537.htm'
    ]

    def parse(self, response):
        # the numeric item id is the trailing number in the profile URL
        item_id = re.findall(r"(\d+)\.htm", response.url)[0]
        # the phone-details endpoint takes this id as a query parameter
        url = 'http://www.freeindex.co.uk/customscripts' \
              '/popup_view_tel_details.asp?id={}'.format(item_id)
        yield scrapy.Request(url, self.parse_phone)

    def parse_phone(self, response):
        # open an interactive shell to explore the returned HTML fragment
        from scrapy.shell import inspect_response
        inspect_response(response, self)