Getting nested elements from HTML using Python Scrapy

Date: 2017-05-23 06:41:48

Tags: python scrapy web-crawler

<div class="product ">
    <div class="information">
        <h4 class="name ">Biryani</h4>
        <p class="description ">mutton mix biryani</p>
    </div>

    <div class="details">
        <div class="orderDetail">
            <p class="price ">&#163;12.95</p>
        </div>
    </div>
</div>

For each product, I want to get the text of the name and the price.

The expected output is:

Name: Biryani, Price: 12.95

2 answers:

Answer 0 (score: 1)

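import scrapy


class BlogSpider(scrapy.Spider):
    name = 'blogspider'
    start_urls = ['http://localhost:8000/data.html']

    def parse(self, response):
        # Select every product container (the class attribute has a trailing space).
        products = response.xpath("//div[starts-with(@class, 'product ')]")

        for product in products:
            # text() selects the element's text instead of its full markup.
            _name = product.xpath(".//h4[starts-with(@class, 'name ')]/text()").extract_first()
            _price = product.xpath(".//p[starts-with(@class, 'price')]/text()").extract_first()

            # Drop the leading pound sign (U+00A3) so only the number is printed.
            if _price:
                _price = _price.strip().lstrip(u'\u00a3')

            print('Name: {}, Price: {}'.format(_name, _price))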

Answer 1 (score: 1)

/// <reference path="[YOUR IMPORT FILE]" />
/// ...

/**
 * Main entry point for RequireJS
 */
require(
    [
        // YOUR IMPORT DEFINITIONS
    ],
    (/* YOUR IMPORT VARIABLES */) => {
        'use strict';

        // YOUR CODE HERE
    }
);