Parsing HTML with XPath in Python 2.7

Date: 2017-06-20 02:37:36

Tags: python-2.7 xpath html-parsing

I am trying to parse some HTML (carthtml) in Python 2.7 with Flask 0.12. 'carthtml' contains 3 different items. I use XPath to collect all of these items into 'items' inside 'def getCart()'.

I want to print the name of each item one by one, so I use a for loop:

for idx, item in enumerate(items):
    product_name = get_value_by_xpath(item,
                                    '//div[@class="product-name"]/a/text()')
    print product_name

The expected output is:

Piqué Polo Romper
Neon Little Brother Jumpsuit
OshKosh Mary Jane Sneakers

But my actual output is:

Piqué Polo Romper
Piqué Polo Romper
Piqué Polo Romper

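My guess is that inside the for loop, 'item' does not hold a single item but instead covers all 3 items on every iteration. Any help would be greatly appreciated.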
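One likely explanation, offered as a sketch rather than a confirmed diagnosis: in XPath, an expression that starts with `//` is evaluated from the document root even when it is called on a child element, whereas `.//` is evaluated relative to the context node. The snippet below illustrates the difference with lxml on a trimmed-down, hypothetical stand-in for the cart markup (`SAMPLE` and the shortened product names are assumptions made for the illustration). It is written to run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

from lxml import html

# Hypothetical, trimmed-down stand-in for the cart markup in the question.
SAMPLE = u"""
<div class="primary-content">
  <div class="mini-cart-product clearfix">
    <div class="product-name"><a href="#">Romper</a></div>
  </div>
  <div class="mini-cart-product clearfix">
    <div class="product-name"><a href="#">Jumpsuit</a></div>
  </div>
</div>
"""

root = html.fromstring(SAMPLE)
items = root.xpath('//div[@class="mini-cart-product clearfix"]')

# A leading '//' searches from the document root, so every item
# returns the names of ALL products.
absolute = [item.xpath('//div[@class="product-name"]/a/text()')
            for item in items]

# A leading './/' searches only below the context node, so every item
# returns just its own name.
relative = [item.xpath('.//div[@class="product-name"]/a/text()')
            for item in items]

print(absolute)
print(relative)
```

Here every entry of `absolute` contains both names, while each entry of `relative` contains exactly one. If the same applies to the real page, changing the expression in the loop to start with `.//` should yield one name per item.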

Here is my code:

app.py

# -*- coding: utf-8 -*-
from flask import Flask, request, jsonify
from lxml import html
from lxml import etree
from datetime import datetime
import traceback
import requests
import sys
import logging.handlers


reload(sys)
sys.setdefaultencoding('utf-8')

app = Flask(__name__)

@app.route('/')
def getCart():    
    html_tree = parse_htmlpage(carthtml)
    items = get_elements_by_xpath(html_tree, '//div[@class="primary-content"]//div[@class="mini-cart-product clearfix"]')
    product_list = []
    if items is False or items is None:
        #logger.debug("[T/F:T, e_id:" + str(e_id) + ", API_URL:/add_to_cart, Msg:No data from CARTERS]")
        return jsonify({'result_code': 0})
    if len(items) > 0:
        for idx, item in enumerate(items):
            product_name = get_value_by_xpath(item,
                                           '//div[@class="product-name"]/a/text()')
            print product_name

    return


def parse_htmlpage(html_src):
    detail_html = html.fromstring(html_src)
    page_tree = etree.ElementTree(detail_html)

    return page_tree

def get_elements_by_xpath(page_tree, target_xpath):
    target_value_list = page_tree.xpath(target_xpath)
    return target_value_list

def get_value_by_xpath(page_tree, target_xpath):
    target_value = page_tree.xpath(target_xpath)
    return target_value
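Note that `xpath()` with a `text()` step returns a list of strings, not a single string, so printing the result of `get_value_by_xpath` prints a list. A minimal sketch of a helper that returns only the first non-empty match follows. The name `get_first_text`, its `default` parameter, and the sample markup are hypothetical and not part of the original code. It assumes the caller passes a relative expression starting with `.//`.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

from lxml import html


def get_first_text(node, relative_xpath, default=None):
    """Return the first non-empty, stripped text match under `node`.

    Hypothetical helper: `relative_xpath` is expected to start with './/'
    so that the search stays inside `node`.
    """
    for match in node.xpath(relative_xpath):
        text = match.strip()
        if text:
            return text
    return default


# Hypothetical sample markup, used only to exercise the helper.
doc = html.fromstring(
    u'<div class="primary-content">'
    u'<div class="mini-cart-product clearfix">'
    u'<div class="product-name"><a href="#"> Romper </a></div></div>'
    u'<div class="mini-cart-product clearfix">'
    u'<div class="product-name"><a href="#">Jumpsuit</a></div></div>'
    u'</div>'
)

names = [
    get_first_text(item, './/div[@class="product-name"]/a/text()')
    for item in doc.xpath('//div[@class="mini-cart-product clearfix"]')
]
print(names)
```

Returning a default instead of raising keeps the loop going when an item has no name. Whether that is the desired behavior depends on how missing data should be reported.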

carthtml

<div class="primary-content">
<div class="mini-cart-product clearfix">
    <div class="mini-cart-image">
        <a href="/carters-baby-boy-one-pieces/190795039832.html"><img src="https://www.carters.com/dw/image/v2/AAMK_PRD/on/demandware.static/-/Sites-carters_master_catalog/default/dw182a85c8/hi-res/118H023_Default.jpg?sw=470" alt="Piqué Polo Romper" title="Piqué Polo Romper"></a>
    </div>

    <div class="mini-cart-attributes">
        <div class="product-name">
            <a href="/carters-baby-boy-one-pieces/190795039832.html">Piqué Polo Romper</a>
        </div>
    </div>
</div>


<div class="mini-cart-product clearfix">
    <div class="mini-cart-image">
        <a href="/carters-baby-boy-one-pieces/190795419986.html"><img src="https://www.carters.com/dw/image/v2/AAMK_PRD/on/demandware.static/-/Sites-carters_master_catalog/default/dw540ec9a5/hi-res/127G525_Default.jpg?sw=470" alt="Neon Little Brother Jumpsuit" title="Neon Little Brother Jumpsuit"></a>
    </div>


    <div class="mini-cart-attributes">
        <div class="product-name">
            <a href="/carters-baby-boy-one-pieces/190795419986.html">Neon Little Brother Jumpsuit</a>
        </div>
    </div>
</div>


<div class="mini-cart-product clearfix">

    <div class="mini-cart-image">
        <a href="/oshkosh-baby-girl-shoes-casual-shoes/888737142503.html"><img src="https://www.carters.com/dw/image/v2/AAMK_PRD/on/demandware.static/-/Sites-carters_master_catalog/default/dw32891bda/hi-res/OF150011_Navy.jpg?sw=470" alt="OshKosh Mary Jane Sneakers" title="OshKosh Mary Jane Sneakers"></a>
    </div>

    <div class="mini-cart-attributes">

        <div class="product-name">
            <a href="/on/demandware.store/Sites-Carters-Site/default/RedirectURL-CookieMigration?url=https%3a%2f%2fwww%2ecarters%2ecom%2fs%2fSites-Carters-Site%2fdw%2fshared_session_redirect%3furl%3dhttps%253A%252F%252Fwww%2eoshkosh%2ecom%252Foshkosh-baby-girl-shoes-casual-shoes%252F888737142503%2ehtml%253Fsrd%253Dtrue">OshKosh Mary Jane Sneakers</a>
        </div>
    </div>
</div>
</div>

0 answers:

No answers