Getting the maximum value in a table using XPath

Asked: 2016-02-17 18:27:24

Tags: python html xpath max

I have a large HTML menu file stored as a plain file, and I need to get the highest price for each menu item. Here is a sample of the menu file:

### File Name: "menu" (All types ".") ###
</div>
     <div class="menu-item-prices">
       <table>
        <tr>
            <td class="menu-item-price-amount">
                10
            </td>
            <td class="menu-item-price-amount">
                14
            </td>
        </tr>
</div>

</div>
     <div class="menu-item-prices">
       <table>
        <tr>
            <td class="menu-item-price-amount">
                100
            </td>
            <td class="menu-item-price-amount">
                1
            </td>
        </tr>
</div>

I need my program to return a list of the highest price in each menu item, i.e. maxprices = ['14','100'] for this example. I tried the following code in Python:

#!/user/bin/python

from lxml import html
from os.path import join, dirname, realpath
from lxml.etree import XPath

def main():
    """ Drive function """
    fpath = join(dirname(realpath(__file__)), 'menu')
    hfile = open(fpath)  # open html file
    tree = html.fromstring(hfile.read())

    prices_path = XPath('//*[@class="menu-item-prices"]/table/tr')  
    maxprices = []

    for p in prices_path(tree):
        prices = p.xpath('//td/text()')
        prices = [el.strip() for el in prices]
        maxprice = max(prices)
        maxprices.append(maxprice)
        print maxprices

if __name__ == '__main__':
    main()

I also tried

prices = tree.xpath('//*[@class="menu-item-prices"]'
                    '//tr[not(../tr/td > td)]/text()')
prices = [el.strip() for el in prices]

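instead of the loop strategy above. Neither approach returns the required highest price for each category. How can I modify my code to get these prices correctly? Thanks.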

1 Answer:

Answer 0 (score: 1)

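There is at least one problem: you are comparing strings, but you need to convert the prices to float and then take the maximum for each table row. Note also that the XPath used inside the loop in the example further down starts with `.//`, so it searches only within the current row, whereas `//td` would match every cell in the document.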
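To illustrate why the string comparison matters, here is a minimal sketch using only the standard library. The `prices` list is made-up illustration data, not taken from the menu file; with the two sample rows in the question the string comparison happens to give the right answer, which can hide the bug.

```python
# Hypothetical price strings, as they would look after .strip()
prices = ['9', '10', '8.50']

# Strings compare character by character, so '9' beats '10' ('9' > '1').
max_as_text = max(prices)

# Comparing by numeric value finds the real maximum,
# while still returning the original string.
max_as_number = max(prices, key=float)

print(max_as_text)    # 9
print(max_as_number)  # 10
```

Using `key=float` rather than converting the whole list is one option if the result should stay a string, as in the `maxprices = ['14','100']` output the question asks for.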

Complete example:

# -*- coding: utf-8 -*-
from lxml.html import fromstring

data = """
<div>
     <div class="menu-item-prices">
       <table>
            <tr>
                <td class="menu-item-price-amount">
                    10
                </td>
                <td class="menu-item-price-amount">
                    14
                </td>
            </tr>
        </table>
    </div>

    <div class="menu-item-prices">
       <table>
        <tr>
            <td class="menu-item-price-amount">
                100
            </td>
            <td class="menu-item-price-amount">
                1
            </td>
        </tr>
        </table>
    </div>
</div>
"""

tree = fromstring(data)
for item in tree.xpath("//div[@class='menu-item-prices']/table/tr"):
    prices = [float(price.strip()) for price in item.xpath(".//td[@class='menu-item-price-amount']/text()")]
    print(max(prices))

Prints:

14.0
100.0
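If the result should be a list of the original price strings, as the question requests, the loop above could be adapted roughly as follows. This is a sketch under a few assumptions: `lxml` is installed, the markup is a trimmed copy of the sample data, and `max_prices` is a hypothetical helper name, not part of lxml.

```python
from lxml.html import fromstring

# Trimmed copy of the sample markup used in the answer.
data = """
<div>
  <div class="menu-item-prices">
    <table><tr>
      <td class="menu-item-price-amount"> 10 </td>
      <td class="menu-item-price-amount"> 14 </td>
    </tr></table>
  </div>
  <div class="menu-item-prices">
    <table><tr>
      <td class="menu-item-price-amount"> 100 </td>
      <td class="menu-item-price-amount"> 1 </td>
    </tr></table>
  </div>
</div>
"""


def max_prices(tree):
    """Return the highest price of each table row, as the original string."""
    result = []
    for row in tree.xpath("//div[@class='menu-item-prices']/table/tr"):
        # The leading "." keeps the search inside the current row.
        cells = row.xpath(".//td[@class='menu-item-price-amount']/text()")
        prices = [c.strip() for c in cells if c.strip()]
        if prices:  # skip rows without any price cells
            # Compare numerically, but keep the string form.
            result.append(max(prices, key=float))
    return result


maxprices = max_prices(fromstring(data))
print(maxprices)  # ['14', '100']
```

Rows with no price cells are skipped here rather than raising an error from `max()` on an empty list; whether that is the right behaviour depends on the real menu file.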