Can't extract a value from HTML using lxml

Time: 2016-05-16 00:08:05

Tags: python python-3.x xpath lxml

I have this piece of HTML (the numbers vary):

<span class="ng-binding"> <b>Total:</b> 68.71€ (459 items) </span>

From this, I want to extract 68.71€ (459 items).

So far I have tried the code below, simply copying the XPath that Google Chrome shows for the span above:

import urllib.request
from lxml import html

ids = ["ftpstorage1-730",
       "ftpstorage2-730",
       "ftpstorage3-730"]

for id in ids:
    url = 'http://steam.tools/itemvalue/#/' + id
    with urllib.request.urlopen(url) as response:
        site = response.read()
        tree = html.fromstring(site)
        data = tree.xpath('//*[@id="container"]/div[5]/span[1]/text()')

        print(data)

In theory this should work, but it doesn't. All I get in data is:

[" {{(items | filter:dupesFilter | filter:typeFilter | filter:filterText |   sumByKey:'price':'count':
e}}\n\t\t\t\t({{items | filter:dupesFilter | filter:typeFilter |    filter:filterText | sumByKey:'count
[" {{(items | filter:dupesFilter | filter:typeFilter | filter:filterText | sumByKey:'price':'count':
e}}\n\t\t\t\t({{items | filter:dupesFilter | filter:typeFilter | filter:filterText | sumByKey:'count
[" {{(items | filter:dupesFilter | filter:typeFilter | filter:filterText | sumByKey:'price':'count':
e}}\n\t\t\t\t({{items | filter:dupesFilter | filter:typeFilter | filter:filterText | sumByKey:'count

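Any idea what I'm doing wrong?

Is it because the numbers are generated rather than static?

If so, how can I extract the numbers?

1 answer:

Answer 0: (score: 2)

What you see on the console is the unrendered HTML with AngularJS binding placeholders. You would need a real browser to execute the JavaScript and let Angular put the actual values into the placeholders.

Alternatively, if you look more closely at how the total price is retrieved and calculated, you can solve this without a real browser. Make a GET request to the http://item-value10.appspot.com/ParseInv endpoint, providing the id and app parameters, then parse the JSON response and compute the total price, taking each item's count into account: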
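A quick way to confirm this diagnosis is to check the scraped text for leftover {{ ... }} markers. The sketch below is only illustrative: the helper name has_unrendered_bindings and both sample strings are made up for this example (raw is a shortened stand-in, not the exact template on the page). It also shows that everything after # in the URL is a fragment. Fragments are not sent to the server, so the server receives the same request for every id.

```python
import re
from urllib.parse import urlsplit

# Matches AngularJS-style interpolation markers such as {{ items | filter:x }}.
PLACEHOLDER = re.compile(r"\{\{.*?\}\}", re.DOTALL)


def has_unrendered_bindings(text):
    """Return True if the text still contains {{ ... }} template markers."""
    return PLACEHOLDER.search(text) is not None


# Hypothetical samples: one shaped like the scraped output, one like the rendered page.
raw = " {{(items | sumByKey:'price':'count')}} ({{items | sumByKey:'count'}} items) "
rendered = " 68.71€ (459 items) "

print(has_unrendered_bindings(raw))       # True
print(has_unrendered_bindings(rendered))  # False

# The part after '#' is a client-side fragment and is not part of the HTTP request.
parts = urlsplit("http://steam.tools/itemvalue/#/ftpstorage1-730")
print(parts.path)      # /itemvalue/
print(parts.fragment)  # /ftpstorage1-730
```

If the check returns True, the values are being filled in client-side, and parsing the raw response with lxml will not produce them.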

import requests


template_url = "http://item-value10.appspot.com/ParseInv"
ids = ["ftpstorage1-730", "ftpstorage2-730", "ftpstorage3-730"]

for id in ids:
    with requests.Session() as session:
        session.get('http://steam.tools/itemvalue/#/' + id)

        # "ftpstorage1-730" -> account name "ftpstorage1", app id "730"
        storage, app = id.split("-")

        # The endpoint URL has no placeholders; id and app go in the query string.
        response = session.get(template_url, params={
            "id": storage,
            "app": app
        }, headers={
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.94 Safari/537.36",
            "Referer": "http://steam.tools/itemvalue/"
        })

        data = response.json()
        total = sum(float(item["price"]) * int(item["count"]) for item in data["items"])
        print(total)

Prints:

20.439999999999998
78.16
0
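The first total above shows ordinary binary floating-point noise (20.439999999999998 rather than 20.44). If you want output that looks like the page, one option is to sum with decimal.Decimal and format to two places. This is a minimal sketch that assumes the JSON has the same shape as above (an "items" list whose entries carry "price" and "count"). The summarize helper and the sample payload are made up for illustration.

```python
from decimal import Decimal


def summarize(data):
    """Return (total price, total item count) for a parsed inventory payload."""
    total = Decimal("0")
    count = 0
    for item in data["items"]:
        n = int(item["count"])
        # Go through str() so a JSON float such as 0.1 becomes Decimal("0.1").
        total += Decimal(str(item["price"])) * n
        count += n
    return total, count


# Made-up payload for illustration only; real responses may carry more fields.
sample = {"items": [{"price": 0.1, "count": 3}, {"price": "20.14", "count": 1}]}

total, count = summarize(sample)
summary = f"{total:.2f}€ ({count} items)"
print(summary)  # 20.44€ (4 items)
```

In the loop above, you would call summarize(response.json()) in place of the sum(...) line.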