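Trying to get lxml to print a specific number in Python

Time: 2016-08-14 17:29:48

Tags: python web-scraping lxml

I am trying to get lxml in Python to print the selected content: http://imgur.com/a/joeql

I don't have much code, but here it is: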

from lxml import html
import requests


page = requests.get('https://www.pathofexile.com/forum/view-thread/1703834')
tree = html.fromstring(page.content)

winner = tree.xpath(//*[@id="eventView0"]/div[3]/table/tbody/tr[1]/td[7])

print,winner

1 answer:

Answer 0 (score: 1)

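The syntax error you are seeing is because you did not enclose the XPath string in quotes. Fix it:

winner = tree.xpath('//*[@id="eventView0"]/div[3]/table/tbody/tr[1]/td[7]')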
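
As a quick sanity check of the quoting fix, the sketch below uses Python's built-in compile() to confirm that the parser rejects the unquoted call and accepts the quoted one. The helper name compiles is made up for illustration; nothing is executed or fetched here.

```python
# The call as written in the question: the XPath is not a string literal.
bad = 'winner = tree.xpath(//*[@id="eventView0"]/div[3]/table/tbody/tr[1]/td[7])'

# The same call with the XPath enclosed in single quotes.
good = """winner = tree.xpath('//*[@id="eventView0"]/div[3]/table/tbody/tr[1]/td[7]')"""


def compiles(source):
    """Return True if `source` parses as Python, False on a SyntaxError.

    compile() only parses the text; it never runs the code, so the
    undefined name `tree` does not matter here.
    """
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False


print(compiles(bad))
print(compiles(good))
```

Separately, in Python 3 the line print,winner only builds a tuple and prints nothing; print(winner) is presumably what was intended.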

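The actual problem is that the table content is built dynamically by JavaScript executed in the browser, so the table is not in the HTML that requests downloads. What you can do is parse the script tag that holds the desired data in a JSON object, extract the JSON string, and load it into a Python data structure via json.loads().

The code below prints the account names (just as proof that it works):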

import json
import re

from lxml import html
import requests


page = requests.get('https://www.pathofexile.com/forum/view-thread/1703834')
tree = html.fromstring(page.content)

script = tree.xpath('//script[contains(., "var json")]/text()')[0]
obj_string = re.search(r"var json = (\{.*?\}),$", script, re.MULTILINE).group(1)
obj = json.loads(obj_string)

# print entries
entries = obj['ladder']['entries']
for entry in entries:
    print(entry['account']['name'])
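
The extraction step above can also be tried without network access. The following is a minimal sketch that applies the same regular expression to a made-up script body; the sample text, the account names in it, and the helper name extract_account_names are assumptions for illustration, and the real page's script is much larger and may be laid out differently.

```python
import json
import re

# Hypothetical stand-in for the text of the <script> tag. Only the
# "var json = {...}," line matters to the regular expression.
sample_script = '''
$(function() {
    var json = {"ladder": {"entries": [{"account": {"name": "ExampleAccountA"}}, {"account": {"name": "ExampleAccountB"}}]}},
        other = 1;
});
'''


def extract_account_names(script_text):
    """Pull the JSON object assigned to `var json` and list account names.

    Returns an empty list when no `var json = {...},` line is found,
    instead of failing on a missing match.
    """
    match = re.search(r"var json = (\{.*?\}),$", script_text, re.MULTILINE)
    if match is None:
        return []
    obj = json.loads(match.group(1))
    return [entry["account"]["name"] for entry in obj["ladder"]["entries"]]


print(extract_account_names(sample_script))
```

Checking for a missing match before calling group(1) is a small defensive choice: if the site changes how the data is embedded, the function returns an empty list rather than raising AttributeError.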