我试图让lxml在python中打印所选内容: http://imgur.com/a/joeql
我所拥有的代码并不多,但这里是
from lxml import html
import requests
page = requests.get('https://www.pathofexile.com/forum/view-thread/1703834')
tree = html.fromstring(page.content)
winner = tree.xpath(//*[@id="eventView0"]/div[3]/table/tbody/tr[1]/td[7])
print,winner
答案 0 :(得分:1)
您看到的语法错误是因为您没有将XPath字符串括在引号中,修复它:
for_each
实际问题是表内容是通过在浏览器中执行的JavaScript动态形成的。您可以做的是解析在JSON对象中具有所需数据的winner = tree.xpath('//*[@id="eventView0"]/div[3]/table/tbody/tr[1]/td[7]')
标记,提取JSON字符串并通过script
将其加载到Python数据结构中:
json.loads()
打印帐户名称(就像它正在运行的证据一样):
import json
import re
from lxml import html
import requests
page = requests.get('https://www.pathofexile.com/forum/view-thread/1703834')
tree = html.fromstring(page.content)
script = tree.xpath('//script[contains(., "var json")]/text()')[0]
obj_string = re.search(r"var json = (\{.*?\}),$", script, re.MULTILINE).group(1)
obj = json.loads(obj_string)
# print entries
entries = obj['ladder']['entries']
for entry in entries:
print(entry['account']['name'])