Question

I'm new to Python. I have this HTML page, which contains the following information:

I want to read the HTML page and print the information like this:

['2011/2016', 'aaaa', 'x-t ', 'htu ',  '***' , '55']

Answer 1

What you are trying to do is called web scraping. You can use the lxml library to parse the HTML and extract the results.

lxml installation instructions: http://lxml.de/installation.html

Example that lists the questions on the Stack Overflow home page:

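import requests
from lxml.etree import HTML

host = 'http://stackoverflow.com/'

# Download the page; resp.text holds the HTML as a string.
resp = requests.get(host)

# Parse the HTML string into an element tree.
tree = HTML(resp.text)

# Select every <a> element whose class attribute is exactly "question-hyperlink".
questions = tree.xpath('.//a[@class="question-hyperlink"]')

# Print the link text of each question.
for question in questions:
    print(question.text)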
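
The same approach can be adapted to build a list like the one in the question. The page's actual markup is not shown, so the sketch below assumes, purely for illustration, that the values sit in the `<td>` cells of a table row. `SAMPLE_HTML` and `extract_cells` are hypothetical names, and the XPath expression would need to be adjusted to match the real page.

```python
from lxml.etree import HTML

# Hypothetical markup: the real page is not shown in the question,
# so this table layout is an assumption made for illustration only.
SAMPLE_HTML = """
<html><body>
  <table>
    <tr>
      <td>2011/2016</td><td>aaaa</td><td>x-t </td>
      <td>htu </td><td>***</td><td>55</td>
    </tr>
  </table>
</body></html>
"""


def extract_cells(html_text):
    """Return the text of every <td> cell, in document order."""
    tree = HTML(html_text)
    # itertext() also collects text nested inside child tags of a cell.
    return ["".join(td.itertext()) for td in tree.xpath(".//td")]


values = extract_cells(SAMPLE_HTML)
print(values)
```

For a page on disk, the file's contents could be read with `open(...).read()` and passed to `extract_cells` in place of `SAMPLE_HTML`; for a live page, `requests.get(url).text` would play the same role.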

How do I read an HTML page using Python?
