Question

I found that lxml cannot parse the HTML elements inside an iframe.

import lxml.html
from urllib.request import urlopen
import os
url="http://news.163.com/special/mhmingdan/?bdsj"
file=urlopen(url).read()
root=lxml.html.document_fromstring(file)
tab=root.xpath('//iframe')

How can I get lxml to retrieve the HTML elements inside the iframe?

Answer 1

You should use forward slashes (//) rather than backslashes (\\):

tab = root.xpath('//iframe')

In addition, you can simplify fetching and parsing the page by passing the urlopen result directly to parse():

root = lxml.html.parse(urlopen(url))
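
One detail worth noting about the parse() shortcut: it returns an ElementTree rather than an element, although xpath() works on both. The sketch below illustrates this offline, using io.BytesIO as a stand-in for the urlopen response; the sample markup is made up for illustration and is not the page from the question.

```python
import io

import lxml.html

# Made-up sample markup standing in for the downloaded page.
sample = b'<html><body><iframe src="/inner.html"></iframe></body></html>'

# parse() accepts any file-like object, so BytesIO plays the role of urlopen(url).
tree = lxml.html.parse(io.BytesIO(sample))

# tree is an ElementTree; getroot() gives the <html> element.
root = tree.getroot()

# xpath() can be called on the tree or on the root element.
iframes = tree.xpath('//iframe')
print(iframes[0].get('src'))
```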

Answer 2

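import requests
from lxml import html

page = requests.get(url)  # url as defined in the question
tree = html.fromstring(page.content)
src_url = tree.cssselect("iframe")  # cssselect() needs the cssselect package installed
print(src_url[0].attrib)  # attributes of the first iframe, including src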
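
Neither answer spells out why the iframe looks empty: the framed content is a separate document, so the outer page contains only the iframe tag and its attributes. A minimal sketch of one way to follow the frames is shown below, assuming each iframe carries a src attribute that can be fetched over plain HTTP. The helper names iframe_sources and parse_iframes are hypothetical, introduced here only for illustration.

```python
from urllib.parse import urljoin
from urllib.request import urlopen

import lxml.html


def iframe_sources(page_html, base_url):
    """Return the absolute URL of every iframe that has a src attribute."""
    root = lxml.html.document_fromstring(page_html)
    # '//iframe/@src' selects the attribute values; iframes without src are skipped.
    return [urljoin(base_url, src) for src in root.xpath('//iframe/@src')]


def parse_iframes(url):
    """Fetch a page, then fetch and parse each framed document separately."""
    page_html = urlopen(url).read()
    return [lxml.html.parse(urlopen(src)).getroot()
            for src in iframe_sources(page_html, url)]
```

Calling parse_iframes(url) with the url from the question would return one root element per framed document, each of which can then be queried with xpath() as usual. Content that a page injects into a frame with JavaScript would not be reachable this way.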

How do I parse an iframe with lxml in Python?
