Question

I'm trying to extract some data from a website using Python. I found this document, which matches my problem exactly.

But when I run the code it provides:

from lxml import html
import requests


page = requests.get('http://econpy.pythonanywhere.com/ex/001.html')
tree = html.fromstring(page.content)

#This will create a list of buyers:
buyers = tree.xpath('//div[@title="buyer-name"]/text()')
#This will create a list of prices
prices = tree.xpath('//span[@class="item-price"]/text()')


print 'Buyers: ', buyers
print 'Prices: ', prices

I get the following error:

  File "C:\Python27\lib\site-packages\lxml\html\__init__.py", line 617, in document_fromstring
    "Document is empty")

ParserError: Document is empty

Does anyone know what the problem might be?
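The traceback shows the error being raised inside lxml's `document_fromstring`, which happens when the parser finds nothing to build a document from. In practice that usually means `page.content` was empty. A minimal offline reproduction, assuming lxml is installed (`parse_or_error` is a hypothetical helper name, not part of lxml):

```python
from lxml import etree, html


def parse_or_error(content):
    """Parse HTML bytes; return (tree, None) on success or (None, message) on failure."""
    try:
        return html.fromstring(content), None
    except etree.LxmlError as exc:
        # ParserError ("Document is empty") derives from LxmlError,
        # so this also covers any other lxml parsing failure.
        return None, str(exc)


# An empty body, as a failed or blank response would give, is rejected.
tree, err = parse_or_error(b"")
print("empty input ->", err)

# Non-empty HTML parses normally and can be queried with XPath.
tree, err = parse_or_error(b"<html><body><p>ok</p></body></html>")
print(tree.xpath("//p/text()"))
```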

Answer 1

Your script works fine for me. I get this output:

Buyers:  ['Carson Busses', 'Earl E. Byrd', 'Patty Cakes', 'Derri Anne Connecticut', 'Moe Dess', 'Leda Doggslife', 'Dan Druff', 'Al Fresco', 'Ido Hoe', 'Howie Kisses', 'Len Lease', 'Phil Meup', 'Ira Pent', 'Ben D. Rules', 'Ave Sectomy', 'Gary Shattire', 'Bobbi Soks', 'Sheila Takya', 'Rose Tattoo', 'Moe Tell']
Prices:  ['$29.95', '$8.37', '$15.26', '$19.25', '$19.25', '$13.99', '$31.57', '$8.49', '$14.47', '$15.86', '$11.11', '$15.98', '$16.27', '$7.50', '$50.85', '$14.26', '$5.68', '$15.00', '$114.07', '$10.09']
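You can confirm that the two XPath expressions are correct without touching the network by running them against a small inline snippet. The markup below is an assumed, trimmed-down imitation of the structure the expressions target, not the real page source:

```python
from lxml import html

# Assumed sample markup that mimics the structure the XPath queries expect.
SAMPLE = """
<html><body>
  <div title="buyer-name">Carson Busses</div>
  <span class="item-price">$29.95</span>
  <div title="buyer-name">Earl E. Byrd</div>
  <span class="item-price">$8.37</span>
</body></html>
"""

tree = html.fromstring(SAMPLE)
# text() selects only each element's own text, not the whitespace between tags.
buyers = tree.xpath('//div[@title="buyer-name"]/text()')
prices = tree.xpath('//span[@class="item-price"]/text()')

print("Buyers:", buyers)
print("Prices:", prices)
```

If this works but the live script does not, the parsing code is fine and the problem is in what `requests.get` returned.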

I suggest trying the latest lxml package, and checking whether you can reach the desired webpage at the moment.
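To check whether the page is reachable, you can inspect the response before passing it to lxml. A failed or empty response then produces a clear message instead of `ParserError`. Below is a minimal sketch, assuming Python 3 with `requests` and `lxml` installed. `check_response` and `fetch_tree` are hypothetical helper names, and the 10-second timeout is an arbitrary choice:

```python
import requests
from lxml import html

URL = "http://econpy.pythonanywhere.com/ex/001.html"


def check_response(status_code, content):
    """Return a description of why the response is unusable, or None if it looks fine."""
    if not 200 <= status_code < 300:
        return "HTTP status %d" % status_code
    if not content or not content.strip():
        return "empty response body"
    return None


def fetch_tree(url, timeout=10):
    """Fetch url and return the parsed lxml tree; raise ValueError on a bad response."""
    page = requests.get(url, timeout=timeout)
    problem = check_response(page.status_code, page.content)
    if problem is not None:
        raise ValueError("cannot parse %s: %s" % (url, problem))
    return html.fromstring(page.content)
```

Usage: call `tree = fetch_tree(URL)`, then run the same `tree.xpath(...)` calls as in the question. Keeping `check_response` separate from the network call means the validation logic can be tested on its own.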
