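[Sorry if my question is naive. I'm new to Python and requests.] I have tried many approaches, but I still can't find a way to catch the lxml.etree.ParserError raised by html.fromstring and move on.
My code is as follows: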
from lxml import html
import requests
import time
import csv
import pandas as pd

start = time.time()
data = {}
data['webid'] = []
data['name'] = []
filename = "xxx"+ str(N) + ".csv"
for i in range(N):
    url = "http://xxx."+str(i)+".html"
    print(url)
    try:
        page = requests.get(url,timeout=120)
        try:
            tree = html.fromstring(page.content)
            name = tree.xpath('//h2[starts-with(text(),"Name")]/text()')
            data['webid'].append(i)
            data['name'].append(name)
        except (html.ParseError, ParseError):
            continue
    except requests.exceptions.RequestException as e:
        continue
dataframe = pd.DataFrame(data)
dataframe.to_csv(filename, index=False, sep='|')
print("took", time.time() - start, "seconds.")
The error shown is:
Traceback (most recent call last):
  File "xxx/xxx.py", line 43, in <module>
    tree = html.fromstring(page.content)
  File "\AppData\Local\Programs\Python\Python37-32\lib\site-packages\lxml\html\__init__.py", line 876, in fromstring
    doc = document_fromstring(html, parser=parser, base_url=base_url, **kw)
  File "\AppData\Local\Programs\Python\Python37-32\lib\site-packages\lxml\html\__init__.py", line 765, in document_fromstring
    "Document is empty")
lxml.etree.ParserError: Document is empty
I tried several exception names in the except clause, such as lxml.etree.ParserError, etree.ParserError, tree.ParserError, and html.ParserError, but none of them worked, and they produced a new error:
Traceback (most recent call last):
  File "D:/Google/Research2017/t/schoolpc/v241.py", line 66, in <module>
    except (lxml.etree.ParserError, ParseError):
NameError: name 'lxml' is not defined
Is there a way to catch the ParserError and continue the loop without raising an error? Thank you very much!
Many thanks to @BoboDarph. The solution is as follows:
from lxml.etree import ParseError
from lxml.etree import ParserError
...except (ParserError, ParseError)
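"The NameError is caused by how you import and what you import. Your import statement brings in only one specific module of lxml (html), so you cannot refer to lxml.etree.ParserError or to ParseError, because neither name was imported. Import the specific exceptions in a separate import statement (something like from lxml.etree import ParseError)." – BoboDarph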
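
The fix above can be sketched as a small example that runs without network access. It is only an illustration under a few assumptions: extract_names is a hypothetical helper name that does not appear in the original post, the XPath is the one from the question, and the requests call is replaced by a hard-coded list of response bodies. Both exception classes are imported from lxml.etree by name, so the except clause can refer to them without triggering the NameError.

```python
from lxml import html
from lxml.etree import ParseError, ParserError


def extract_names(content):
    """Return the matching <h2> texts, or None if lxml cannot parse content.

    Hypothetical helper for illustration; the XPath is taken from the question.
    """
    try:
        tree = html.fromstring(content)
    except (ParserError, ParseError):
        # Raised, for example, when the response body is empty
        # ("Document is empty"); tell the caller to skip this page.
        return None
    return tree.xpath('//h2[starts-with(text(),"Name")]/text()')


# Stand-ins for page.content from two responses; the second one is empty.
pages = [b"<html><body><h2>Name: Foo</h2></body></html>", b""]

data = {"webid": [], "name": []}
for i, content in enumerate(pages):
    names = extract_names(content)
    if names is None:
        continue  # skip pages that could not be parsed
    data["webid"].append(i)
    data["name"].append(names)
```

An equivalent alternative would be to write import lxml.etree and then catch lxml.etree.ParserError, since that form of import binds the name lxml itself.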