Question

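[Sorry if my question is naive. I'm new to Python and requests.] I have tried many approaches, but I still can't find a way to pass (that is, catch and skip) lxml.etree.ParserError in html.fromstring.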

My code is as follows:

    from lxml import html
    import requests
    import time
    import csv
    import pandas as pd

    start = time.time()
    data = {}
    data['webid'] = []
    data['name'] = []
    filename = "xxx"+ str(N) + ".csv"

    for i in range(N):
            url = "http://xxx."+str(i)+".html"
            print(url)
            try:
                    page = requests.get(url,timeout=120)
                    try:
                            tree = html.fromstring(page.content)
                            name = tree.xpath('//h2[starts-with(text(),"Name")]/text()')
                            data['webid'].append(i)
                            data['name'].append(name)

                    except (html.ParseError, ParseError):
                            continue
            except requests.exceptions.RequestException as e:
                    continue

            dataframe = pd.DataFrame(data)
            dataframe.to_csv(filename, index=False, sep='|')

    print("took", time.time() - start, "seconds.")

The error is:

    Traceback (most recent call last):
      File "xxx/xxx.py", line 43, in <module>
        tree = html.fromstring(page.content)
      File "\AppData\Local\Programs\Python\Python37-32\lib\site-packages\lxml\html\__init__.py", line 876, in fromstring
        doc = document_fromstring(html, parser=parser, base_url=base_url, **kw)
      File "\AppData\Local\Programs\Python\Python37-32\lib\site-packages\lxml\html\__init__.py", line 765, in document_fromstring
        "Document is empty")
    lxml.etree.ParserError: Document is empty

I tried several exception names (lxml.etree.ParserError, etree.ParserError, tree.ParserError, html.ParserError), but none of them work, and they produce a new error:

    Traceback (most recent call last):
      File "D:/Google/Research2017/t/schoolpc/v241.py", line 66, in <module>
        except (lxml.etree.ParserError, ParseError):
    NameError: name 'lxml' is not defined

Is there a way to catch the ParserError and continue the loop without raising an error? Thank you very much!

Many thanks to @BoboDarph. The solution is as follows:

    from lxml.etree import ParseError
    from lxml.etree import ParserError
    ...
    except (ParserError, ParseError):
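To make the fix concrete, here is a minimal sketch of the loop with those imports in place. The `extract_names` helper and the in-memory `sample_pages` dictionary are hypothetical stand-ins for the `requests.get(url).content` calls in the question, so the example runs without network access; only the XPath expression is taken from the original code.

```python
from lxml import html
from lxml.etree import ParseError, ParserError


def extract_names(pages):
    """Return {page_id: names} for every page that parses, skipping the rest."""
    results = {}
    for page_id, content in pages.items():
        try:
            tree = html.fromstring(content)
        except (ParserError, ParseError):
            # Empty or unparseable document: skip it and keep looping.
            continue
        results[page_id] = tree.xpath('//h2[starts-with(text(),"Name")]/text()')
    return results


# Hypothetical page bodies standing in for requests.get(url).content.
sample_pages = {
    0: b"<html><body><h2>Name: Alice</h2></body></html>",
    1: b"",  # an empty body is what raises "ParserError: Document is empty"
    2: b"<html><body><h2>Other</h2></body></html>",
}

names = extract_names(sample_pages)
print(names)
```

Page 1 is skipped rather than aborting the run, and page 2 parses fine but simply has no matching heading, so it maps to an empty list.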

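"The NameError is caused by how and what you import. Your import statement imports one specific module of lxml (html), so you can't refer to lxml.etree.ParserError or ParseError, because those names were never imported. Import the specific exceptions in a separate import statement (something like from lxml.etree import ParseError)." - BoboDarph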
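The name-binding behaviour described in that comment can be reproduced with the standard library alone. The sketch below uses the built-in `xml` package purely as a stand-in for `lxml` (an assumption made so that it runs without third-party packages); the `message` and `caught` variables are illustrative only.

```python
from xml import etree  # binds the name "etree" only; "xml" itself stays undefined

message = None
try:
    xml.etree.ElementTree.ParseError  # same shape as lxml.etree.ParserError
except NameError as exc:
    message = str(exc)

# Importing the exception by name makes it usable in an except clause.
from xml.etree.ElementTree import ParseError, fromstring

try:
    fromstring("")  # empty input is not well-formed XML
    caught = False
except ParseError:
    caught = True

print(message)
print(caught)
```

The first `try` fails with a NameError about `xml` for the same reason the question's code failed on `lxml`: `from package import module` binds only the module's name, not the package's.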

How to handle and pass lxml.etree.ParserError in html.fromstring (lxml)

0 answers: