How to fix an lxml assertion error

Time: 2015-04-10 21:08:01

Tags: python lxml

I have an Ubuntu machine running Python 2.7.6. When I try to use lxml, which I installed with pip, I get the following error:

Traceback (most recent call last):
  File "./export.py", line 44, in fetch_item
    root.append(elem)
  File "lxml.etree.pyx", line 742, in lxml.etree._Element.append     (src/lxml/lxml.etree.c:44339)
  File "apihelpers.pxi", line 24, in lxml.etree._assertValidNode     (src/lxml/lxml.etree.c:14127)
AssertionError: invalid Element proxy at 140443984439416

What does this mean, and how can I fix it?

1 answer:

Answer 0: (score: 2)

I ran into the same problem in a multiprocessing context. The following snippet illustrates it:

from multiprocessing import Pool

import lxml.html


def process(html):
    tree = lxml.html.fromstring(html)
    body = tree.find('.//body')
    print(body)
    return body


def main():
    pool = Pool()
    result = pool.apply(process, ('<html><body/></html>',))
    print(type(result))
    print(result)


if __name__ == '__main__':
    main()

Running it produces the following output:

<Element body at 0x7f9f690461d8>
<class 'lxml.html.HtmlElement'>
Traceback (most recent call last):
  File "test.py", line 18, in <module>
    main()
  File "test.py", line 14, in main
    print(result)
  File "src/lxml/lxml.etree.pyx", line 1142, in lxml.etree._Element.__repr__ (src/lxml/lxml.etree.c:54748)
  File "src/lxml/lxml.etree.pyx", line 992, in lxml.etree._Element.tag.__get__ (src/lxml/lxml.etree.c:53182)
  File "src/lxml/apihelpers.pxi", line 19, in lxml.etree._assertValidNode (src/lxml/lxml.etree.c:16856)
AssertionError: invalid Element proxy at 139697870845496

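So, given that __repr__ works in the worker process and that the return value does reach the calling process (its type prints correctly), the most obvious explanation is a deserialization problem: the element does not survive being pickled and sent back to the parent. This can be worked around, for example, by returning lxml.html.tostring(body) or any other picklable object.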
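
A minimal sketch of that workaround, under a few assumptions: the worker returns the serialized bytes from lxml.html.tostring, and the calling process parses them again if it needs an element. The restore helper and the sample markup are hypothetical names introduced for illustration only; they are not part of the original snippet.

```python
from multiprocessing import Pool

import lxml.html


def process(html):
    """Runs in the worker: parse, then return plain bytes instead of an element."""
    tree = lxml.html.fromstring(html)
    body = tree.find('.//body')
    # Bytes pickle cleanly, so they cross the process boundary intact.
    return lxml.html.tostring(body)


def restore(serialized):
    """Hypothetical helper: rebuild a <body> element in the calling process."""
    document = lxml.html.document_fromstring(serialized)
    return document.find('.//body')


def main():
    pool = Pool()
    try:
        serialized = pool.apply(process, ('<html><body><p>hi</p></body></html>',))
    finally:
        pool.close()
        pool.join()
    # Re-parse in the parent; this element belongs to the parent's own tree.
    body = restore(serialized)
    print(type(serialized))
    print(body.tag)


if __name__ == '__main__':
    main()
```

The element rebuilt in the parent is a fresh parse, not the worker's original object, so any state kept only on the worker side is lost. If the parent needs just a few values (text, attributes), returning those as strings or dicts may be simpler than re-parsing.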