Question

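I've written a script in Python, using the lxml library, to parse the prices out of some HTML elements (80 and 100 in this example). I use XPath expressions to do the job. The XPaths in the scraper below work without errors when I use .fromstring(). However, when I import HTML from lxml.etree instead, the XPath containing the contains() expression fails. It turns out that the scraper works when I use the compound class name, but it throws an error when I try to match a single class name out of the compound class name.

How can I handle this case without using compound class names, and instead match on a single class name with a contains() pattern or something similar?

Here is my attempt: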

from lxml.etree import HTML

elements =\
"""
    <li class="ProductPrice">
      <span class="Regular Price">80.00</span>
    </li>
    <li class="ProductPrice">
      <span class="Regular Price">100.00</span>
    </li>
"""
root = HTML(elements)
for item in root.findall(".//*[@class='ProductPrice']"):
    # regular = item.find('.//span[@class="Regular Price"]').text
    regular = item.find('.//span[contains(@class,"Regular")]').text
    print(regular)

By the way, the commented-out XPath in the script above works fine. The contains() expression, however, throws the following error:

Traceback (most recent call last):
  File "C:\Users\WCS\AppData\Local\Programs\Python\Python36-32\SO.py", line 15, in <module>
    regular = item.find('.//span[contains(@class,"Regular")]').text
  File "src\lxml\etree.pyx", line 1526, in lxml.etree._Element.find
  File "src\lxml\_elementpath.py", line 311, in lxml._elementpath.find
  File "src\lxml\_elementpath.py", line 300, in lxml._elementpath.iterfind
  File "src\lxml\_elementpath.py", line 283, in lxml._elementpath._build_path_iterator
  File "src\lxml\_elementpath.py", line 229, in lxml._elementpath.prepare_predicate
SyntaxError: invalid predicate

One last thing: I don't want to rely on compound class names, because a few websites generate them dynamically. Thanks.

Answer 1

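.find() only supports a limited subset of XPath (the ElementPath language), which does not include functions such as contains(). That is why the predicate is rejected with "invalid predicate".

Try using .xpath() instead. Note that it returns a list, so you need to index into the result.

Example (untested):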

regular = item.xpath('.//span[contains(@class,"Regular")]')[0].text

For details, see http://lxml.de/xpathxslt.html.
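A fuller sketch of the same idea, reusing the HTML from the question. The names prices and token_prices are only illustrative, and the second query is an optional extra not taken from the answer: it uses the common concat/normalize-space idiom so that "Regular" is matched as a whole class token rather than as a substring (plain contains() would also match a class such as "IrregularPrice").

```python
from lxml.etree import HTML

elements = """
    <li class="ProductPrice">
      <span class="Regular Price">80.00</span>
    </li>
    <li class="ProductPrice">
      <span class="Regular Price">100.00</span>
    </li>
"""

root = HTML(elements)

# .xpath() accepts full XPath 1.0, including functions like contains().
# It returns a list, so guard against an empty result before indexing.
prices = []
for item in root.xpath('.//li[@class="ProductPrice"]'):
    spans = item.xpath('.//span[contains(@class, "Regular")]')
    if spans:
        prices.append(spans[0].text)
print(prices)

# Optional stricter variant (an assumption about what you may want):
# pad the class attribute with spaces and look for " Regular " so that
# only the exact class token matches.
token_prices = root.xpath(
    '//span[contains(concat(" ", normalize-space(@class), " "), " Regular ")]/text()'
)
print(token_prices)
```

Both queries should yield the two prices from the sample markup. If the page might lack a matching span, the empty-list check avoids the IndexError that a bare [0] would raise.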
