XPath syntax validator in Python

Time: 2018-05-03 09:46:00

Tags: python xpath web-scraping web-crawler

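I have developed a crawler that performs many operations. It involves a lot of XPaths, so I store them in a JSON file. When the crawler starts, I want to run a basic syntax check on the XPaths (before any of them is used) and raise an error for the invalid ones.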

For example:

xpath1 = '//*[@id="react-root"]/section'
xpath2 = '//*[[@id="react-root"]/section'
xpath3 = '//*[@id="react-root"]\section'

Of these XPaths, only xpath1 is valid.

Is there a module or a regular expression that can do this kind of validation?

2 answers:

Answer 0: (score: 2)

You can compile the XPath string with lxml.etree.XPath; if the syntax is incorrect, an exception is raised:

>>> import lxml.etree
>>> lxml.etree.XPath('//*[@id="react-root"]/section')
//*[@id="react-root"]/section
>>> lxml.etree.XPath('//*[[@id="react-root"]/section')
Traceback (most recent call last):
  ...
lxml.etree.XPathSyntaxError: Invalid expression
>>> lxml.etree.XPath(r'//*[@id="react-root"]\section')
Traceback (most recent call last):
  ...
lxml.etree.XPathSyntaxError: Invalid expression
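
To apply this to a whole set of stored XPaths before the crawler starts, the compile step can be wrapped in a small helper. This is only a minimal sketch: the function name find_invalid_xpaths and the dictionary layout are assumptions for illustration, not something from the question, and in practice the dictionary would come from json.load() on the crawler's own file.

```python
from lxml import etree


def find_invalid_xpaths(xpaths):
    """Return {name: error message} for every XPath that fails to compile.

    `xpaths` is assumed to be a dict mapping a name to an XPath string.
    """
    errors = {}
    for name, expr in xpaths.items():
        try:
            etree.XPath(expr)  # compiles only; no document is needed
        except etree.XPathSyntaxError as exc:
            errors[name] = str(exc)
    return errors


# Hypothetical data, mirroring the three examples from the question.
xpaths = {
    "xpath1": '//*[@id="react-root"]/section',
    "xpath2": '//*[[@id="react-root"]/section',
    "xpath3": r'//*[@id="react-root"]\section',
}

invalid = find_invalid_xpaths(xpaths)
for name, message in sorted(invalid.items()):
    print(f"{name}: {message}")
```

Collecting all failures first, rather than stopping at the first one, lets the crawler report every bad entry in a single run; raising an error when the returned dict is non-empty would then give the behaviour asked for in the question. Note that this only checks syntax, not whether the expression matches anything on a page.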

Answer 1: (score: 0)

from selenium import webdriver
# Needs a running browser; a malformed XPath should raise InvalidSelectorException.
webdriver.Chrome().find_elements('xpath', '//*[text(),"invalid xpath"]')