Webdriver in Python

Time: 2016-07-05 20:39:54

Tags: python regex selenium

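I am trying to extract, from this link, all tags whose class names match a regular expression pattern covering frag-0-0, frag-1-0, and so on.

I am trying to retrieve them with the following code: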

    driver = webdriver.Chrome(chromedriver)
    for frg in frgs:
        driver.get(URL + frg[1:])
        frags=driver.find_elements_by_id(re.compile('frag-[0-9]-0'))
        for frag in frags:
            for tag in frag.find_elements_by_css_selector('[class^=fragmark]'):
                lst.append([tag.get_attribute('class'), tag.text])
    driver.quit()
    return lst

But I get an error. What is the correct way to do this?

The error is as follows:

Traceback (most recent call last):
  File "vroni.py", line 119, in <module>
    op('Aaf')
  File "vroni.py", line 104, in op
    plags=getplags(cd)
  File "vroni.py", line 95, in getplags
    frags=driver.find_elements_by_id(re.compile('frag-[0-9]-0'))
  File "/home/eadaradhiraj/Documents/webscrape/venv/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 281, in find_elements_by_id
    return self.find_elements(by=By.ID, value=id_)
  File "/home/eadaradhiraj/Documents/webscrape/venv/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 778, in find_elements
    'value': value})['value']
  File "/home/eadaradhiraj/Documents/webscrape/venv/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 234, in execute
    response = self.command_executor.execute(driver_command, params)
  File "/home/eadaradhiraj/Documents/webscrape/venv/local/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 398, in execute
    data = utils.dump_json(params)
  File "/home/eadaradhiraj/Documents/webscrape/venv/local/lib/python2.7/site-packages/selenium/webdriver/remote/utils.py", line 34, in dump_json
    return json.dumps(json_struct)
  File "/usr/lib/python2.7/json/__init__.py", line 243, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python2.7/json/encoder.py", line 207, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python2.7/json/encoder.py", line 270, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python2.7/json/encoder.py", line 184, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: <_sre.SRE_Pattern object at 0xb668b1b0> is not JSON serializable

2 answers:

Answer 0: (score: 1)

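The function you are calling takes a string as its argument, not a regular expression object. I am not sure it can work with a regular expression at all, even one passed as a string.

You may want to try XPath instead.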
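
A minimal sketch of the XPath route follows. It assumes the frag-0-0, frag-1-0 values are `id` attributes, as the `find_elements_by_id` call in the question suggests (swap `@id` for `@class` if they are class names). `FRAG_XPATH` and `find_frags` are hypothetical names used only for illustration. Browsers implement XPath 1.0, which has no regex functions, so the pattern is approximated with `starts-with()` and a suffix check.

```python
# XPath 1.0 has no regex support, so "frag-<something>-0" is approximated:
# the id must start with "frag-" and its last two characters must be "-0".
FRAG_XPATH = (
    "//*[starts-with(@id, 'frag-') and "
    "substring(@id, string-length(@id) - 1) = '-0']"
)


def find_frags(driver):
    """Return elements whose id starts with 'frag-' and ends with '-0'.

    `driver` is assumed to be a Selenium WebDriver that has already loaded
    the page. "xpath" is the string value of Selenium's By.XPATH constant.
    """
    return driver.find_elements("xpath", FRAG_XPATH)
```

This is looser than `frag-[0-9]-0`: it would also accept an id such as frag-12-0, so filter the results further if that matters for the page in question.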

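Answer 1: (score: 1)

Selenium's find_elements_by_id method expects a plain string, but re.compile returns a regular expression object, which you match against text with its match() or search() methods, as shown below: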

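    import re
    reobject = re.compile(pattern)   # pattern: a regex string such as 'frag-[0-9]-0'
    result = reobject.match(string)  # a match object, or None if string does not match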

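In general, I would advise against using regular expressions to locate elements. There is usually another way to find the element: by class name, by CSS selector, or even by XPath.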
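
If the exact regex is still needed, one possible approach is to let a CSS selector fetch loose candidates and apply the compiled pattern in Python afterwards. The sketch below assumes the values are `id` attributes; `FRAG_ID`, `FRAG_CSS`, and `find_frags_by_regex` are hypothetical names, not part of Selenium.

```python
import re

# Anchored version of the question's pattern: exactly one digit, then "-0".
FRAG_ID = re.compile(r'^frag-[0-9]-0$')

# CSS attribute selectors: id begins with "frag-" and ends with "-0".
FRAG_CSS = '[id^="frag-"][id$="-0"]'


def find_frags_by_regex(driver, pattern=FRAG_ID):
    """Fetch candidates via CSS, then keep those whose id matches `pattern`.

    `driver` is assumed to be a Selenium WebDriver with the page loaded.
    "css selector" is the string value of Selenium's By.CSS_SELECTOR constant.
    """
    candidates = driver.find_elements("css selector", FRAG_CSS)
    # get_attribute may return None, so fall back to an empty string.
    return [el for el in candidates
            if pattern.match(el.get_attribute('id') or '')]
```

The regex never reaches the driver this way, so nothing non-serializable is sent over the wire, which is what triggered the TypeError in the question.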