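I am trying to extract, from this link, all tags whose class names fit a regular-expression pattern such as frag-0-0, frag-1-0 and so on. I am trying to retrieve them with the following code: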
driver = webdriver.Chrome(chromedriver)
for frg in frgs:
    driver.get(URL + frg[1:])
    frags=driver.find_elements_by_id(re.compile('frag-[0-9]-0'))
    for frag in frags:
        for tag in frag.find_elements_by_css_selector('[class^=fragmark]'):
            lst.append([tag.get_attribute('class'), tag.text])
driver.quit()
return lst
However, I get an error. What is the correct way to do this?
The error is as follows:
Traceback (most recent call last):
  File "vroni.py", line 119, in <module>
    op('Aaf')
  File "vroni.py", line 104, in op
    plags=getplags(cd)
  File "vroni.py", line 95, in getplags
    frags=driver.find_elements_by_id(re.compile('frag-[0-9]-0'))
  File "/home/eadaradhiraj/Documents/webscrape/venv/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 281, in find_elements_by_id
    return self.find_elements(by=By.ID, value=id_)
  File "/home/eadaradhiraj/Documents/webscrape/venv/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 778, in find_elements
    'value': value})['value']
  File "/home/eadaradhiraj/Documents/webscrape/venv/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 234, in execute
    response = self.command_executor.execute(driver_command, params)
  File "/home/eadaradhiraj/Documents/webscrape/venv/local/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 398, in execute
    data = utils.dump_json(params)
  File "/home/eadaradhiraj/Documents/webscrape/venv/local/lib/python2.7/site-packages/selenium/webdriver/remote/utils.py", line 34, in dump_json
    return json.dumps(json_struct)
  File "/usr/lib/python2.7/json/__init__.py", line 243, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python2.7/json/encoder.py", line 207, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python2.7/json/encoder.py", line 270, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python2.7/json/encoder.py", line 184, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: <_sre.SRE_Pattern object at 0xb668b1b0> is not JSON serializable
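Answer 0 (score: 1)
The function find_elements_by_id takes a string as its argument, not a regular expression object. I am not sure the function you are using can handle a regular expression at all, even if you pass the pattern as a string.
You may want to try XPath.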
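As a rough sketch of that suggestion, the ids can be approximated with a plain string selector instead of a regex. The names FRAG_CSS, FRAG_XPATH and find_frags are made up for illustration, and the code assumes the elements carry ids of the form frag-N-0, as the question's call to find_elements_by_id implies. Note that both selectors are looser than frag-[0-9]-0: they accept anything between "frag-" and "-0".

```python
# Selector strings only; neither one is a regular expression.
# CSS: id starts with "frag-" AND ends with "-0".
FRAG_CSS = '[id^="frag-"][id$="-0"]'

# XPath 1.0 has no ends-with(), so compare the last two characters instead.
FRAG_XPATH = ('//*[starts-with(@id, "frag-") and '
              'substring(@id, string-length(@id) - 1) = "-0"]')


def find_frags(driver, use_xpath=False):
    """Return elements whose id looks like frag-<something>-0.

    `driver` is assumed to be a Selenium WebDriver. "css selector" and
    "xpath" are the string values behind By.CSS_SELECTOR and By.XPATH.
    """
    if use_xpath:
        return driver.find_elements("xpath", FRAG_XPATH)
    return driver.find_elements("css selector", FRAG_CSS)
```

In the question's loop, frags = find_frags(driver) would take the place of the find_elements_by_id call.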
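Answer 1 (score: 1)
Selenium's find_elements_by_id method expects a plain string, but the output of re.compile is a regular expression object. That object is meant to be matched against strings through its match() and search() methods, as shown below: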
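import re
reobject = re.compile('frag-[0-9]-0')   # the pattern from the question
result = reobject.match('frag-1-0')     # a match object, or None if there is no match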
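In general, I would advise against using regular expressions to locate elements. There is usually another way to find the element: by class name, by CSS selector, or even by XPath.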
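If the exact pattern really matters, one possible workaround is to fetch a broader set of candidates with a string selector and apply the regex on the Python side. This is only a sketch: frags_matching and FRAG_ID are hypothetical names, and it again assumes the pattern applies to the id attribute.

```python
import re

# Anchored so that ids such as "frag-12-0" or "frag-1-01" are rejected.
FRAG_ID = re.compile(r'^frag-[0-9]-0$')


def frags_matching(driver, pattern=FRAG_ID):
    """Return elements whose id attribute matches `pattern`.

    `driver` is assumed to be a Selenium WebDriver; "css selector" is the
    string value behind By.CSS_SELECTOR.
    """
    # Step 1: a plain string selector narrows the page to likely candidates.
    candidates = driver.find_elements("css selector", '[id^="frag-"]')
    # Step 2: the regex runs in Python, so it is never sent to the browser.
    return [el for el in candidates
            if pattern.match(el.get_attribute("id") or "")]
```

The regex object never reaches Selenium's JSON serializer this way, which is what raised the TypeError in the traceback.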