Question

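I am trying to extract the href value based on unique id(s), where the digits following the p vary but are always numeric, and the value ends with a closing double quote (").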

For example: id="p4423234", id="p5547", id="p4124234", id="234"

<a href="/string/string-string.html" class="profile-enable" rel="nofollow" 
id="p1234">

I can grep for the p values with:

cat p_id.html | grep "id=\"p[0-9]\+\""

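But I can't figure out how to use find_element_by_id in Python Selenium to return the href value.

Thanks in advance for your help. I'm new to web scraping, but I'm enjoying the challenge.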

Answer 1

To return all elements whose id matches "p[0-9]+":

driver.find_elements_by_xpath("//*[starts-with(@id,'p') and substring(@id,2)>=0]")
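The XPath predicate above relies on XPath 1.0 number coercion, so it is somewhat loose: an id such as p1.5 would also pass. A minimal sketch of a stricter check done on the Python side, assuming the elements expose Selenium's get_attribute method. The helper name hrefs_for_numeric_p_ids and the FakeElement stub are hypothetical, made up here so the sketch runs without a browser:

```python
import re

# Strict pattern: the letter p followed by one or more digits, nothing else.
P_ID = re.compile(r"p[0-9]+")


def hrefs_for_numeric_p_ids(elements):
    """Return the href of every element whose id fully matches p<digits>.

    `elements` is any iterable of objects with a get_attribute(name) method,
    such as the WebElement list returned by a Selenium find_elements call.
    """
    hrefs = []
    for element in elements:
        element_id = element.get_attribute("id") or ""
        if P_ID.fullmatch(element_id):
            hrefs.append(element.get_attribute("href"))
    return hrefs


class FakeElement:
    """Hypothetical stand-in for a WebElement (assumption, not Selenium API)."""

    def __init__(self, **attrs):
        self._attrs = attrs

    def get_attribute(self, name):
        return self._attrs.get(name)


sample = [
    FakeElement(id="p1234", href="/string/string-string.html"),
    FakeElement(id="p1.5", href="/string/not-wanted.html"),
    FakeElement(id="234", href="/string/no-prefix.html"),
]
print(hrefs_for_numeric_p_ids(sample))
```

With a real driver, the list returned by the XPath call in this answer could be passed in directly. In recent Selenium 4 releases the find_elements_by_* helpers are no longer available; the equivalent call is driver.find_elements(By.XPATH, ...) with By imported from selenium.webdriver.common.by. Selenium may also return the href as a resolved absolute URL rather than the relative path in the markup.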

Answer 2

Expanding on Avinash Raj's answer:

import re
from bs4 import BeautifulSoup
# from selenium import webdriver
# driver = webdriver.Firefox()
# driver.get("http://example.com")

html = '''<a href="/string/string-string.html" class="profile-enable" rel="nofollow"  id="p154234"> 
         <a href="/string/string-foo.html" class="profile-enable" rel="nofollow"  id="p1235">
         <a href="/string/stricccng-bar.html" class="profile-enable" rel="nofollow"  id="12555">
'''

# or
# html = driver.page_source

# Name the parser explicitly so bs4 does not have to guess one.
soup = BeautifulSoup(html, 'html.parser')

# Covers every case: id="p4423234" id="p5547" id="p4124234" id="234"
# (the p prefix is optional in this pattern).
a = soup.find_all('a', attrs={'id': re.compile(r'^p?\d+$')})
for i in a:
    print(i['href'])

Answer 3

Use XPath to match the element dynamically, then read the attribute value from that element, and boom!

Answer 4

You can use a regular expression in BeautifulSoup to select specific tags.

>>> import re
>>> from bs4 import BeautifulSoup
>>> html = '''<a href="/string/string-string.html" class="profile-enable" rel="nofollow" 
id="p1234"> <a href="/string/string-foo.html" class="profile-enable" rel="nofollow" 
id="p1235"> '''
>>> soup = BeautifulSoup(html, 'html.parser')
>>> [i['href'] for i in soup.find_all('a', attrs={'id': re.compile(r'^p\d+$')})]
['/string/string-string.html', '/string/string-foo.html']

or

>>> [i['href'] for i in soup.find_all(attrs={'id': re.compile(r'^p\d+$')}, href=True)]
['/string/string-string.html', '/string/string-foo.html']
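If installing BeautifulSoup is not an option, the same id filter can be sketched with the standard library's html.parser instead. This is a swapped-in technique, not what the answer above uses. The class name HrefCollector and the third sample tag are assumptions added for illustration:

```python
import re
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect href values from <a> tags whose id is p followed by digits."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        element_id = attributes.get("id") or ""
        # fullmatch anchors both ends, so "12555" and "p12x" are rejected.
        if re.fullmatch(r"p\d+", element_id) and attributes.get("href"):
            self.hrefs.append(attributes["href"])


html = '''<a href="/string/string-string.html" class="profile-enable" rel="nofollow"
id="p1234"> <a href="/string/string-foo.html" class="profile-enable" rel="nofollow"
id="p1235"> <a href="/string/string-bar.html" id="12555">'''

collector = HrefCollector()
collector.feed(html)
collector.close()
print(collector.hrefs)
```

With Selenium, driver.page_source could be fed to the collector in place of the literal string.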

Python Selenium: extract href based on id value

4 answers: