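I am trying to extract href values based on unique id(s), where the number after the p varies but is always numeric and is followed by the closing quote,
for example id="p4423234" id="p5547" id="p4124234" id="234"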
<a href="/string/string-string.html" class="profile-enable" rel="nofollow"
id="p1234">
I can grep the p values with:
cat p_id.html | grep "id=\"p[0-9]\+\""
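But I cannot figure out how to use find_element_by_id in Python Selenium to return the href values.
Thanks in advance for your help. I am new to web scraping but enjoying the challenge.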
Answer 0: (score: 1)
To return all elements whose id matches "p[0-9]+":
driver.find_elements_by_xpath("//*[starts-with(@id,'p') and substring(@id,2)>=0]")
Answer 1: (score: 1)
Expanding on Avinash Raj's answer:
`
import re
from bs4 import BeautifulSoup
# from selenium import webdriver
# driver = webdriver.Firefox()
# driver.get("http://example.com")
html = '''<a href="/string/string-string.html" class="profile-enable" rel="nofollow" id="p154234">
<a href="/string/string-foo.html" class="profile-enable" rel="nofollow" id="p1235">
<a href="/string/stricccng-bar.html" class="profile-enable" rel="nofollow" id="12555">
'''
# or
# html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly
# the optional "p" covers all cases: id="p4423234" id="p5547" id="p4124234" id="234"
a = soup.find_all('a', attrs={'id': re.compile(r'^p?\d+$')})
for i in a:
    print(i['href'])
`
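The `p?` in the pattern above makes the leading p optional, which is why a purely numeric id such as 234 is also accepted; a stricter `^p\d+$` would require the p. A minimal sketch of the difference, using only the sample ids from the question (the variable names are illustrative, not from the original answer):

```python
import re

# Sample ids taken from the question.
ids = ["p4423234", "p5547", "p4124234", "234"]

optional_p = re.compile(r'^p?\d+$')  # leading "p" is optional
required_p = re.compile(r'^p\d+$')   # leading "p" is required

matches_optional = [i for i in ids if optional_p.match(i)]
matches_required = [i for i in ids if required_p.match(i)]

print(matches_optional)  # all four ids
print(matches_required)  # only the ids that start with "p"
```

Which pattern to use depends on whether ids without the p prefix should count as matches.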
Answer 2: (score: 0)
Use XPath to locate the element dynamically, then read the attribute value from that element.
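As a rough sketch of what this could look like with Selenium: locate the elements with an XPath expression, then read each href with get_attribute. The helper name hrefs_by_p_id is hypothetical. The code assumes a Selenium 4 style driver whose find_elements takes a locator strategy and a value, where the string "xpath" is the value that By.XPATH stands for; on older releases the equivalent call is find_elements_by_xpath. The XPath is the one given in an earlier answer.

```python
# id starts with "p" and the remainder converts to a number. In XPath 1.0,
# non-numeric text becomes NaN, and NaN >= 0 is false.
P_ID_XPATH = "//*[starts-with(@id,'p') and substring(@id,2)>=0]"


def hrefs_by_p_id(driver):
    """Return the href of every element whose id looks like p<digits>.

    `driver` is assumed to be a Selenium WebDriver (hypothetical helper).
    """
    # "xpath" is the locator strategy string behind By.XPATH.
    elements = driver.find_elements("xpath", P_ID_XPATH)
    # get_attribute returns None when the attribute is missing, so skip those.
    hrefs = (element.get_attribute("href") for element in elements)
    return [href for href in hrefs if href]


# Usage, assuming Selenium and a browser driver are installed:
# from selenium import webdriver
# driver = webdriver.Firefox()
# driver.get("http://example.com")
# print(hrefs_by_p_id(driver))
```

Note that Selenium typically reports href resolved to an absolute URL rather than the relative path written in the markup.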
Answer 3: (score: 0)
You can use a regular expression in BeautifulSoup to select specific tags.
>>> import re
>>> from bs4 import BeautifulSoup
>>> html = '''<a href="/string/string-string.html" class="profile-enable" rel="nofollow"
id="p1234"> <a href="/string/string-foo.html" class="profile-enable" rel="nofollow"
id="p1235"> '''
>>> soup = BeautifulSoup(html, 'html.parser')
>>> [i['href'] for i in soup.find_all('a', attrs={'id': re.compile(r'^p\d+$')})]
['/string/string-string.html', '/string/string-foo.html']
or
>>> [i['href'] for i in soup.find_all(attrs={'id': re.compile(r'^p\d+$')}, href=True)]
['/string/string-string.html', '/string/string-foo.html']
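If BeautifulSoup is not available, the same idea can be sketched with the standard library's html.parser. This is only an illustration under that assumption: the class name HrefCollector and its hrefs attribute are made up for the example, and the third tag in the sample (a numeric id without the p) was added to show a non-match.

```python
import re
from html.parser import HTMLParser

P_ID = re.compile(r'^p\d+$')  # "p" followed by one or more digits


class HrefCollector(HTMLParser):
    """Collect href values of <a> tags whose id matches P_ID (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        if attr.get("href") and P_ID.match(attr.get("id") or ""):
            self.hrefs.append(attr["href"])


html = '''<a href="/string/string-string.html" class="profile-enable" rel="nofollow"
id="p1234"> <a href="/string/string-foo.html" class="profile-enable" rel="nofollow"
id="p1235"> <a href="/string/string-bar.html" class="profile-enable" rel="nofollow"
id="12555"> '''

collector = HrefCollector()
collector.feed(html)
collector.close()
print(collector.hrefs)
```

The parser only reacts to start tags, so the unclosed a tags in the sample do not matter here.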