Python Selenium: extract href based on id value

Time: 2016-03-21 12:44:29

Tags: python selenium

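I am trying to extract href values based on unique id(s), where the number after the p differs from element to element but is always numeric and ends with the closing double quote (").

For example: id="p4423234", id="p5547", id="p4124234", id="234"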

<a href="/string/string-string.html" class="profile-enable" rel="nofollow"
id="p1234">

I can use grep to pull out the p values:

cat p_id.html | grep "id=\"p[0-9]\+\""

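But I cannot figure out how to use find_element_by_id in Python Selenium to return the href value.

Thanks in advance for your help. I am new to web scraping but enjoying the challenge.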

4 Answers:

Answer 0 (score: 1)

To return all elements whose ID matches "p[0-9]+":

driver.find_elements_by_xpath("//*[starts-with(@id,'p') and substring(@id,2)>=0]")
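
The XPath above returns elements, not href values. The following is a minimal sketch of how the hrefs might be read from them; the function name hrefs_for_p_ids and the extra regex check are illustrative assumptions, not part of the original answer. It calls find_elements with the "xpath" locator string (the value behind selenium's By.XPATH), since the find_elements_by_* helpers were removed in recent Selenium 4 releases.

```python
import re

# XPath from the answer above: id starts with "p" and the rest converts to a number.
P_ID_XPATH = "//*[starts-with(@id,'p') and substring(@id,2)>=0]"
# Stricter check applied in Python: "p" followed by digits only (assumption).
P_ID_RE = re.compile(r"p\d+")


def hrefs_for_p_ids(driver):
    """Return the href of every element whose id is 'p' followed by digits.

    `driver` is assumed to be a Selenium WebDriver, or any object that
    exposes find_elements / get_attribute in the same way.
    """
    hrefs = []
    # "xpath" is the locator strategy string that By.XPATH stands for.
    for element in driver.find_elements("xpath", P_ID_XPATH):
        element_id = element.get_attribute("id") or ""
        href = element.get_attribute("href")
        # Skip ids such as "p1.5" that the XPath number test lets through,
        # and elements that have no href at all.
        if P_ID_RE.fullmatch(element_id) and href:
            hrefs.append(href)
    return hrefs


# Hypothetical usage, requires a browser driver:
# from selenium import webdriver
# driver = webdriver.Firefox()
# driver.get("http://example.com")
# print(hrefs_for_p_ids(driver))
```

With a real browser, get_attribute("href") typically returns the resolved absolute URL rather than the relative path written in the markup.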

Answer 1 (score: 1)

Expanding on Avinash Raj's answer:

import re
from bs4 import BeautifulSoup
# from selenium import webdriver
# driver = webdriver.Firefox()
# driver.get("http://example.com")

html = '''<a href="/string/string-string.html" class="profile-enable" rel="nofollow"  id="p154234"> 
         <a href="/string/string-foo.html" class="profile-enable" rel="nofollow"  id="p1235">
         <a href="/string/stricccng-bar.html" class="profile-enable" rel="nofollow"  id="12555">
'''

# or, to parse the page loaded in the browser:
# html = driver.page_source

# name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(html, 'html.parser')

# the optional "p" covers all cases: id="p4423234" id="p5547" id="p4124234" id="234"
a = soup.find_all('a', attrs={'id': re.compile(r'^p?\d+$')})
for i in a:
    print(i['href'])


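Answer 2 (score: 0)

Use XPath to locate the element dynamically, then read the attribute value from that element (for example with get_attribute('href')), and boom!

Answer 3 (score: 0)

You can use a regular expression in BeautifulSoup to select specific tags.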

>>> import re
>>> from bs4 import BeautifulSoup
>>> html = '''<a href="/string/string-string.html" class="profile-enable" rel="nofollow" 
... id="p1234"> <a href="/string/string-foo.html" class="profile-enable" rel="nofollow" 
... id="p1235"> '''
>>> soup = BeautifulSoup(html, 'html.parser')
>>> [i['href'] for i in soup.find_all('a', attrs={'id': re.compile(r'^p\d+$')})]
['/string/string-string.html', '/string/string-foo.html']

>>> [i['href'] for i in soup.find_all(attrs={'id': re.compile(r'^p\d+$')}, href=True)]
['/string/string-string.html', '/string/string-foo.html']
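
If BeautifulSoup is not available, the same regex filter can be sketched with the standard library's html.parser instead. This swaps in a different parser from the one used in the answer; the names HrefCollector and extract_hrefs are illustrative assumptions.

```python
import re
from html.parser import HTMLParser

# Same pattern as in the answer: "p" followed by one or more digits.
ID_PATTERN = re.compile(r"p\d+")


class HrefCollector(HTMLParser):
    """Collect href values of <a> tags whose id matches ID_PATTERN."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        element_id = attributes.get("id") or ""
        href = attributes.get("href")
        # Keep only anchors that have an href and a fully matching id.
        if href and ID_PATTERN.fullmatch(element_id):
            self.hrefs.append(href)


def extract_hrefs(html):
    """Return the matching href values in document order."""
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


sample = '''<a href="/string/string-string.html" class="profile-enable" rel="nofollow"
id="p1234"> <a href="/string/string-foo.html" class="profile-enable" rel="nofollow"
id="p1235"> <a href="/other.html" id="profile">'''

print(extract_hrefs(sample))
# prints ['/string/string-string.html', '/string/string-foo.html']
```

To parse the page currently loaded in Selenium, driver.page_source could be passed to extract_hrefs in place of sample.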