I am using Selenium with Python to scrape the content at this link: http://targetstudy.com/school/62292/universal-academy/
The HTML looks like this:
<tr>
<td>
<i class="fa fa-mobile">
::before
</i>
</td>
<td>8349992220, 8349992221</td>
</tr>
I am not sure how to reach the phone numbers using class="fa fa-mobile". Could someone please help? Thanks.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
from selenium.webdriver.common.action_chains import ActionChains
import lxml.html
from selenium.common.exceptions import NoSuchElementException
path_to_chromedriver = 'chromedriver.exe'
browser = webdriver.Chrome(executable_path = path_to_chromedriver)
browser.get('http://targetstudy.com/school/62292/universal-academy/')
stuff = browser.page_source.encode('ascii', 'ignore')
tree = lxml.html.fromstring(stuff)
address1 = tree.xpath('//td/i[@class="fa fa-mobile"]/parent/following-sibling/following-sibling::text()')
print address1
Answer 0 (score: 2)
You don't need lxml.html. Selenium is very powerful at Locating Elements.
Pass the //i[@class="fa fa-mobile"]/../following-sibling::td xpath expression to find_element_by_xpath():
>>> from selenium import webdriver
>>> browser = webdriver.Firefox()
>>> browser.get('http://targetstudy.com/school/62292/universal-academy/')
>>> browser.find_element_by_xpath('//i[@class="fa fa-mobile"]/../following-sibling::td').text
u'83499*****, 83499*****'
Note: the * characters were added so that the real numbers are not shown here.
Here the xpath first locates the i tag with the fa fa-mobile class, then goes up to its parent and gets the next td sibling element.
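The same three steps (find the icon, step up to its cell, take the next cell) can be tried offline without a browser. The following is only a minimal sketch using the standard library's xml.etree.ElementTree. The SNIPPET markup is a simplified stand-in for the row in the question, and text_of_next_cell is a hypothetical helper name. ElementTree's limited XPath has no following-sibling axis, so that last step is emulated by indexing into the row's children.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for the table row in the question (assumption: the real
# page has more markup around it); the phone numbers are masked.
SNIPPET = """
<table>
  <tr>
    <td><i class="fa fa-mobile"></i></td>
    <td>83499*****, 83499*****</td>
  </tr>
</table>
"""


def text_of_next_cell(root, icon_class):
    """Return the text of the cell right after the cell holding the icon.

    Hypothetical helper: returns None when the icon or a next cell is missing.
    """
    # Steps 1 and 2: match <i> by its exact class attribute, then go up with "..".
    icon_cell = root.find(".//i[@class='%s']/.." % icon_class)
    if icon_cell is None:
        return None
    # Step 3 (emulated): go up once more to the row and pick the next child.
    row = root.find(".//i[@class='%s']/../.." % icon_class)
    if row is None:
        return None
    cells = list(row)
    position = cells.index(icon_cell)
    if position + 1 >= len(cells):
        return None
    return (cells[position + 1].text or "").strip()


if __name__ == "__main__":
    root = ET.fromstring(SNIPPET)
    print(text_of_next_cell(root, "fa fa-mobile"))
```

With a real browser session the single following-sibling::td expression above does all of this in one call; the sketch only makes the individual steps visible.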
Hope that helps.