I'm trying to write a scraper in Python using Selenium and PhantomJS. I use the following code to scrape a question page:
```python
# coding=utf-8
# Created by lruoran on 17-1-29
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException, TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

driver = webdriver.PhantomJS(r'/home/lruoran/software/phantomjs/bin/phantomjs')
driver.get('https://www.quora.com/Who-is-Roger-Federer')

# The first token of the answer-count label is the number, possibly with a trailing suffix character
cnt_answers = driver.find_element_by_xpath("//div[@class='answer_count']").text.encode('utf-8').strip().split()[0]
if cnt_answers[-1].isdigit():
    cnt_answers = int(cnt_answers)
else:
    # Drop the trailing non-digit character before converting
    cnt_answers = int(cnt_answers[:-1])

print("problem:{:s}".format(driver.find_element_by_xpath("//h1").text.encode('utf-8')))
print('the number of answers:{:d}'.format(cnt_answers))

# This is the lookup that fails to return the follower count
try:
    print("the number of follow:{:s}".format(
        driver.find_element_by_xpath(r"//a[contains(@class,'FollowerListModalLink')]").text.encode('utf-8')))
except TimeoutException:
    pass
```
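The answer-count parsing in the script above can be pulled out into a small function so it can be checked without launching a browser. This is a minimal Python 3 sketch that works on `str` rather than UTF-8-encoded bytes; the name `parse_answer_count` is hypothetical. It assumes, as the original code implies, that the label starts with a number optionally followed by one suffix character such as `+` (a label like `100+ Answers` is an assumed example, not taken from the page).

```python
def parse_answer_count(text):
    """Return the leading count from an answer-count label (label format is assumed)."""
    # Take the first whitespace-separated token, e.g. "100+" from "100+ Answers".
    token = text.strip().split()[0]
    # Mirror the original script: drop one trailing non-digit character such as '+'.
    if not token[-1].isdigit():
        token = token[:-1]
    return int(token)


print(parse_answer_count("100+ Answers"))
```

Like the original script, it strips only a single trailing character, so any other label format would need a different rule.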
However, using the XPath `//a[contains(@class,'FollowerListModalLink')]`, I can't get the number of followers of the question. But when I test that XPath with XPath Helper, it finds the element successfully.
The XPath Helper result is shown in this image:
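One possible explanation, offered as an assumption rather than a diagnosis: XPath Helper evaluates the XPath against the page as it is currently rendered in the browser, whereas the script queries right after `driver.get()`, so an element that JavaScript inserts later may not exist yet. Note also that `find_element_by_xpath` raises `NoSuchElementException` when nothing matches, not `TimeoutException`, so the `except` clause in the script does not catch that failure. Selenium's explicit wait, for example `WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//a[contains(@class,'FollowerListModalLink')]")))`, retries the lookup and raises `TimeoutException` only if it never succeeds. The polling idea behind it can be sketched in plain Python; `wait_until` and `WaitTimeout` are hypothetical names, and `LookupError` stands in for `NoSuchElementException`.

```python
import time


class WaitTimeout(Exception):
    """Hypothetical stand-in for Selenium's TimeoutException."""


def wait_until(condition, timeout=10.0, poll=0.5, ignored=(LookupError,)):
    """Call condition() until it returns a truthy value or the timeout elapses.

    Exceptions listed in `ignored` (here LookupError, standing in for
    NoSuchElementException) are treated as "not ready yet" and retried.
    """
    deadline = time.monotonic() + timeout
    last_exc = None
    while True:
        try:
            value = condition()
            if value:
                return value
        except ignored as exc:
            last_exc = exc  # element not present yet; keep polling
        if time.monotonic() >= deadline:
            raise WaitTimeout("condition not met within %.2f s" % timeout) from last_exc
        time.sleep(poll)
```

In the real script the `condition` would be the element lookup itself, and the surrounding `try` would then catch `TimeoutException` meaningfully.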