Question

I am trying to write a scraper in Python with Selenium and PhantomJS. I use the following code to scrape a question page:

# coding=utf-8

# Created by lruoran on 17-1-29

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException, TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

driver = webdriver.PhantomJS(r'/home/lruoran/software/phantomjs/bin/phantomjs')
driver.get('https://www.quora.com/Who-is-Roger-Federer')
cnt_answers = driver.find_element_by_xpath("//div[@class='answer_count']").text.encode('utf-8').strip().split()[0]
if cnt_answers[-1].isdigit():
    cnt_answers = int(cnt_answers)
else:
    cnt_answers = int(cnt_answers[:-1])
print("problem:{:s}".format(driver.find_element_by_xpath("//h1").text.encode('utf-8')))
print('the number of answers:{:d}'.format(cnt_answers))
try:
    print("the number of follow:{:s}".format(
        driver.find_element_by_xpath(r"//a[contains(@class,'FollowerListModalLink')]").text.encode('utf-8')))
except TimeoutException:
    pass

However, with the XPath //a[contains(@class,'FollowerListModalLink')] I cannot get the question's follower count. When I test the same XPath with XPath Helper, it finds the element successfully.

The XPath Helper result is shown in this image -

Quora crawler using Selenium and PhantomJS cannot get the statistics
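One possible explanation, which the post does not confirm, is that the follower link is rendered by JavaScript after the initial page load. In that case PhantomJS may not have built the element yet when find_element_by_xpath runs, while XPath Helper inspects the fully rendered page in a desktop browser. Under that assumption, the sketch below retries a lookup until it succeeds or a timeout expires. The name `wait_for` and its parameters are hypothetical and do not come from the question. Selenium's own WebDriverWait, already imported in the question's code, serves the same purpose.

```python
import time


def wait_for(fetch, timeout=10.0, interval=0.5, retry_on=(Exception,)):
    """Call fetch() repeatedly until it returns without raising.

    Hypothetical helper: re-raises the last exception once `timeout`
    seconds have passed without a successful call.
    """
    deadline = time.time() + timeout
    while True:
        try:
            return fetch()
        except retry_on:
            # Give up only after the deadline; otherwise pause and retry.
            if time.time() >= deadline:
                raise
            time.sleep(interval)


# Possible usage with the driver from the question (assumes the element
# eventually appears in the page that PhantomJS receives):
#
# followers = wait_for(
#     lambda: driver.find_element_by_xpath(
#         "//a[contains(@class,'FollowerListModalLink')]").text,
#     retry_on=(NoSuchElementException,))
```

If the lookup still fails after the timeout, the markup served to PhantomJS probably differs from what the desktop browser shows. Saving driver.page_source to a file and searching it for FollowerListModalLink is a quick way to check that.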

0 answers: