Question

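I am building a scraper that needs to extract prices from a dozen different websites.

All of the sites use JS to display their prices, so I chose selenium to get the data I need.

Before I started building the scraper, I created a list of xpaths, stored in an external file, that locate the price element for each URL I scrape.

I obtained those xpaths using Firefox and Firebug. However, I get the following error whenever I try to fetch these elements with selenium (PhantomJS driver):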

selenium.common.exceptions.NoSuchElementException: Message: {"errorMessage":"Una
ble to find element with xpath './div'","request":{"headers":{"Accept":"applicat
ion/json","Accept-Encoding":"identity","Connection":"close","Content-Length":"89
","Content-Type":"application/json;charset=UTF-8","Host":"127.0.0.1:51048","User
-Agent":"Python-urllib/2.7"},"httpVersion":"1.1","method":"POST","post":"{\"usin
g\": \"xpath\", \"sessionId\": \"27f41b80-cf63-11e6-bbcc-13b1a315759a\", \"value
\": \"./div\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"ele
ment","directory":"/","path":"/element","relative":"/element","port":"","host":"
","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/
element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/27f41b80-cf
63-11e6-bbcc-13b1a315759a/element"}}

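It looks as if my xpath is wrong, but I double-checked it with other plugins, and it is correct every time I test it in Firefox.

Here are two different xpaths that should both work (they work in Firefox, but not with selenium):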

"id('regular-hero')/div[3]/div[1]/div[2]/div/div[1]/div/div/span"

"/html/body/div[2]/div[3]/div[1]/div[2]/div/div[2]/div/div/span/text()[1]"

The HTML of the target page is available here.

Here is the code that uses selenium to get the elements:

self.browser = wd.PhantomJS()
for n in xrange(len(self.url_list)):
    url = self.url_list[n]
    provider = self.provider_list[n]
    self.browser.get(url)
    for plan in provider:
        for hosting_plan in provider[plan]:
            xpath = hosting_plan.values()[0] # Get the xpath of a plan
            price_elem = self.browser.find_element_by_xpath("//*")
            print price_elem

self.browser.close()

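All of the loops are there to traverse the external JSON file that holds the list of xpaths.

What is wrong? What should I do? Could lxml help me (given that the HTML is sometimes broken)?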

Answer 1

Based on the link you provided, you can match the required element with the following XPath:

//span[@class="term-price"]
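As a rough illustration of why an attribute-based expression like the one above tends to be more robust than a positional path, here is a minimal sketch in Python 3 using only the standard library. The markup, the price value, and the helper name `find_price` are all hypothetical; the real page is not reproduced here, and `xml.etree.ElementTree` supports only a subset of XPath, so this stands in for the browser rather than reproducing its behaviour.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup (not the real target page).
PAGE_A = """
<html><body>
  <div id="header"/>
  <div id="regular-hero">
    <div><span class="term-price">$3.95</span></div>
  </div>
</body></html>
"""

# Same content, but one extra <div> sits before the hero block, as a script
# or a different rendering engine might produce.
PAGE_B = """
<html><body>
  <div id="banner"/>
  <div id="header"/>
  <div id="regular-hero">
    <div><span class="term-price">$3.95</span></div>
  </div>
</body></html>
"""

POSITIONAL = "./body/div[2]/div/span"      # depends on the exact sibling order
BY_CLASS = ".//span[@class='term-price']"  # depends only on the class attribute


def find_price(html, xpath):
    """Return the text of the first node matching xpath, or None if no match."""
    root = ET.fromstring(html.strip())
    node = root.find(xpath)
    return None if node is None else node.text


for name, page in (("A", PAGE_A), ("B", PAGE_B)):
    print(name, find_price(page, POSITIONAL), find_price(page, BY_CLASS))
```

On the second page the positional path no longer points at the price, while the class-based one still matches. Whether this is what happens between Firefox and PhantomJS on the actual site is an assumption, not something the thread confirms.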

If the element is generated by JavaScript, you need to wait for the element to appear:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

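# `driver` is the WebDriver instance (self.browser in the question's code).
# Wait up to 10 seconds for the price element to be present in the DOM.
element = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.XPATH, "//span[@class='term-price']"))
)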
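To make the idea behind an explicit wait more concrete, here is a minimal Python 3 sketch of a polling loop using only the standard library. It is not selenium's implementation: `wait_until`, `WaitTimeout`, and `lookup_price` are hypothetical names, and the "page" is simulated by a function that fails on its first two calls. The 0.5 second default poll interval mirrors selenium's documented default.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never became truthy (hypothetical name)."""


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() until it returns a truthy value or timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except LookupError:
            # "Not found yet" is expected while the page is still rendering,
            # so swallow it and try again on the next poll.
            pass
        if time.monotonic() >= deadline:
            raise WaitTimeout("condition not met within %.1f s" % timeout)
        time.sleep(poll_interval)


# Simulated page: the price element "appears" on the third lookup.
attempts = {"count": 0}


def lookup_price():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise LookupError("element not in the DOM yet")
    return "price element"


found = wait_until(lookup_price, timeout=2.0, poll_interval=0.01)
print(found, attempts["count"])
```

With selenium itself, `WebDriverWait(...).until(...)` plays the role of `wait_until`, and it raises `selenium.common.exceptions.TimeoutException` when the element never shows up, so it can be worth catching that exception per URL rather than letting one slow site stop the whole run.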

Why doesn't this xpath work with PhantomJS?
