Unable to extract text with Python Selenium

Time: 2016-10-26 09:44:54

Tags: python selenium selenium-webdriver web-scraping

I wrote the following code to extract the price details from a product URL.

from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

dcap = dict(DesiredCapabilities.PHANTOMJS)
dcap["phantomjs.page.settings.userAgent"] = ('Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.89 Safari/537.36')
driver = webdriver.PhantomJS(desired_capabilities=dcap)
driver.get("https://www.walmart.com/ip/Fitness-Reality-TR3000-Maximum-Weight-Capacity-Manual-Treadmill-with-Pacer-Control-and-Heart-Rate-System/37455841#")
driver.find_element_by_css_selector("div[itemprop='price']:nth-of-type(1)").text

Although the price details are present inside that tag, the call returns an empty string.

When I instead tried to extract the tag's innerHTML rather than its text, using the following:

driver.find_element_by_css_selector("div[itemprop='price']:nth-of-type(1)").get_attribute("innerHTML")

I got this result:

u' <span class="Price-sup">$</span>199<span class="Price-mark">.</span><span class="Price-sup">00</span> '

This clearly shows that the text 199 is inside the tag, yet I cannot extract it. Am I missing something here?

1 Answer:

Answer 0: (score: 0)

To get the price you need, use the following code:

price = driver.find_element_by_xpath('//div[@itemprop="price"]').text
# The decimal point sits in the hidden element <span class="Price-mark">.</span>,
# so .text omits it and you get $19900, which is not what you expect.
# Re-insert the decimal point before the last two digits yourself:
price = '.'.join([price[:-2], price[-2:]])

Result: $199.00
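
As an alternative to re-inserting the decimal point by hand, the price can also be recovered from the innerHTML string that the question already obtained, since that markup still contains the "." element. The following is only a minimal sketch using Python 3's standard html.parser module. The names _TextCollector, text_from_inner_html and sample are hypothetical helpers introduced for illustration, and the sample string is copied from the question's output rather than fetched from the live page.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Hypothetical helper: collects every text node and ignores the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called once for each run of text between tags.
        self.parts.append(data)


def text_from_inner_html(inner_html):
    """Return the concatenated text content of an HTML fragment."""
    collector = _TextCollector()
    collector.feed(inner_html)
    collector.close()
    return "".join(collector.parts).strip()


# innerHTML string as shown in the question (assumed unchanged).
sample = (' <span class="Price-sup">$</span>199'
          '<span class="Price-mark">.</span>'
          '<span class="Price-sup">00</span> ')

print(text_from_inner_html(sample))
```

With a live driver, the same function could presumably be fed the value of get_attribute("innerHTML") on the price element. Unlike slicing off the last two characters, this does not assume that the price always ends in exactly two decimal digits.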