I have written the following code to extract the price details from a URL.
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
dcap = dict(DesiredCapabilities.PHANTOMJS)
dcap["phantomjs.page.settings.userAgent"] = ('Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.89 Safari/537.36')
driver = webdriver.PhantomJS(desired_capabilities=dcap)
driver.get("https://www.walmart.com/ip/Fitness-Reality-TR3000-Maximum-Weight-Capacity-Manual-Treadmill-with-Pacer-Control-and-Heart-Rate-System/37455841#")
driver.find_element_by_css_selector("div[itemprop='price']:nth-of-type(1)").text
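Although the price details are inside that tag, this returns an empty string.
When I tried extracting the tag's innerHTML instead of its text, using the following: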
driver.find_element_by_css_selector("div[itemprop='price']:nth-of-type(1)").get_attribute("innerHTML")
I got this result:
u' <span class="Price-sup">$</span>199<span class="Price-mark">.</span><span class="Price-sup">00</span> '
This clearly shows that the text 199 is inside the tag, but I cannot extract it. Am I missing something here?
Answer 0 (score: 0)
To get the required price, use the following code:
price = driver.find_element_by_xpath('//div[@itemprop="price"]').text
# The <span class="Price-mark">.</span> element is hidden, so .text omits the decimal point.
# Without re-inserting it you would get $19900, which is not what you expect,
# so split off the last two digits (the cents) and join them back with '.'.
price = '.'.join([price[:-2], price[-2:]])
Result: $199.00
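As a possible alternative, since the innerHTML shown in the question already contains the decimal point, the price could be recovered by stripping the tags from that string instead of re-inserting the point by hand. The following is only a sketch using Python 3's standard-library `html.parser`; the helper names `_TextCollector` and `text_from_inner_html` are made up for illustration, and the sample string is the innerHTML quoted in the question.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Hypothetical helper: collects every text node, ignoring all tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for each run of text between tags, visible or not.
        self.chunks.append(data)


def text_from_inner_html(inner_html):
    """Return the concatenated text of an innerHTML fragment, whitespace-trimmed."""
    parser = _TextCollector()
    parser.feed(inner_html)
    parser.close()
    return "".join(parser.chunks).strip()


# innerHTML as reported in the question
sample = (' <span class="Price-sup">$</span>199'
          '<span class="Price-mark">.</span>'
          '<span class="Price-sup">00</span> ')
price = text_from_inner_html(sample)
print(price)
```

In the scraper itself, the argument would be the value returned by `get_attribute("innerHTML")` on the price element. Because the parser does not know about CSS, it keeps the text of the hidden `Price-mark` span, which is exactly the part that `.text` drops.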