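I want to scrape only the #text parts of the elements matched by a CSS selector, as they appear when I inspect the element. Right now I seem to be grabbing every number under the selector rather than just the text node.
The link being scraped is https://www.virginmobile.ca/en/phones/phone-details.html#!/gs9/Grey/64/TR20.
I want the prices under "Choose your phone price", but without the "$" and without the "99" cents at the end of the string.
At the moment I only know how to get the whole string.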
driver.get(link)
time.sleep(3)
print('--------------------------- begining ------------------')
planTypeUpfrontCostListRaw = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, '#phonePricesList .ultra')))
for element in planTypeUpfrontCostListRaw:
    upfrontCost = element.text
    print(upfrontCost)
print('--------------------------- END ------------------------')
Answer 0 (score: 1)
Solution 1
Instead of text, use innerHTML. It returns the element's HTML markup, including the text.
For example, it returns:
"<sup>$</sup>199<sup>99</sup>"
You can then use the regular-expression library re to extract only the value in the middle.
print(re.search(r'\d+', upfrontCost).group(0))
Output: 199
Here is the code that does this:
from selenium.webdriver import Chrome
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
import re
link = "https://www.virginmobile.ca/en/phones/phone-details.html#!/gs9/Grey/64/TR20"
driver = Chrome()
wait = WebDriverWait(driver, 15)
driver.get(link)
print('--------------------------- begining ------------------')
planTypeUpfrontCostListRaw = wait.until(
    EC.presence_of_all_elements_located((By.CSS_SELECTOR, '.price.ultra.ng-binding.ng-scope'))
)
for element in planTypeUpfrontCostListRaw:
    # innerHTML keeps the <sup> markup, so the dollar amount is the first run of digits
    upfrontCost = element.get_attribute('innerHTML')
    upfrontCost = re.search(r'\d+', upfrontCost).group(0)
    print(upfrontCost)
print('--------------------------- END ------------------------')
Output:
--------------------------- begining ------------------
0
0
199
349
739
1019
--------------------------- END ------------------------
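The regex step in Solution 1 can be tried without a browser. The following is a minimal sketch, assuming the innerHTML looks like the sample string quoted above; extract_dollars is a hypothetical helper name, not part of Selenium, and it returns None rather than raising when the markup contains no digits.

```python
import re


def extract_dollars(inner_html):
    """Return the first run of digits in the markup, or None if there is none."""
    match = re.search(r'\d+', inner_html)
    return match.group(0) if match else None


# Sample markup taken from the answer above; the live page may differ.
sample = "<sup>$</sup>199<sup>99</sup>"
print(extract_dollars(sample))
```

Checking for None before calling group(0) avoids an AttributeError if an element has not finished rendering when it is read.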
Solution 2
You can still use text: strip the "$" with strip('$'), then drop the last two digits (the cents).
driver = Chrome()
wait = WebDriverWait(driver, 15)
driver.get(link)
print('--------------------------- begining ------------------')
planTypeUpfrontCostListRaw = wait.until(
    EC.presence_of_all_elements_located((By.CSS_SELECTOR, '.price.ultra.ng-binding.ng-scope'))
)
for element in planTypeUpfrontCostListRaw:
    upfrontCost = element.text.strip('$')
    # A price of 0 has no cents part, so only trim non-zero prices
    if upfrontCost != '0':
        upfrontCost = upfrontCost[:-2]
    print(upfrontCost)
print('--------------------------- END ------------------------')
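The string handling in Solution 2 can also be checked on its own. This is a rough sketch, assuming element.text comes back as something like "$19999" for a priced plan and "$0" for a free one; that format is an assumption inferred from the code above, and dollars_from_text is a hypothetical helper name.

```python
def dollars_from_text(text):
    """Strip the dollar sign, then drop the two trailing cents digits."""
    digits = text.strip().strip('$')
    # A bare "0" has no cents part, so leave it untouched
    if digits != '0':
        digits = digits[:-2]
    return digits


# Assumed shapes of element.text; verify against the real page before relying on them.
for raw in ("$19999", "$0"):
    print(dollars_from_text(raw))
```

This approach depends on every non-zero price ending in exactly two cents digits, so it is more fragile than parsing the markup.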
Answer 1 (score: 0)
You can dump the page source into bs4 and use stripped_strings
from bs4 import BeautifulSoup as bs
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
d = webdriver.Chrome(r'C:\Users\User\Documents\chromedriver.exe')
d.get('https://www.virginmobile.ca/en/phones/phone-details.html?province=ON&geoResult=failed#!/gs9/Grey/64/TR20')
WebDriverWait(d,10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "planlevels .price")))
soup = bs(d.page_source, 'lxml')
plans = soup.select('planlevels .price')
for plan in plans:
    # stripped_strings yields '$', the dollar amount, then the cents; take the middle one
    price = [string for string in plan.stripped_strings][1]
    print(price)
Uglier, IMO, but you could use split and skip BS4
plans = WebDriverWait(d,10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "planlevels .price")))
for plan in plans:
    print(plan.get_attribute('innerHTML').split('</sup>')[1].split('<sup>')[0])
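To see what the two split calls do, here is a small offline sketch, assuming the innerHTML has the "<sup>$</sup>199<sup>99</sup>" shape quoted in the first answer; dollars_by_split is a hypothetical helper name.

```python
def dollars_by_split(inner_html):
    """Take the text between the first </sup> and the next <sup>."""
    after_currency = inner_html.split('</sup>')[1]  # e.g. '199<sup>99'
    return after_currency.split('<sup>')[0]         # e.g. '199'


# Sample markup assumed from the first answer; the live page may differ.
print(dollars_by_split("<sup>$</sup>199<sup>99</sup>"))
```

If the markup contains no </sup> at all, the [1] index raises IndexError, so this only suits pages whose structure is known to be stable.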