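I was able to cobble together a Selenium / Python program that opens a web page, logs in, and searches for a specific part.
My end goal is to loop through a list of part numbers and return the prices, printed to a list.
Right now I am trying to extract a portion of the price data, but I am not sure how. When I search on this site, a lot of parts are returned. None of the parts has an attribute associated with the price. I am having trouble isolating the part I need.
I created a username and password specifically for this question. This non-private UN / PW is: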
userName = "FirstName.SurName321123@gmail.com"
password = "PasswordForThis123"
The website is Tessco.com
I assume the first challenge is finding the part I need in the returned list. I know I can look up an item with syntax like this:
driver.find_element(By.ID, "someID").get_attribute("attribute")
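But if the item has no such attribute, how do I extract its data? Is there some way to extract a string?
I was thinking of an if statement that checks whether "MFG PART #:" equals the string in question, in this case HL4RPV-50, and then prints the price.
If I am able to isolate the part, how do I extract the price?
The HTML from the Tessco site is: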
<div class="row">
<div class="col-xs-5 col-sm-2 col-md-2 productImage">
<a href="/product/1-2-plenum-air-cable-off-white-574840" class="CoveoResultLink" onclick="ClickToProductDetailGA({name: "1/2\" Plenum Air Cable, Off White", sku: 574840, price: "$1.89", brand: "CommScope", category: "Cable Products| Coaxial Cable, Connectors & Jumpers| Air Coaxial Cable| 1/2\" Air Cable", position: 0, pageType: "Search Page", url: "/product/574840" });" tabindex="0">
<img src="https://avalanche.tessco.com/productimages/250x250/1462639.jpg" alt="CommScope">
</a>
<a class="hidden-xs" href="/search#f:manufacturerName=[CommScope]">
CommScope
</a>
<span class="badge blueBadge">GSA</span>
</div>
<div class="col-xs-7 visible-xs detailMobile">
<a href="/product/1-2-plenum-air-cable-off-white-574840" class="productName CoveoResultLink" onclick="ClickToProductDetailGA({name: "1/2\" Plenum Air Cable, Off White", sku: 574840, price: "$1.89", brand: "CommScope", category: "Cable Products| Coaxial Cable, Connectors & Jumpers| Air Coaxial Cable| 1/2\" Air Cable", position: 0, pageType: "Search Page", url: "/product/574840" });" tabindex="0">1/2" Plenum Air Cable, Off White</a>
</div>
<div class="col-xs-12 col-sm-6 col-md-7 detail">
<div>
<a href="/product/1-2-plenum-air-cable-off-white-574840" class="productName CoveoResultLink hidden-xs" onclick="ClickToProductDetailGA({name: "1/2\" Plenum Air Cable, Off White", sku: 574840, price: "$1.89", brand: "CommScope", category: "Cable Products| Coaxial Cable, Connectors & Jumpers| Air Coaxial Cable| 1/2\" Air Cable", position: 0, pageType: "Search Page", url: "/product/574840" });" tabindex="0">1/2" Plenum Air Cable, Off White</a>
<div class="row">
<div class="col-xs-12">
<ul class="unlisted info">
<li><span>TESSCO SKU:</span> 574840</li>
<li><span>QTY:</span> 1 FOOT</li>
<li><span>UPC:</span> 888063388620</li>
<li><span>MFG PART #:</span> HL4RPV-50</li>
</ul>
</div>
</div>
<p class="more">ANDREW 1/2" Plenum Air 50 ohm cable. HL4RPV-50. Uses LDF4 connectors. Off…</p>
</div>
</div>
<div class="col-xs-12 col-sm-4 col-md-3 purchase">
<div>
<add-product-to-cart params="
sku: 574840,
price: "$1.89",
listPrice: "$6.37",
canPurchase: "true",
isAuthenticated: true,
name: "1/2\" Plenum Air Cable, Off White",
brand: "CommScope",
category: "Cable Products| Coaxial Cable, Connectors & Jumpers| Air Coaxial Cable| 1/2\" Air Cable",
pageType: "Search Page",
brandProtectionLink:"/brand-protection-program",
viewProductPricingText: "viewAccountPricingOnTCOM",
userRoles: "canBuy, authorizedBuyerOnTCOM, viewAccountAvailabilityOnTCOM, viewAccountPricingOnTCOM, viewOrderHistoryOnTCOM, overrideShiptoAddressOnTCOM",
minQuantity:1,
minQuantityBefore: "Minimum ",
minQuantityAfter: " to Order",
isOnSale: "No",
saleClass:"redBadge",
saleText:"Sale",
isCutCable: "true",
maximumReelLength: 2000,
isCableJumper: false,
isPricingWrapperAlive: true,
context: "search",
index: 0, index: 0" data-sellingrestrictioncode="N/A"><div class="price" data-bind="visible: ((canPurchase()===true) && (isAuthenticated()===true)), css: {sale: isOnSale} ">
<span data-bind="text: 'List: ' + listPrice()">List: $6.37</span>
<span data-bind="visible: isOnSale, css:saleClass, text: saleText" class="badge large redBadge" style="display: none;">Sale</span><!--
--><!--ko text: canViewPricing()===true ? price : listPrice-->$1.89<!--/ko-->
</div>
So far, my Selenium code is:
import time
#Need Selenium for interacting with web elements
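from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC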
#Need numpy/pandas to interact with large datasets
import numpy as np
import pandas as pd
chrome_path = r"C:\Users\James\Documents\Python Scripts\jupyterNoteBooks\ScrapingData\chromedriver_win32\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get("https://www.tessco.com/login")
userName = "FirstName.SurName321123@gmail.com"
password = "PasswordForThis123"
#Set a wait, for elements to load into the DOM
wait = WebDriverWait(driver, 10)
elem = wait.until(EC.element_to_be_clickable((By.ID, "userID")))
elem.send_keys(userName)
elem = wait.until(EC.element_to_be_clickable((By.ID, "password")))
elem.send_keys(password)
#Press the login button
driver.find_element(By.XPATH, "/html/body/account-login/div/div[1]/form/div[6]/div/button").click()
#Expand the search bar
searchIcon = wait.until(EC.element_to_be_clickable((By.XPATH, "/html/body/header/div[2]/div/div/ul/li[2]/i")))
searchIcon.click()
searchBar = wait.until(EC.element_to_be_clickable((By.XPATH, '/html/body/header/div[3]/input')))
searchBar.click()
#load in manufacture part number from a collection of components, via an Excel file
#Enter information into the search bar
searchBar.send_keys("HL4RPV-50" + '\n')
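Any pointers would be greatly appreciated.
Answer 0 (score: 1)
You need to get the XPath of that price somehow, read the element's outer HTML, and then take a substring of that outer HTML to pull out the value you need.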
price_element = driver.find_element(By.XPATH, "xpath of the price here")
price_html = price_element.get_attribute("outerHTML")
# start and end are placeholders: choose the indices that cover the price text
price_html = price_html[start:end]
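Because the site you are scraping requires a login to see prices, it is hard to reproduce or demonstrate.
But I hope this gives you an idea. Good luck :D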
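One way to do the substring step without hard-coding indices is a regular expression over the outer HTML. The sketch below is only an illustration: the helper name `prices_from_outer_html` is hypothetical, and the sample string is a trimmed copy of the price `div` from the question, so the real markup may differ once the page has rendered.

```python
import re

# Matches dollar amounts such as $1.89 or $1,234.50
PRICE_RE = re.compile(r"\$\d[\d,]*(?:\.\d{2})?")

def prices_from_outer_html(outer_html):
    """Return every dollar amount found in an outerHTML string, in document order."""
    return PRICE_RE.findall(outer_html)

# Trimmed copy of the price <div> shown in the question (assumed representative).
sample = (
    '<div class="price">'
    '<span data-bind="text: \'List: \' + listPrice()">List: $6.37</span>'
    '<span class="badge large redBadge" style="display: none;">Sale</span>'
    '<!--ko text: canViewPricing()===true ? price : listPrice-->$1.89<!--/ko-->'
    '</div>'
)

found = prices_from_outer_html(sample)
print(found)  # ['$6.37', '$1.89']
```

With Selenium, the input would be `price_element.get_attribute("outerHTML")`. In the markup from the question the list price comes first and the account price last, so `found[-1]` would be the account price under that assumption.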
Answer 1 (score: 1)
Here is the logic you need.
# wait for the product information to load
products = WebDriverWait(driver, 30).until(EC.presence_of_all_elements_located((By.XPATH, "//div[@class='CoveoResult']")))
# create a dictionary to store product names and prices
productInfo = {}
# iterate through all products in the search result and add their details to the dictionary
for product in products:
    # get the product name
    productName = product.find_element(By.XPATH, ".//a[@class='productName CoveoResultLink hidden-xs']").text
    # get the price (second line of the price block's visible text)
    price = product.find_element(By.XPATH, ".//div[@class='price']").text.split('\n')[1]
    # add the details to the dictionary
    productInfo[productName] = price
# print the product information
print(productInfo)
Here is the output:
{'1/2" Plenum Air Cable, Off White': '$6.37', '1/2" Plenum Air Cable, Blue': '$6.37', '4.3-10 Male for 1/2" AL4RPV-50, LDF4-50A, HL4RPV-50': '$25.91', '4.3-10M RA for 1/2" AL4RPV-50, LDF4-50A, HL4RPV-50': '$51.28', '4.3-10 Male 1/2" Plenum Cable': '$34.32', '4.3-10 Female Connector for 1/2" Plenum': '$35.00', '4.3-10 R/A Male Connector for 1/2" Plenum': '$47.50', '4.3-10 Female for 1/2 in AL4RPV-50, LDF4-50A': '$25.91'}
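To isolate a single part rather than collect every result, the visible text of each result can be checked for its "MFG PART #:" line. This is a minimal sketch, not a tested scraper: the helpers `parse_result_text` and `price_for_part` are hypothetical, the sample strings only approximate what `.text` might return for a result, and the second sample (part number and price) is invented for illustration.

```python
import re

PRICE_RE = re.compile(r"\$\d[\d,]*(?:\.\d{2})?")
PART_RE = re.compile(r"MFG PART #:\s*(\S+)")

def parse_result_text(text):
    """Pull the MFG part number and all dollar amounts out of one result's visible text."""
    part = PART_RE.search(text)
    return (part.group(1) if part else None), PRICE_RE.findall(text)

def price_for_part(result_texts, part_number):
    """Return the last price shown for the result whose MFG part number matches exactly."""
    for text in result_texts:
        part, prices = parse_result_text(text)
        if part == part_number and prices:
            return prices[-1]
    return None

# Assumed visible text for two results; the first is modelled on the HTML in the
# question, the second is made up to show that near-matches are skipped.
sample_results = [
    '1/2" Plenum Air Cable, Off White\nTESSCO SKU: 574840\nQTY: 1 FOOT\n'
    'UPC: 888063388620\nMFG PART #: HL4RPV-50\nList: $6.37\n$1.89',
    '1/2" Plenum Air Cable, Blue\nMFG PART #: HL4RPV-50-BL\nList: $6.37\n$2.05',
]

print(price_for_part(sample_results, "HL4RPV-50"))  # $1.89
```

With the `products` list from the answer, the call would be `price_for_part([p.text for p in products], "HL4RPV-50")`. Looping over a list of part numbers then means repeating the search and this call for each one and storing the results in a dictionary keyed by part number.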