Python, Selenium: Pulling a price out of a string

Posted: 2019-07-05 18:59:28

Tags: python python-3.x selenium selenium-webdriver selenium-chromedriver

I've managed to cobble together a program with Selenium/Python that opens a web page, logs in, and searches for a specific part.

My end goal is to loop through a list of part numbers and collect their prices into a list.

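Right now I'm trying to pull out a piece of the price data, but I'm not sure how. When I search on this site, lots of parts come back, and none of them has an attribute tied to the price. I'm having trouble isolating the part I need.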

I created a username/password specifically for this question. This non-private UN/PW is:

    userName = "FirstName.SurName321123@gmail.com"
    password = "PasswordForThis123"

The website is Tessco.com.

I assume the first challenge is finding the part I need in the returned list. I know I can look up an item with this syntax:

    driver.find_element(By.ID, "someID").get_attribute("attribute")

But if the item has no attribute, how do I extract its data? Is there some way to extract the string?
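Elements without a useful attribute still expose their visible text through Selenium's `.text` property. A minimal sketch, assuming the `<li>` items of the `ul.unlisted.info` list in the HTML below come back from `.text` as strings such as `"MFG PART #: HL4RPV-50"`; `parse_info_items` is a hypothetical helper that turns them into a label/value dictionary:

```python
def parse_info_items(item_texts):
    """Turn strings like 'MFG PART #: HL4RPV-50' into {'MFG PART #': 'HL4RPV-50'}.

    Assumes each string is the visible text of one <li>, with the label
    and the value separated by the first colon.
    """
    info = {}
    for text in item_texts:
        label, sep, value = text.partition(":")
        if not sep:
            # No colon: not a label/value pair, so skip it
            continue
        info[label.strip()] = value.strip()
    return info


# Texts as they might be read from the <li> elements of one search result,
# e.g. [li.text for li in result.find_elements(By.CSS_SELECTOR, "ul.info li")]
# where `result` is a hypothetical element for a single hit.
items = [
    "TESSCO SKU: 574840",
    "QTY: 1 FOOT",
    "UPC: 888063388620",
    "MFG PART #: HL4RPV-50",
]
print(parse_info_items(items)["MFG PART #"])  # HL4RPV-50
```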

I was thinking of using an IF statement that checks whether "MFG PART #:" == "the string in question" (in this case HL4RPV-50) and, if so, prints the price.
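Matching the `MFG PART #` field exactly, instead of searching a result's whole text, matters here, because a search for HL4RPV-50 can also return connectors whose names or descriptions mention HL4RPV-50. A minimal sketch of that check, assuming each search result has already been turned into a dictionary such as `{"MFG PART #": "HL4RPV-50", "price": "$1.89"}` (a hypothetical shape; `find_matching_results` is likewise a hypothetical name):

```python
def find_matching_results(results, target_part):
    """Return the results whose MFG PART # equals target_part exactly.

    Each result is assumed to be a dict built from one search hit.
    The comparison ignores case and surrounding whitespace.
    """
    wanted = target_part.strip().upper()
    return [
        r for r in results
        if r.get("MFG PART #", "").strip().upper() == wanted
    ]


results = [
    {"MFG PART #": "HL4RPV-50", "price": "$1.89"},
    # Hypothetical connector hit that only mentions HL4RPV-50 in its name
    {"MFG PART #": "EXAMPLE-CONNECTOR", "price": "$25.91"},
]
for match in find_matching_results(results, "HL4RPV-50"):
    print(match["price"])  # $1.89
```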

If I can isolate that part, how do I extract its price?
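In the HTML below, the price does appear in an attribute: the `params` attribute of the `<add-product-to-cart>` element, where `price` is the $1.89 shown for the logged-in account and `listPrice` is the $6.37 list price. Once the browser has decoded `&quot;`, `get_attribute("params")` should return a string like `price: "$1.89", listPrice: "$6.37", ...`. A minimal sketch that pulls the quoted fields out with a regular expression; `parse_params` is a hypothetical helper, and obtaining the string (e.g. `result.find_element(By.TAG_NAME, "add-product-to-cart").get_attribute("params")`) assumes `result` is the element for one search hit:

```python
import re

# Matches  key: "value"  pairs, allowing escaped quotes such as 1/2\" inside the value
_PAIR = re.compile(r'(\w+)\s*:\s*"((?:[^"\\]|\\.)*)"')


def parse_params(params):
    """Parse the double-quoted key/value pairs of a params attribute string.

    Bare values such as sku: 574840 or isAuthenticated: true are ignored.
    """
    return dict(_PAIR.findall(params))


# Shortened copy of the attribute value from the question, after entity decoding
params = r'''
    sku: 574840,
    price: "$1.89",
    listPrice: "$6.37",
    canPurchase: "true",
    isAuthenticated: true,
    name: "1/2\" Plenum Air Cable, Off White",
    isOnSale: "No",
    index: 0, index: 0'''

fields = parse_params(params)
print(fields["price"], fields["listPrice"])  # $1.89 $6.37
```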

The HTML on the Tessco site is:


    <div class="row">
                                                <div class="col-xs-5 col-sm-2 col-md-2 productImage">
                                                    <a href="/product/1-2-plenum-air-cable-off-white-574840" class="CoveoResultLink" onclick="ClickToProductDetailGA({name: &quot;1/2\&quot; Plenum Air Cable, Off White&quot;, sku: 574840, price: &quot;$1.89&quot;, brand: &quot;CommScope&quot;, category: &quot;Cable Products| Coaxial Cable, Connectors &amp; Jumpers| Air Coaxial Cable| 1/2\&quot; Air Cable&quot;, position: 0, pageType: &quot;Search Page&quot;, url: &quot;/product/574840&quot; });" tabindex="0">
                                                        <img src="https://avalanche.tessco.com/productimages/250x250/1462639.jpg" alt="CommScope">
                                                    </a>
                                                    <a class="hidden-xs" href="/search#f:manufacturerName=[CommScope]">
                                                        CommScope
                                                    </a>

                                                    <span class="badge blueBadge">GSA</span>

                                                </div>
                                                <div class="col-xs-7 visible-xs detailMobile">
                                                    <a href="/product/1-2-plenum-air-cable-off-white-574840" class="productName CoveoResultLink" onclick="ClickToProductDetailGA({name: &quot;1/2\&quot; Plenum Air Cable, Off White&quot;, sku: 574840, price: &quot;$1.89&quot;, brand: &quot;CommScope&quot;, category: &quot;Cable Products| Coaxial Cable, Connectors &amp; Jumpers| Air Coaxial Cable| 1/2\&quot; Air Cable&quot;, position: 0, pageType: &quot;Search Page&quot;, url: &quot;/product/574840&quot; });" tabindex="0">1/2" Plenum Air Cable, Off White</a>
                                                </div>
                                                <div class="col-xs-12 col-sm-6 col-md-7 detail">
                                                    <div>
                                                        <a href="/product/1-2-plenum-air-cable-off-white-574840" class="productName CoveoResultLink hidden-xs" onclick="ClickToProductDetailGA({name: &quot;1/2\&quot; Plenum Air Cable, Off White&quot;, sku: 574840, price: &quot;$1.89&quot;, brand: &quot;CommScope&quot;, category: &quot;Cable Products| Coaxial Cable, Connectors &amp; Jumpers| Air Coaxial Cable| 1/2\&quot; Air Cable&quot;, position: 0, pageType: &quot;Search Page&quot;, url: &quot;/product/574840&quot; });" tabindex="0">1/2" Plenum Air Cable, Off White</a>
                                                        <div class="row">
                                                            <div class="col-xs-12">
                                                                <ul class="unlisted info">
                                                                    <li><span>TESSCO SKU:</span> 574840</li>
                                                                    <li><span>QTY:</span> 1 FOOT</li>
                                                                    <li><span>UPC:</span> 888063388620</li>
                                                                    <li><span>MFG PART #:</span> HL4RPV-50</li>
                                                                </ul>
                                                            </div>
                                                        </div>
                                                        <p class="more">ANDREW 1/2" Plenum Air 50 ohm cable. HL4RPV-50. Uses LDF4 connectors. Off…</p>
                                                    </div>
                                                </div>
                                                <div class="col-xs-12 col-sm-4 col-md-3 purchase">
                                                    <div>
                                                        <add-product-to-cart params="
                                                                             sku: 574840,
                                                                             price: &quot;$1.89&quot;,
                                                                             listPrice: &quot;$6.37&quot;,
                                                                             canPurchase: &quot;true&quot;,
                                                                             isAuthenticated: true,
                                                                             name: &quot;1/2\&quot; Plenum Air Cable, Off White&quot;,
                                                                             brand: &quot;CommScope&quot;,
                                                                             category: &quot;Cable Products| Coaxial Cable, Connectors &amp; Jumpers| Air Coaxial Cable| 1/2\&quot; Air Cable&quot;,
                                                                             pageType: &quot;Search Page&quot;,
                                                                             brandProtectionLink:&quot;/brand-protection-program&quot;,
                                                                             viewProductPricingText: &quot;viewAccountPricingOnTCOM&quot;,
                                                                             userRoles: &quot;canBuy, authorizedBuyerOnTCOM, viewAccountAvailabilityOnTCOM, viewAccountPricingOnTCOM, viewOrderHistoryOnTCOM, overrideShiptoAddressOnTCOM&quot;,
                                                                             minQuantity:1,
                                                                             minQuantityBefore: &quot;Minimum &quot;,
                                                                             minQuantityAfter: &quot; to Order&quot;,
                                                                             isOnSale: &quot;No&quot;,
                                                                             saleClass:&quot;redBadge&quot;,
                                                                             saleText:&quot;Sale&quot;,
                                                                             isCutCable: &quot;true&quot;,
                                                                             maximumReelLength: 2000,
                                                                             isCableJumper: false,
                                                                             isPricingWrapperAlive: true,
                                                                             context: &quot;search&quot;,
                                                                             index: 0, index: 0" data-sellingrestrictioncode="N/A"><div class="price" data-bind="visible: ((canPurchase()===true) &amp;&amp; (isAuthenticated()===true)), css: {sale: isOnSale} ">
        <span data-bind="text: 'List: ' + listPrice()">List: $6.37</span>
        <span data-bind="visible: isOnSale, css:saleClass, text: saleText" class="badge large redBadge" style="display: none;">Sale</span><!--
        --><!--ko text: canViewPricing()===true ? price : listPrice-->$1.89<!--/ko-->
    </div>

My Selenium code so far:

    import time
    # Selenium is needed for interacting with web elements
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    # numpy/pandas are needed to work with large datasets
    import numpy as np
    import pandas as pd

    chrome_path = r"C:\Users\James\Documents\Python Scripts\jupyterNoteBooks\ScrapingData\chromedriver_win32\chromedriver.exe"
    driver = webdriver.Chrome(service=Service(chrome_path))
    driver.get("https://www.tessco.com/login")

    userName = "FirstName.SurName321123@gmail.com"
    password = "PasswordForThis123"

    # Set a wait for elements to load into the DOM
    wait = WebDriverWait(driver, 10)

    elem = wait.until(EC.element_to_be_clickable((By.ID, "userID")))
    elem.send_keys(userName)

    elem = wait.until(EC.element_to_be_clickable((By.ID, "password")))
    elem.send_keys(password)

    # Press the login button
    driver.find_element(By.XPATH, "/html/body/account-login/div/div[1]/form/div[6]/div/button").click()

    # Expand the search bar
    searchIcon = wait.until(EC.element_to_be_clickable((By.XPATH, "/html/body/header/div[2]/div/div/ul/li[2]/i")))
    searchIcon.click()

    searchBar = wait.until(EC.element_to_be_clickable((By.XPATH, "/html/body/header/div[3]/input")))
    searchBar.click()

    # Load the manufacturer part numbers from a collection of components in an Excel file (not done yet)

    # Enter the part number into the search bar and submit with Enter
    searchBar.send_keys("HL4RPV-50" + "\n")
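To reach the stated goal of looping over many part numbers, the per-part search can be wrapped in a function and called once per part number. A minimal sketch, assuming a hypothetical `lookup_price(part)` callable that performs the search above for one part number and returns the matched price string (or `None` when nothing matches):

```python
def collect_prices(part_numbers, lookup_price):
    """Look up each part number and return a list of (part_number, price) pairs.

    lookup_price is a caller-supplied (hypothetical) function wrapping the
    Selenium search for a single part number.
    """
    prices = []
    for raw in part_numbers:
        # Excel cells may hold numbers or stray spaces, so normalise to a clean string
        part = str(raw).strip()
        prices.append((part, lookup_price(part)))
    return prices


# Stand-in for the real Selenium lookup, for illustration only
fake_prices = {"HL4RPV-50": "$1.89"}
print(collect_prices(["HL4RPV-50", "UNKNOWN-1"], fake_prices.get))
# [('HL4RPV-50', '$1.89'), ('UNKNOWN-1', None)]
```

With pandas, the part numbers could come from something like `pd.read_excel("parts.xlsx")["MFG PART #"]` (the file name and column name are hypothetical), and the pairs can be turned into a table with `pd.DataFrame(prices, columns=["MFG PART #", "Price"])`.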

Any pointers would be greatly appreciated.

2 Answers:

Answer 0 (score: 1)

You need to get the XPath of the price element somehow, then get its outer HTML and take a substring of it to pull out the value you want.

    from selenium.webdriver.common.by import By

    price_element = driver.find_element(By.XPATH, '#xpath of the price here')  # placeholder XPath
    price_html = price_element.get_attribute('outerHTML')
    # price_html = price_html[start:end]  # substring here: slice out the price
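The substring step can be done with `str.find` on the two markers that surround the price in the question's markup: the knockout comment `<!--ko text: ...-->` before it and `<!--/ko-->` after it. A minimal sketch, assuming `outer_html` is the `outerHTML` of the `<div class="price">` shown in the question (`price_from_outer_html` is a hypothetical helper name):

```python
def price_from_outer_html(outer_html):
    """Return the text between the 'ko text' comment and <!--/ko-->, or None."""
    start = outer_html.find("<!--ko text:")
    if start == -1:
        return None
    # The price starts right after the "-->" that closes the opening comment
    value_start = outer_html.find("-->", start)
    if value_start == -1:
        return None
    value_start += len("-->")
    value_end = outer_html.find("<!--/ko-->", value_start)
    if value_end == -1:
        return None
    return outer_html[value_start:value_end].strip()


# Shortened copy of the price block from the question
sample = (
    '<div class="price">'
    '<span>List: $6.37</span>'
    '<span class="badge large redBadge" style="display: none;">Sale</span><!--\n'
    '--><!--ko text: canViewPricing()===true ? price : listPrice-->$1.89<!--/ko-->'
    '</div>'
)
print(price_from_outer_html(sample))  # $1.89
```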

Since the site you're scraping requires a login to see prices, it's hard to reproduce or demonstrate this.

But I hope this gives you an idea. Good luck :D

Answer 1 (score: 1)

Here is the logic you need:

    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    # wait for the product information to load
    products = WebDriverWait(driver, 30).until(EC.presence_of_all_elements_located((By.XPATH, "//div[@class='CoveoResult']")))
    # create a dictionary to store product name -> price
    productInfo = {}
    # iterate through all products in the search result and add details to the dictionary
    for product in products:
        # get product name
        productName = product.find_element(By.XPATH, ".//a[@class='productName CoveoResultLink hidden-xs']").text
        # get price (the second line of the price block's text)
        price = product.find_element(By.XPATH, ".//div[@class='price']").text.split('\n')[1]
        # add details to dictionary
        productInfo[productName] = price
    # print products information
    print(productInfo)

Here is the output:

    {'1/2" Plenum Air Cable, Off White': '$6.37', '1/2" Plenum Air Cable, Blue': '$6.37', '4.3-10 Male for 1/2 in AL4RPV-50, LDF4-50A, HL4RPV-50': '$25.91', '4.3-10M RA for 1/2" AL4RPV-50, LDF4-50A, HL4RPV-50': '$51.28', '4.3-10 Male for 1/2" Plenum Cable': '$34.32', '4.3-10 Female Connector for 1/2" Plenum': '$35.00', '4.3-10 R/A Male Connector for 1/2" Plenum': '$47.50', '4.3-10 Female for 1/2 in AL4RPV-50, LDF4-50A': '$25.91'}
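`.text.split('\n')[1]` assumes the price block's text spans at least two lines; if it renders on a single line, indexing `[1]` raises `IndexError`. The output above also lists $6.37 for the off-white cable, which the question's HTML labels as the list price rather than the $1.89 account price. A sketch that pulls every dollar amount out with a regular expression instead, so the code does not depend on line breaks (which amount is which still has to be checked against the live page; `dollar_amounts` is a hypothetical helper):

```python
import re

# A dollar amount such as $6.37, $1234.50 or $1,234.50
_AMOUNT = re.compile(r"\$\d+(?:,\d{3})*(?:\.\d{2})?")


def dollar_amounts(text):
    """Return every dollar amount in text, in the order it appears."""
    return _AMOUNT.findall(text)


print(dollar_amounts("List: $6.37\n$1.89"))  # ['$6.37', '$1.89']
print(dollar_amounts("List: $6.37 $1.89"))   # ['$6.37', '$1.89']
```

In the loop above, `price = dollar_amounts(product.find_element(By.XPATH, ".//div[@class='price']").text)` would then give the list of amounts for each product.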