BeautifulSoup does not extract all elements on the page

Date: 2019-01-17 19:59:24

Tags: python python-3.x beautifulsoup python-requests

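When I run my Python script, it does not scrape all the elements on the web page.

I have searched through the posts here, but nothing seems to work. I have tried urllib, html5lib, and Selenium.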

import time

from bs4 import BeautifulSoup
from selenium import webdriver

def render_page(url):
    # Load the page in a real browser so JavaScript-generated markup is included
    driver = webdriver.Firefox()
    try:
        driver.get(url)
        time.sleep(10)  # crude fixed wait for the page to finish rendering
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even if loading fails

myUrl = 'https://www.newegg.com/Product/ProductList.aspx?Submit=ENE&DEPA=0&Order=BESTMATCH&Description=graphic+cards&N=-1&isNodeId=1'

r = render_page(myUrl)

soup = BeautifulSoup(r, "html.parser")

# Collect every <div class="item-container"> on the page
containers = soup.find_all("div", {"class": "item-container"})
container = containers[0]

print(container)

This is what I expected to get in containers[0]:

<div class="item-container  ">
<!--product image-->
<a href="https://www.newegg.com/Product/Product.aspx?Item=N82E16814137291&amp;Description=graphic%20cards&amp;cm_re=graphic_cards-_-14-137-291-_-Product" class="item-img">


    <div class="item-badges">

    </div>



    <img src="https://c1.neweggimages.com/NeweggImage/ProductImageCompressAll300/14-137-291-Z01.jpg?ex=2" title="MSI Radeon RX 570 DirectX 12 RX 570 ARMOR MK2 8G OC 8GB 256-Bit GDDR5 PCI Express x16 HDCP Ready CrossFireX Support Video Card" alt="MSI Radeon RX 570 DirectX 12 RX 570 ARMOR MK2 8G OC 8GB 256-Bit GDDR5 PCI Express x16 HDCP Ready CrossFireX Support Video Card" is-retina="true" class="hoverZoomLink" width="240" height="180">
</a>
<div class="item-info">
    <!--brand info-->
    <div class="item-branding">

        <a href="https://www.newegg.com/MSI/BrandStore/ID-1312" class="item-brand">


            <img src="//c1.neweggimages.com/Brandimage_70x28//Brand1312.gif" title="MSI" alt="MSI">
        </a>

        <!--rating info-->

        <a title="Rating + 5" href="https://www.newegg.com/Product/Product.aspx?Item=N82E16814137291&amp;Description=graphic%20cards&amp;SortField=0&amp;SummaryType=0&amp;PageSize=10&amp;SelectedRating=-1&amp;VideoOnlyMark=False&amp;ignorebbr=1&amp;IsFeedbackTab=true#scrollFullInfo" class="item-rating"><i class="rating rating-5"></i><span class="item-rating-num">(51)</span></a>

    </div>
    <!--description info-->
    <a href="https://www.newegg.com/Product/Product.aspx?Item=N82E16814137291&amp;Description=graphic%20cards&amp;cm_re=graphic_cards-_-14-137-291-_-Product" class="item-title" title="View Details"><i class="icon-premier icon-premier-xsm"></i>MSI Radeon RX 570 DirectX 12 RX 570 ARMOR MK2 8G OC 8GB 256-Bit GDDR5 PCI Express x16 HDCP Ready CrossFireX Support Video Card</a>
    <!--promption info-->
    <p class="item-promo"><i class="item-promo-icon"></i>Get 2 Free Games w/ purchase, limited offer</p>
    <!--feature-->
    <ul class="item-features">
        <li><strong>DisplayPort:</strong> 2 x DisplayPort</li>
        <li><strong>DVI:</strong> 1 x DL-DVI-D</li>
        <li><strong>HDMI:</strong> 2 x HDMI</li>
        <li><strong>Card Dimensions (L x H):</strong> 10.63" x 5.12"</li>

        <li><strong>Model #: </strong>RX 570 ARMOR MK2 8G OC</li>


        <li><strong>Item #: </strong>N82E16814137291</li>


        <li><strong>Return Policy: </strong><a href="https://kb.newegg.com/Article/Index/12/3?id=1167#80" target="_blank" title="Replacement Only Return Policy(New Window)">Replacement Only Return Policy</a></li>


    </ul>
    <div class="item-action">
        <!--price-->


        <ul class="price   has-label-membership ">
            <li class="price-was">

            </li>
            <li class="price-map">


            </li>
            <li class="price-current">

                <span class="price-current-label">

                    <a class="membership-info  membership-popup" name="membership" style="display: inline" data-neg-popid="MembershipPopup" href="javascript:void(0);" aria-label="Premier Price Explaination"><span class="membership-icon"></span><span style="display: none">|</span></a>
                </span>$<strong>189</strong><sup>.99</sup>&nbsp;<a href="https://www.newegg.com/Product/Product.aspx?Item=N82E16814137291&amp;buyingoptions=New&amp;Description=graphic%20cards" class="price-current-num">(10 Offers)</a>
                <span class="price-current-range">
                    <abbr title="to">–</abbr>
                </span>

            </li>
            <li class="price-save ">

                <span class="price-save-endtime price-save-endtime-current"></span>
                <span class="price-save-endtime price-save-endtime-another" style="display:none;"></span>


            </li>
            <li class="price-note">

                <span class="price-note-dollar" data-price="$174.99">$174.99</span>
                <span class="price-note-label "> after </span>
                <span class="price-note-dollar">$15.00</span>
                <span class="price-note-label"> rebate card</span>


            </li>
            <li class="price-ship">
                Free Shipping
            </li>
        </ul>

        <!--egg point-->

        <!--financing-->


        <!--button-->
        <div class="item-operate  ">
            <div class="item-button-area">

                <button type="button" title="View Details" class="btn  btn-mini " onclick="Javascript:Biz.ProductList.Item.add('https://www.newegg.com/Product/Product.aspx?Item=N82E16814137291&amp;Description=graphic%20cards');">View Details <i class="fa fa-caret-right"></i></button>



            </div>

            <!--compare-->
            <div class="item-compare-box">
                <label class="form-checkbox">
                    <input id="CompareItem_14-137-291" autocomplete="off" neg-itemnumber="14-137-291" type="checkbox" name="CompareItem" value="CompareItem_14-137-291">
                    <span class="form-checkbox-title">Compare</span>
                </label>
            </div>
            <script type="text/javascript">
                Biz.Product.CompareConfig.compareItems.push("14-137-291");
                var itemThumbs = new Object();
                itemThumbs.itemNumber = "14-137-291";
                itemThumbs.imageUrl = "//c1.neweggimages.com/ProductImageCompressAll35/14-137-291-Z01.jpg";
                Biz.Product.CompareConfig.Thumbs.push(itemThumbs);
            </script>

        </div>
    </div>
</div>

This is what I actually get:

<div class="item-container" data-itemnumber="35-103-060">
<a class="item-img" href="https://www.newegg.com/Product/Product.aspx?Item=35-103-060&amp;cm_sp=SearchSuccess-_-INFOCARD-_-graphic+cards-_-35-103-060-_-1&amp;Description=graphic+cards" onclick="Javascript:s_search_results_clickthrough(this);s_search_results_clickthrough(this);s_search_results_clickthrough(this);s_search_results_clickthrough(this);">
    <img alt="Cooler Master SickleFlow 120 - Sleeve Bearing 120mm Blue LED Silent Fan for Computer Cases, CPU Coolers, and Radiators" height="62" is-retina="true" src="https://c1.neweggimages.com/ProductImageCompressAll300/35-103-060-17.jpg?ex=2" title="Cooler Master SickleFlow 120 - Sleeve Bearing 120mm Blue LED Silent Fan for Computer Cases, CPU Coolers, and Radiators" width="83" />
</a>
<div class="item-info">
    <div class="item-branding">
        <a class="item-rating" href="https://www.newegg.com/Product/Product.aspx?Item=35-103-060&amp;cm_sp=SearchSuccess-_-INFOCARD-_-graphic+cards-_-35-103-060-_-1&amp;Description=graphic+cards&amp;IsFeedbackTab=true#scrollFullInfo" onclick="Javascript:s_search_results_clickthrough(this);s_search_results_clickthrough(this);s_search_results_clickthrough(this);s_search_results_clickthrough(this);"><i class="rating rating-4"></i><span class="item-rating-num">(2476)</span></a>
    </div>
    <a class="item-title" href="https://www.newegg.com/Product/Product.aspx?Item=35-103-060&amp;cm_sp=SearchSuccess-_-INFOCARD-_-graphic+cards-_-35-103-060-_-1&amp;Description=graphic+cards" onclick="Javascript:s_search_results_clickthrough(this);s_search_results_clickthrough(this);s_search_results_clickthrough(this);s_search_results_clickthrough(this);">
        <i class="icon-premier icon-premier-xsm"></i>

        Cooler Master SickleFlow 120 - Sleeve Bearing 120mm Blue LED Silen...
    </a>
</div>

Every attempt gives the same result. My end goal is to write container.div.div.a.img["title"] and get the manufacturer, MSI. Thanks in advance for your help.
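A side note on the chained access: in BeautifulSoup, `tag.div` is shorthand for `tag.find("div")`, which returns the first `<div>` anywhere inside the tag, not necessarily a direct child. The sketch below uses made-up HTML loosely modelled on the expected markup above (the `title` values are placeholders) to show how such a chain can land on the wrong element, and how a CSS selector targets the brand image directly.

```python
from bs4 import BeautifulSoup

# Made-up HTML loosely modelled on the expected markup; not the real page.
html = """
<div class="item-container">
  <a class="item-img"><div class="item-badges"></div><img title="product photo"></a>
  <div class="item-info">
    <div class="item-branding">
      <a class="item-brand"><img title="MSI"></a>
    </div>
  </div>
</div>
"""

container = BeautifulSoup(html, "html.parser").find("div", class_="item-container")

# container.div is the first <div> descendant in document order,
# which here is the badges div nested inside the image link.
first_div = container.div
print(first_div["class"])  # ['item-badges']

# That div has no nested <div>, so a longer chain such as
# container.div.div.a would fail with an AttributeError on None.
print(first_div.div)  # None

# A CSS selector names the target directly instead of relying on position.
brand = container.select_one(".item-brand img")
print(brand["title"])  # MSI
```

If the real markup matches the expected snippet, the same pitfall would likely apply to `container.div.div.a.img["title"]`, so a selector-based lookup is the less fragile choice.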

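1 Answer:

Answer 0 (score: 1)

Your search criteria need to be more specific, because elements with the class item-container also exist outside the main item grid. Restrict your selection to descendants of .is-grid: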

import requests
from bs4 import BeautifulSoup

url = 'https://www.newegg.com/Product/ProductList.aspx?Submit=ENE&DEPA=0&Order=BESTMATCH&Description=graphic+cards&N=-1&isNodeId=1'

r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')

for item in soup.select('.is-grid .item-container'):
    print(item.select_one('.item-brand img')['title'])

Result:

MSI
GIGABYTE
ZOTAC
...
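
To see the difference between the unscoped and the scoped search without hitting the live site, here is a minimal offline sketch. The HTML is made up for illustration: the `info-card` wrapper class and the `brand_titles` helper are assumptions, not taken from the real page. It also guards against items that have no brand image, where `select_one` returns `None` and indexing it would raise a `TypeError`.

```python
from bs4 import BeautifulSoup

# Made-up HTML: one item-container outside the grid, two inside it.
# The "info-card" class name is hypothetical.
html = """
<div class="info-card">
  <div class="item-container" data-itemnumber="35-103-060">
    <a class="item-title">Cooler Master SickleFlow 120</a>
  </div>
</div>
<div class="items-view is-grid">
  <div class="item-container">
    <a class="item-brand"><img title="MSI" alt="MSI"></a>
  </div>
  <div class="item-container">
    <a class="item-title">Unbranded card</a>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")


def brand_titles(soup):
    """Return the brand title of each grid item, or None if it has no brand image."""
    titles = []
    for item in soup.select(".is-grid .item-container"):
        img = item.select_one(".item-brand img")
        titles.append(img["title"] if img is not None else None)
    return titles


# The unscoped search also matches the container outside the grid,
# and that one comes first in document order.
unscoped = soup.find_all("div", class_="item-container")
scoped = soup.select(".is-grid .item-container")

print(len(unscoped), len(scoped))  # 3 2
print(brand_titles(soup))          # ['MSI', None]
```

In this sketch, `unscoped[0]` is the container outside the grid, which mirrors why `containers[0]` in the question returned an unrelated product.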