Missing elements when using BeautifulSoup and Selenium

Time: 2018-10-19 17:02:17

Tags: python selenium web-scraping beautifulsoup

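I'm trying to pull some information from a Facebook Marketplace listing. I can get about half of what I need, but some of the list items don't show up.

The content is generated by JavaScript, so I'm using Selenium with a headless Chrome driver. Items are still missing even when I use WebDriverWait to make sure the page has loaded.

I'm not sure where to go from here.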

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options  
from selenium.webdriver.common.keys import Keys

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException


CHROMEDRIVER_PATH = r'G:\WPy-3661\python-3.6.6.amd64\Scripts\chromedriver.exe'
options = Options()
options.add_argument('--headless')
driver = webdriver.Chrome(CHROMEDRIVER_PATH, options=options)

url = "https://www.facebook.com/marketplace/item/1894930974147050"
driver.get(url)
html = driver.page_source

try:
    WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.CLASS_NAME, '_2pid')))
except TimeoutException:
    print('Page timed out after 20 secs.')

soup = BeautifulSoup(html, "lxml")
car_info=soup.find("div", {"class":"_2pid"})
print(car_info)
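
One thing to note in the script above: `html = driver.page_source` is read right after `driver.get(url)`, before the `WebDriverWait` runs, so BeautifulSoup parses a snapshot taken before the wait has finished. A minimal sketch of one way around that is to re-read the source until it stops changing. The helper name `wait_for_stable_source` and its parameters are hypothetical (they are not part of Selenium); the helper only assumes a zero-argument callable that returns the current HTML, such as `lambda: driver.page_source`.

```python
import time


def wait_for_stable_source(get_html, timeout=20.0, interval=0.5, stable_reads=3):
    """Poll get_html() until it returns the same text `stable_reads` times in a row.

    Returns the last HTML read, whether or not it stabilised before the timeout.
    """
    deadline = time.monotonic() + timeout
    last = get_html()
    unchanged = 1  # the first read counts as one observation
    while unchanged < stable_reads and time.monotonic() < deadline:
        time.sleep(interval)
        current = get_html()
        if current == last:
            unchanged += 1
        else:
            # The page changed, so start counting identical reads again.
            last = current
            unchanged = 1
    return last
```

With the driver from the question this might be called as `html = wait_for_stable_source(lambda: driver.page_source)` after the `WebDriverWait` block, with the result passed to `BeautifulSoup(html, "lxml")`. An unchanged source is only a heuristic: a page that keeps mutating (timers, ads) may never settle, in which case the helper returns whatever it last read once the timeout expires.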

Result:

<div class="_2pid">
    <ul class="uiList _4kg _6-i _6-h _6-j">
        <li>
            <ul class="_6c32 uiList _4ki _509- _6-i _6-h _6-j">
                <li class="_6c33"></li>
                <li><span
                        style="font-family: Arial, sans-serif; font-size: 12px; line-height: 16px; letter-spacing: normal; overflow-wrap: normal; text-align: left; color: rgb(28, 30, 33);">Driven 51,000 miles</span>
                </li>
            </ul>
        </li>
        <li>
            <ul class="_6c32 uiList _4ki _509- _6-i _6-h _6-j">
                <li class="_41ri"></li>
                <li><span
                        style="font-family: Arial, sans-serif; font-size: 12px; line-height: 16px; letter-spacing: normal; overflow-wrap: normal; text-align: left; color: rgb(28, 30, 33);">Automatic transmission</span>
                </li>
            </ul>
        </li>
    </ul>
</div>

Expected:

<div class="_2pid">
    <ul class="uiList  _4ki _509-">
        <li>
            <div><span
                    style="font-family: Arial, sans-serif; font-size: 14px; line-height: 18px; letter-spacing: normal; font-weight: bold; overflow-wrap: normal; text-align: left; color: rgb(28, 30, 33);">How This Price Compares</span>
            </div>
            <div><a target="_blank"
                    href="https://l.facebook.com/l.php?u=https%3A%2F%2Fwww.kbb.com%2F&amp;h=AT3_mCt8Kfo6wpXMZA11ocxcyOtEzcinSeOKsl1dQR174GFYDZPwWaDtQWWaJVU4aR5QCyfFXKjuNDNnE7ZsBQ2CTfsSOJZFRbCRf94-oiOl_VriQZcPPtNOMUZ1pSoa1PyohA"
                    rel="nofollow noopener" data-lynx-mode="hover">Kelley Blue Book® </a><span
                    style="font-family: Arial, sans-serif; font-size: 12px; line-height: 16px; letter-spacing: normal; overflow-wrap: normal; text-align: left; color: rgb(96, 103, 112);">Private Party Value</span>
            </div>
        </li>
        <li class="_idm"><span data-hover="tooltip"><span class="_1y50"></span></span></li>
    </ul>
    <div class="_6cbr">
        <div class="_6cbi">
            <div class="_6cbj" style="left: -3px; background: rgb(175, 179, 185);">$14,100<span class="_6cbk"
                                                                                                style="left: -25.507px; background: rgb(175, 179, 185);"></span>
            </div>
            <span class="_6cbn"><span class="_6cbo">Low</span></span><span class="_6cbs"></span><span
                class="_6cbt"><span class="_6cbu">High</span></span></div>
        <h4 class="_6cbv">$21,213 – $23,102</h4></div>
    <div><span
            style="font-family: Arial, sans-serif; font-size: 12px; line-height: 16px; letter-spacing: normal; overflow-wrap: normal; text-align: left; color: rgb(28, 30, 33);">Price range for similar Honda Pilot in excellent condition · <a
            target="_blank"
            href="https://www.facebook.com/business/help/1545716818882941">How It's Calculated</a></span></div>
    <div class="_1lil _3-8y _3-96"></div>
    <ul class="uiList  _4ki _509-">
        <li><span class="_3-96"
                  style="font-family: Arial, sans-serif; font-size: 14px; line-height: 18px; letter-spacing: normal; font-weight: bold; overflow-wrap: normal; text-align: left; color: rgb(28, 30, 33); display: inline-block;">About This Vehicle</span>
        </li>
        <li class="_idm"><span data-hover="tooltip"><span class="_1y50"></span></span></li>
    </ul>
    <ul class="uiList _4kg _6-i _6-h _6-j">
        <li>
            <ul class="_6c32 uiList  _4ki _509- _6-i _6-h _6-j">
                <li class="_6c33"></li>
                <li><span
                        style="font-family: Arial, sans-serif; font-size: 12px; line-height: 16px; letter-spacing: normal; overflow-wrap: normal; text-align: left; color: rgb(28, 30, 33);">Driven 51,000 miles</span>
                </li>
            </ul>
        </li>
        <li>
            <ul class="_6c32 uiList  _4ki _509- _6-i _6-h _6-j">
                <li class="_41ri"></li>
                <li><span
                        style="font-family: Arial, sans-serif; font-size: 12px; line-height: 16px; letter-spacing: normal; overflow-wrap: normal; text-align: left; color: rgb(28, 30, 33);">Automatic transmission</span>
                </li>
            </ul>
        </li>
        <li>
            <ul class="_6c32 uiList  _4ki _509- _6-i _6-h _6-j">
                <li class="_6c37"></li>
                <li><span
                        style="font-family: Arial, sans-serif; font-size: 12px; line-height: 16px; letter-spacing: normal; overflow-wrap: normal; text-align: left; color: rgb(28, 30, 33);">17 MPG city</span>
                    ·
                    <span style="font-family: Arial, sans-serif; font-size: 12px; line-height: 16px; letter-spacing: normal; overflow-wrap: normal; text-align: left; color: rgb(28, 30, 33);">24 MPG highway</span>
                </li>
            </ul>
        </li>
        <li>
            <ul class="_6c32 uiList  _4ki _509- _6-i _6-h _6-j">
                <li class="_6c1g"></li>
                <li><a target="_blank" href="https://www.facebook.com/business/help/1545716818882941">Excellent
                    condition</a></li>
            </ul>
        </li>
    </ul>
    <div class="_1lil _3-8y _3-96"></div>
</div>
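
Comparing the two dumps, the result contains two detail rows (`ul._6c32`) while the expected markup contains four, so waiting only for `_2pid` to be present may return before the list is complete. As a sketch, `WebDriverWait.until` accepts any callable that takes the driver, so a condition on the number of matching elements is possible. The factory name `at_least_n_elements`, the selector, and the count of 4 are assumptions for illustration; Facebook's class names are generated and may change.

```python
def at_least_n_elements(css_selector, minimum):
    """Build a wait condition that passes once `minimum` elements match.

    The returned callable follows the WebDriverWait.until() contract:
    it takes the driver and returns a truthy value on success.
    """
    def condition(driver):
        # "css selector" is the value of selenium's By.CSS_SELECTOR constant.
        found = driver.find_elements("css selector", css_selector)
        return found if len(found) >= minimum else False
    return condition
```

With a real driver this could be used as `WebDriverWait(driver, 20).until(at_least_n_elements("div._2pid ul._6c32", 4))`, followed by reading `driver.page_source`. If the wait still times out, the missing rows are probably not being rendered at all in that session, rather than being slow to appear.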

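The problem may be that this information sits in a subsection of a scrollable page. Does the page need to be scrolled before the JavaScript generates that markup? If so, I'm not sure how to do that.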
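
Whether scrolling is actually required here is not established by the question, but it is cheap to test. A minimal sketch, assuming the content is appended as the window scrolls: scroll to the bottom with `execute_script` until `document.body.scrollHeight` stops growing. The helper name `scroll_until_height_settles` and its parameters are hypothetical.

```python
import time


def scroll_until_height_settles(driver, pause=1.0, max_rounds=10):
    """Scroll the window to the bottom until the page height stops increasing.

    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        rounds += 1
        time.sleep(pause)  # give the page's JavaScript time to append content
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was added by the last scroll
        last_height = new_height
    return rounds
```

This would be called as `scroll_until_height_settles(driver)` after `driver.get(url)` and before reading `driver.page_source`. If the details sit in an inner scrollable container rather than the window, scrolling the window will not help; in that case the element itself would need to be brought into view, for example with `driver.execute_script("arguments[0].scrollIntoView();", element)`.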

0 answers:

No answers