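I'm trying to scrape some information from an FB Marketplace listing. I can get about half of what I need, but some of the list items don't show up.
The page is generated by JavaScript, so I'm using Selenium with a headless Chrome driver. The items are still missing even when I use WebDriverWait to make sure the page has loaded.
I'm not sure where to go from here.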
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException

# Raw string so the backslashes in the Windows path are not treated as escapes
CHROMEDRIVER_PATH = r'G:\WPy-3661\python-3.6.6.amd64\Scripts\chromedriver.exe'

options = Options()
options.add_argument('--headless')
driver = webdriver.Chrome(CHROMEDRIVER_PATH, options=options)

url = "https://www.facebook.com/marketplace/item/1894930974147050"
driver.get(url)

try:
    # Wait up to 20 seconds for the details container to appear
    WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.CLASS_NAME, '_2pid')))
except TimeoutException:
    print('Page timed out after 20 secs.')

# Read the page source only after the wait, so it reflects the loaded DOM
html = driver.page_source
soup = BeautifulSoup(html, "lxml")
car_info = soup.find("div", {"class": "_2pid"})
print(car_info)
Result:
<div class="_2pid">
<ul class="uiList _4kg _6-i _6-h _6-j">
<li>
<ul class="_6c32 uiList _4ki _509- _6-i _6-h _6-j">
<li class="_6c33"></li>
<li><span
style="font-family: Arial, sans-serif; font-size: 12px; line-height: 16px; letter-spacing: normal; overflow-wrap: normal; text-align: left; color: rgb(28, 30, 33);">Driven 51,000 miles</span>
</li>
</ul>
</li>
<li>
<ul class="_6c32 uiList _4ki _509- _6-i _6-h _6-j">
<li class="_41ri"></li>
<li><span
style="font-family: Arial, sans-serif; font-size: 12px; line-height: 16px; letter-spacing: normal; overflow-wrap: normal; text-align: left; color: rgb(28, 30, 33);">Automatic transmission</span>
</li>
</ul>
</li>
</ul>
</div>
Expected:
<div class="_2pid">
<ul class="uiList _4ki _509-">
<li>
<div><span
style="font-family: Arial, sans-serif; font-size: 14px; line-height: 18px; letter-spacing: normal; font-weight: bold; overflow-wrap: normal; text-align: left; color: rgb(28, 30, 33);">How This Price Compares</span>
</div>
<div><a target="_blank"
href="https://l.facebook.com/l.php?u=https%3A%2F%2Fwww.kbb.com%2F&h=AT3_mCt8Kfo6wpXMZA11ocxcyOtEzcinSeOKsl1dQR174GFYDZPwWaDtQWWaJVU4aR5QCyfFXKjuNDNnE7ZsBQ2CTfsSOJZFRbCRf94-oiOl_VriQZcPPtNOMUZ1pSoa1PyohA"
rel="nofollow noopener" data-lynx-mode="hover">Kelley Blue Book® </a><span
style="font-family: Arial, sans-serif; font-size: 12px; line-height: 16px; letter-spacing: normal; overflow-wrap: normal; text-align: left; color: rgb(96, 103, 112);">Private Party Value</span>
</div>
</li>
<li class="_idm"><span data-hover="tooltip"><span class="_1y50"></span></span></li>
</ul>
<div class="_6cbr">
<div class="_6cbi">
<div class="_6cbj" style="left: -3px; background: rgb(175, 179, 185);">$14,100<span class="_6cbk"
style="left: -25.507px; background: rgb(175, 179, 185);"></span>
</div>
<span class="_6cbn"><span class="_6cbo">Low</span></span><span class="_6cbs"></span><span
class="_6cbt"><span class="_6cbu">High</span></span></div>
<h4 class="_6cbv">$21,213 – $23,102</h4></div>
<div><span
style="font-family: Arial, sans-serif; font-size: 12px; line-height: 16px; letter-spacing: normal; overflow-wrap: normal; text-align: left; color: rgb(28, 30, 33);">Price range for similar Honda Pilot in excellent condition · <a
target="_blank"
href="https://www.facebook.com/business/help/1545716818882941">How It's Calculated</a></span></div>
<div class="_1lil _3-8y _3-96"></div>
<ul class="uiList _4ki _509-">
<li><span class="_3-96"
style="font-family: Arial, sans-serif; font-size: 14px; line-height: 18px; letter-spacing: normal; font-weight: bold; overflow-wrap: normal; text-align: left; color: rgb(28, 30, 33); display: inline-block;">About This Vehicle</span>
</li>
<li class="_idm"><span data-hover="tooltip"><span class="_1y50"></span></span></li>
</ul>
<ul class="uiList _4kg _6-i _6-h _6-j">
<li>
<ul class="_6c32 uiList _4ki _509- _6-i _6-h _6-j">
<li class="_6c33"></li>
<li><span
style="font-family: Arial, sans-serif; font-size: 12px; line-height: 16px; letter-spacing: normal; overflow-wrap: normal; text-align: left; color: rgb(28, 30, 33);">Driven 51,000 miles</span>
</li>
</ul>
</li>
<li>
<ul class="_6c32 uiList _4ki _509- _6-i _6-h _6-j">
<li class="_41ri"></li>
<li><span
style="font-family: Arial, sans-serif; font-size: 12px; line-height: 16px; letter-spacing: normal; overflow-wrap: normal; text-align: left; color: rgb(28, 30, 33);">Automatic transmission</span>
</li>
</ul>
</li>
<li>
<ul class="_6c32 uiList _4ki _509- _6-i _6-h _6-j">
<li class="_6c37"></li>
<li><span
style="font-family: Arial, sans-serif; font-size: 12px; line-height: 16px; letter-spacing: normal; overflow-wrap: normal; text-align: left; color: rgb(28, 30, 33);">17 MPG city</span>
·
<span style="font-family: Arial, sans-serif; font-size: 12px; line-height: 16px; letter-spacing: normal; overflow-wrap: normal; text-align: left; color: rgb(28, 30, 33);">24 MPG highway</span>
</li>
</ul>
</li>
<li>
<ul class="_6c32 uiList _4ki _509- _6-i _6-h _6-j">
<li class="_6c1g"></li>
<li><a target="_blank" href="https://www.facebook.com/business/help/1545716818882941">Excellent
condition</a></li>
</ul>
</li>
</ul>
<div class="_1lil _3-8y _3-96"></div>
</div>
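The problem may be that this information sits in a subsection of a scrollable page. Do I need to scroll so that the JavaScript generates that part of the source? If so, I'm not sure how to do that.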
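
For reference, the scrolling idea could be sketched roughly as follows. This is only a minimal sketch under assumptions: it assumes the missing items are lazy-loaded when the window is scrolled, which is not confirmed for this page. The helper name `scroll_until_stable` and its `pause` and `max_rounds` parameters are hypothetical, not part of Selenium. The only Selenium call it relies on is `driver.execute_script`. If the details live in an inner scrollable container instead of the window, scrolling that element into view would be needed instead.

```python
import time


def scroll_until_stable(driver, pause=1.0, max_rounds=10):
    """Scroll to the bottom repeatedly until the page height stops growing.

    `driver` is any object exposing Selenium's `execute_script` method.
    Returns the number of scrolls performed (hypothetical helper).
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        # Jump to the current bottom of the page to trigger lazy loading
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        rounds += 1
        time.sleep(pause)  # give newly requested content time to render
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was added, so stop scrolling
        last_height = new_height
    return rounds
```

With the script above, this would be called as `scroll_until_stable(driver)` after `driver.get(url)` and before reading `driver.page_source`, so that the parsed HTML includes whatever the scroll caused to load.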