Beautiful Soup find returns None

Time: 2019-01-19 22:46:21

Tags: python beautifulsoup

found comes back as None.

Here is all the code I have tried, along with the HTML I am working with:

from urllib.request import urlopen
from bs4 import BeautifulSoup as soup

url = "https://www.instagram.com/p/BszEBehhwet/"
a = urlopen(url)
html = a.read()
a.close()
page_soup = soup(html, "html.parser")

found = page_soup.find("div", {"class":"P9YgZ"})

<div class="KlCQn G14m- EtaWk">
    <ul class="k59kT">
        <li class="gElp9 " role="menuitem">
            <div class="P9YgZ">
                <div class="C7I1f X7jCj">
                    <div class="C4VMK">
                        <h2 class="_6lAjh">
                            <a class="FPmhX notranslate TlrDj" 
                            title="ray.walker00" 
                            href="/ray.walker00/">ray.walker00
                            </a>
                        </h2>
                        <span>Jan. 18, 2019 // Awesome</span>
                    </div>
                </div>
            </div>
        </li>
    </ul>
</div>

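I want to get back the div with class P9YgZ.

1 Answer:

Answer 0: (score: 1)

As I pointed out in the comments, the page you are working with relies on JavaScript so heavily that urllib on its own won't cut it: the div you are looking for is not in the HTML the server sends, so find returns None. Here is an example that uses Selenium WebDriver to get the element with that class. You will need to download ChromeDriver and modify the code so that it points to its location on your system: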

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options


def main():

    options = Options()
    options.add_argument("--headless")

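    # Raw string so the backslashes in the Windows path are not read as escapes.
    # Note: Selenium 4 deprecates executable_path in favor of a Service object.
    driver = webdriver.Chrome(
        options=options, executable_path=r"C:\chromedriver\chromedriver.exe"
    )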

    try:
        driver.get("https://www.instagram.com/p/BszEBehhwet/")

        soup = BeautifulSoup(driver.page_source, "html.parser")
        print(soup.find("div", {"class": "P9YgZ"}))

    finally:
        driver.quit()


if __name__ == "__main__":
    main()

Result:

<div class="P9YgZ"><div class="C7I1f X7jCj"><div class="C4VMK"><h2 class="_6lAjh"><a class="FPmhX notranslate TlrDj" href="/thetremason/" title="thetremason">thetremason</a></h2><span>How I’m finna pull up to ya function.</span></div></div></div>
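
If you want to confirm that the div is missing from the raw HTML before reaching for a browser, a quick check along these lines may help. This is only a minimal sketch using the standard library's html.parser rather than Beautiful Soup. The helper name has_div_with_class and the two sample strings (including the "react-root" id) are hypothetical stand-ins for what urllib receives versus what a browser renders, not the real Instagram markup:

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Records whether any <div> start tag carries the given CSS class."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and self.class_name in classes:
            self.found = True


def has_div_with_class(html, class_name):
    """Return True if the HTML text contains a <div> with that class."""
    finder = ClassFinder(class_name)
    finder.feed(html)
    return finder.found


# Hypothetical samples: a JavaScript shell page vs. a rendered page.
static_html = (
    '<html><body><div id="react-root"></div>'
    '<script src="app.js"></script></body></html>'
)
rendered_html = (
    '<html><body><div class="P9YgZ"><span>Awesome</span></div></body></html>'
)

print(has_div_with_class(static_html, "P9YgZ"))    # False
print(has_div_with_class(rendered_html, "P9YgZ"))  # True
```

If the same check on the text you downloaded with urlopen reports False, the element is most likely being added by JavaScript after the page loads, which is the situation a real browser driven by Selenium handles.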