Navigating the DOM tree with BeautifulSoup

Time: 2019-06-08 21:50:40

Tags: python dom beautifulsoup

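I am scraping a website to get item prices, and I haven't figured out how to navigate the tree structure. Ideally, I would have a for loop that iterates over all the li elements and does some data analysis, so I would like an iterator that walks down to a specific nested element.

I tried referring to nested elements à la .div.div. I guess I'm still a newbie, so any help would be appreciated!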

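# Imports assumed from the aliases used below
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

# myurl is the URL of the listings page (not shown in the question)
uClient = uReq(myurl)
page_html = uClient.read()
uClient.close()

page_soup = soup(page_html, "lxml")

# Collect every listing <li>
containers = page_soup.findAll(
    "li", {"class": "mp-Listing mp-Listing--list-item"})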

This is the tree structure:

    <figure class="mp-Listing-image-container"><a
            data-tracking="mucLxVHX8FbvYBHPHfGkOCRq9VFszDlhSxgIClJUJRXbTYMnnOw8kI1NFuitzMperXfQZoyyS2Mx8VbGSZB7_jITV8iJZErGmgWsWp4Arvmpog9Hw3EO8q45U-6chavRHHXbOGPOeNci_683vlir1_SAK-XDa7Znjl22XHOxxH_n3QwloxZSRCxAKGjVYg8aQGTfUgZd2b9DDBdUR2fqyUEUXqnMGZ5hjKlTKTR67obF26tTc8kc1HAsv_fvTEfJW-UxpJCuVhXjKi3pcuL99F8QesdivVy1p_jhs7KL-528jJXZ-LGNSz6cloZlO3yEsAdN_NxI4vz76mTfPY-fiRuAlSPfcjP8KYuDw9e8Qz-QyhUNfhIzOZyU6r1suEfcihY9w_HYY-Qn6vmZ8Bw9ZZn4CEV7odI4_7RzYe8OBw4UmTXAODFxJgS-7fnlWgUAZqX8wu_WydbQLqDqpMXEMsbzKFxaerTLhhUGBqNlBEzpJ0jBIm7-hafuMH5v3IRU0Iha8fUbu7soVLYTuTcbBG2dUgEH-O2-bALjnkMB8XWlICCM14klxeRyOAFscVKg2m6p5aanRR38dgEXuvVE9UcSjHW43JeNSv3gJ7GwJww"
            href="/a/velos-velomoteurs/velos-ancetres-oldtimers/a34926285-peugeot-velo-de-course-1970.html?c=17f70af2bde4a155c6d568ce3cad9ab7&amp;previousPage=lr">
            <div class="mp-Listing-image-item mp-Listing-image-item--main"
                style="background-image:url(//i.ebayimg.com/00/s/NTI1WDcwMA==/z/LlYAAOSw3Rdc-miZ/$_82.JPG)"><img
                    alt="Peugeot - Vélo de course - 1970" data-img-src="Peugeot - Vélo de course - 1970"
                    src="//i.ebayimg.com/00/s/NTI1WDcwMA==/z/LlYAAOSw3Rdc-miZ/$_82.JPG"
                    title="Peugeot - Vélo de course - 1970" /></div>
        </a></figure>
    <div class="mp-Listing-content">
        <div class="mp-Listing-group mp-Listing-group--main">
            <h3 class="mp-Listing-title"><a
                    data-tracking="mucLxVHX8FbvYBHPHfGkOCRq9VFszDlhSxgIClJUJRXbTYMnnOw8kI1NFuitzMperXfQZoyyS2Mx8VbGSZB7_jITV8iJZErGmgWsWp4Arvmpog9Hw3EO8q45U-6chavRHHXbOGPOeNci_683vlir1_SAK-XDa7Znjl22XHOxxH_n3QwloxZSRCxAKGjVYg8aQGTfUgZd2b9DDBdUR2fqyUEUXqnMGZ5hjKlTKTR67obF26tTc8kc1HAsv_fvTEfJW-UxpJCuVhXjKi3pcuL99F8QesdivVy1p_jhs7KL-528jJXZ-LGNSz6cloZlO3yEsAdN_NxI4vz76mTfPY-fiRuAlSPfcjP8KYuDw9e8Qz-QyhUNfhIzOZyU6r1suEfcihY9w_HYY-Qn6vmZ8Bw9ZZn4CEV7odI4_7RzYe8OBw4UmTXAODFxJgS-7fnlWgUAZqX8wu_WydbQLqDqpMXEMsbzKFxaerTLhhUGBqNlBEzpJ0jBIm7-hafuMH5v3IRU0Iha8fUbu7soVLYTuTcbBG2dUgEH-O2-bALjnkMB8XWlICCM14klxeRyOAFscVKg2m6p5aanRR38dgEXuvVE9UcSjHW43JeNSv3gJ7GwJww"
                    href="/a/velos-velomoteurs/velos-ancetres-oldtimers/a34926285-peugeot-velo-de-course-1970.html?c=17f70af2bde4a155c6d568ce3cad9ab7&amp;previousPage=lr">Peugeot
                    - Vélo de course - 1970</a></h3>
            <p class="mp-Listing-description mp-text-paragraph">Cet objet est vendu par Catawiki. Cliquez sur le lien
                pour être redirigé vers le site Catawiki et placer votre enchère.vélo de cou<span><input
                        class="mp-Listing-show-more" id="a34926285" type="checkbox" /><span
                        class="mp-Listing-description mp-Listing-description--extended">rse peugeot des années 70,
                        équipé de pneus neufs (michelin dynamic sport), freins Mafac racer, dérailleur allvit, 3
                        plateaux, 21 vitesses.selle Basano</span><label for="a34926285">...<span
                            class="mp-Icon mp-Icon--xs mp-svg-arrow-down"></span><span
                            class="mp-Icon mp-Icon--xs mp-svg-arrow-up"></span></label></span></p>
            <div class="mp-Listing-attributes"></div>
        </div>
        <div class="mp-Listing-group mp-Listing-group--aside">
            <div class="mp-Listing-group mp-Listing-group--top-block"><span
                    class="mp-Listing-price mp-text-price-label">Voir description</span><span
                    class="mp-Listing-seller-name"><a class="mp-TextLink"
                        href="/u/catawiki/38096837/">Catawiki</a></span><span
                    class="mp-Listing-date">Aujourd'hui</span><span class="mp-Listing-location">Toute la
                    Belgique<br /></span></div>
            <div class="mp-Listing-group mp-Listing-group--bottom-block"><span class="mp-Listing-priority">Annonce au
                    top</span><span class="mp-Listing-seller-link"><a class="mp-TextLink undefined"
                        href="https://admarkt.2dehands.be/buyside/url/RK-f5Gyr8TS9VKWPn06TDHk8zCWeSU5-PsQDuvr5tYpoRXQYzjmhI4E8OX9dXcZb0TEQOFSDMueu3s5kqHSihdgWdlYIhSdweDBq0ckhYm7kU8NzKSx7FWvKA8-ZSJUz6PW439SHCTDUa2er4_kqge-fyr8zJemRXzISpFdvVIzVufagipJY-9jozmgnesM_bfBJxR6r0IvKWR8GYnfgv0bPsg1Ny5CQMsw4LsI33lUP_g6cYuGIcGOeEupRpJtf1sXv11G7BTj3gZAo5fvVk35hdfr5LVSJxJYsDUOxS7pdcFtkVO-0EEbZwLG3FlDYaPqLnComuKbmrSwzIW6EwfWXvr1lvifS5cOPflPSsVE319HKQ06w2vk4-4N9-E-cSXye9Yj_YHhNCJdEynvHV0XWkMkdLE_flG421UIIHVbDZdKHV429Ka7HQQSdpbyU6nQ94UsVzRfi2gEgXM18WuI96qkT8oFtqZwGrrE4wlyLuDJnPWkzaYmEwsSoPslrkv_mY66yEOLYsLolpTF3aTRU3sqv0GvZwnPkR04uZJY8GeL70uz3XaP5mYPxKz-pmCFbnJN_i9oiA_LjEIrEzSmvCEM_jViUfPB4FIib7VEi_gag5qWNYYxfkIyT4mC9Y0EKx0JbNHzyBs1062ETCiFvtPaAgconmyqW2ztnw4it_D10qAEemDppNOXKMmX_Jg-feuFKwq-MdIxiyJK3yoiKPXzMEEBa2WXqchDAPF52YmcVjq8HDORqYFkq5-iLumz6Y8ut-smKs_-vMG7k52nO3RW3RzuO0syMLBlZGiqUnADJtj0hmGmzqHXRqflq4QCTEE2vmG2flfMSIz9XJ7ECg73CP5OSNPg5VlzWfCVgd7o1TYd-rFBFXWM5Xz-ZlCA03LOZtP3BeQR3-TnSL6MNWo46vEtHq5ntcF-TrFTl4h01C5DNF_7R4W36CqQ4"
                        rel="noopener noreferrer nofollow" target="_blank">Visiter le site internet</a></span></div>
        </div>
    </div>
</li>

The idea is to reach <span class="mp-Listing-seller-name"><a class="mp-TextLink"> by reference, something like container.div.span ....
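For context on why the dotted form may not get there: in BeautifulSoup, `tag.div` is shorthand for `tag.find("div")`, so it returns the first `<div>` descendant in document order, not necessarily the one you have in mind. The sketch below uses a trimmed-down, simplified copy of one listing (an assumption; the real markup carries more attributes) and `html.parser` instead of `lxml` so that only `bs4` is required.

```python
from bs4 import BeautifulSoup

# Simplified stand-in for one listing; the real markup is longer.
html = """
<li class="mp-Listing mp-Listing--list-item">
  <figure><a href="#"><div class="mp-Listing-image-item"><img src="x.jpg"/></div></a></figure>
  <div class="mp-Listing-content">
    <div class="mp-Listing-group mp-Listing-group--aside">
      <span class="mp-Listing-seller-name"><a class="mp-TextLink" href="/u/catawiki/38096837/">Catawiki</a></span>
    </div>
  </div>
</li>
"""

container = BeautifulSoup(html, "html.parser").li

# container.div is the FIRST <div> in document order: the image wrapper
# inside <figure>, not the mp-Listing-content block.
first_div = container.div
print(first_div["class"])

# That image <div> contains no <span>, so the dotted chain ends in None.
dotted = container.div.span
print(dotted)

# Searching by class from the container does not depend on nesting order.
seller = container.find("span", class_="mp-Listing-seller-name")
seller_name = seller.a.get_text()
print(seller_name)
```

Dotted access works well when the first matching tag at each level is the one you want; otherwise a search by class tends to be less fragile.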

1 answer:

Answer 0 (score: 0)

I believe this is what you want:

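from bs4 import BeautifulSoup as bs

# Placeholder: paste the HTML from the question here.
# Note that it's missing the opening <li>, so add it back first.
target = "<li> ... </li>"

page_soup = bs(target, "lxml")
containers = page_soup.find_all('li')
for container in containers:
    # find_all returns a list of every matching <span> inside this <li>
    item = container.find_all("span", class_="mp-Listing-seller-name")
    print(item)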

Output:

[<span class="mp-Listing-seller-name"><a class="mp-TextLink" href="/u/catawiki/38096837/">Catawiki</a></span>]
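If you want the seller's name and link rather than the whole `<span>`, one possible follow-up is to use CSS selectors and guard against listings where the element is missing. This is only a sketch: the two-item markup below is made up for illustration (the second listing deliberately has no seller), the `rows` structure is my own choice, and `html.parser` is used so that only `bs4` is needed.

```python
from bs4 import BeautifulSoup

# Made-up, trimmed-down markup: two listings, the second without a seller span.
html = """
<ul>
  <li class="mp-Listing mp-Listing--list-item">
    <span class="mp-Listing-price mp-text-price-label">Voir description</span>
    <span class="mp-Listing-seller-name"><a class="mp-TextLink" href="/u/catawiki/38096837/">Catawiki</a></span>
  </li>
  <li class="mp-Listing mp-Listing--list-item">
    <span class="mp-Listing-price mp-text-price-label">Voir description</span>
  </li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

rows = []
for container in soup.select("li.mp-Listing"):
    # select_one returns the first match inside this <li>, or None if absent
    link = container.select_one("span.mp-Listing-seller-name > a")
    price = container.select_one("span.mp-Listing-price")
    rows.append({
        "seller": link.get_text(strip=True) if link else None,
        "href": link["href"] if link else None,
        "price": price.get_text(strip=True) if price else None,
    })

for row in rows:
    print(row)
```

Checking for `None` before reading `.get_text()` or `["href"]` keeps the loop from raising on listings that lack a seller or price element.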