我正在抓捕一个网站以获取商品价格,并且没有弄清楚如何浏览树状结构。
在最好的情况下,我将有一个for
循环来迭代所有li
并进行一些数据analisys,因此,我希望有一个迭代器来迭代向下嵌套的特定元素
我试图将嵌套元素称为àla .div.div
。我想我还只是新手,我们将不胜感激某些帮助!
uClient = uReq(myurl)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "lxml")
containers = page_soup.findAll(
"li", {"class": "mp-Listing mp-Listing--list-item"})
这是树结构:
<figure class="mp-Listing-image-container"><a
data-tracking="mucLxVHX8FbvYBHPHfGkOCRq9VFszDlhSxgIClJUJRXbTYMnnOw8kI1NFuitzMperXfQZoyyS2Mx8VbGSZB7_jITV8iJZErGmgWsWp4Arvmpog9Hw3EO8q45U-6chavRHHXbOGPOeNci_683vlir1_SAK-XDa7Znjl22XHOxxH_n3QwloxZSRCxAKGjVYg8aQGTfUgZd2b9DDBdUR2fqyUEUXqnMGZ5hjKlTKTR67obF26tTc8kc1HAsv_fvTEfJW-UxpJCuVhXjKi3pcuL99F8QesdivVy1p_jhs7KL-528jJXZ-LGNSz6cloZlO3yEsAdN_NxI4vz76mTfPY-fiRuAlSPfcjP8KYuDw9e8Qz-QyhUNfhIzOZyU6r1suEfcihY9w_HYY-Qn6vmZ8Bw9ZZn4CEV7odI4_7RzYe8OBw4UmTXAODFxJgS-7fnlWgUAZqX8wu_WydbQLqDqpMXEMsbzKFxaerTLhhUGBqNlBEzpJ0jBIm7-hafuMH5v3IRU0Iha8fUbu7soVLYTuTcbBG2dUgEH-O2-bALjnkMB8XWlICCM14klxeRyOAFscVKg2m6p5aanRR38dgEXuvVE9UcSjHW43JeNSv3gJ7GwJww"
href="/a/velos-velomoteurs/velos-ancetres-oldtimers/a34926285-peugeot-velo-de-course-1970.html?c=17f70af2bde4a155c6d568ce3cad9ab7&previousPage=lr">
<div class="mp-Listing-image-item mp-Listing-image-item--main"
style="background-image:url(//i.ebayimg.com/00/s/NTI1WDcwMA==/z/LlYAAOSw3Rdc-miZ/$_82.JPG)"><img
alt="Peugeot - V�lo de course - 1970" data-img-src="Peugeot - V�lo de course - 1970"
src="//i.ebayimg.com/00/s/NTI1WDcwMA==/z/LlYAAOSw3Rdc-miZ/$_82.JPG"
title="Peugeot - V�lo de course - 1970" /></div>
</a></figure>
<div class="mp-Listing-content">
<div class="mp-Listing-group mp-Listing-group--main">
<h3 class="mp-Listing-title"><a
data-tracking="mucLxVHX8FbvYBHPHfGkOCRq9VFszDlhSxgIClJUJRXbTYMnnOw8kI1NFuitzMperXfQZoyyS2Mx8VbGSZB7_jITV8iJZErGmgWsWp4Arvmpog9Hw3EO8q45U-6chavRHHXbOGPOeNci_683vlir1_SAK-XDa7Znjl22XHOxxH_n3QwloxZSRCxAKGjVYg8aQGTfUgZd2b9DDBdUR2fqyUEUXqnMGZ5hjKlTKTR67obF26tTc8kc1HAsv_fvTEfJW-UxpJCuVhXjKi3pcuL99F8QesdivVy1p_jhs7KL-528jJXZ-LGNSz6cloZlO3yEsAdN_NxI4vz76mTfPY-fiRuAlSPfcjP8KYuDw9e8Qz-QyhUNfhIzOZyU6r1suEfcihY9w_HYY-Qn6vmZ8Bw9ZZn4CEV7odI4_7RzYe8OBw4UmTXAODFxJgS-7fnlWgUAZqX8wu_WydbQLqDqpMXEMsbzKFxaerTLhhUGBqNlBEzpJ0jBIm7-hafuMH5v3IRU0Iha8fUbu7soVLYTuTcbBG2dUgEH-O2-bALjnkMB8XWlICCM14klxeRyOAFscVKg2m6p5aanRR38dgEXuvVE9UcSjHW43JeNSv3gJ7GwJww"
href="/a/velos-velomoteurs/velos-ancetres-oldtimers/a34926285-peugeot-velo-de-course-1970.html?c=17f70af2bde4a155c6d568ce3cad9ab7&previousPage=lr">Peugeot
- V�lo de course - 1970</a></h3>
<p class="mp-Listing-description mp-text-paragraph">Cet objet est vendu par Catawiki. Cliquez sur le lien
pour �tre redirig� vers le site Catawiki et placer votre ench�re.v�lo de cou<span><input
class="mp-Listing-show-more" id="a34926285" type="checkbox" /><span
class="mp-Listing-description mp-Listing-description--extended">rse peugeot des ann�es 70,
�quip� de pneus neufs (michelin dynamic sport), freins Mafac racer, d�railleur allvit, 3
plateaux, 21 vitesses.selle Basano</span><label for="a34926285">...<span
class="mp-Icon mp-Icon--xs mp-svg-arrow-down"></span><span
class="mp-Icon mp-Icon--xs mp-svg-arrow-up"></span></label></span></p>
<div class="mp-Listing-attributes"></div>
</div>
<div class="mp-Listing-group mp-Listing-group--aside">
<div class="mp-Listing-group mp-Listing-group--top-block"><span
class="mp-Listing-price mp-text-price-label">Voir description</span><span
class="mp-Listing-seller-name"><a class="mp-TextLink"
href="/u/catawiki/38096837/">Catawiki</a></span><span
class="mp-Listing-date">Aujourd'hui</span><span class="mp-Listing-location">Toute la
Belgique<br /></span></div>
<div class="mp-Listing-group mp-Listing-group--bottom-block"><span class="mp-Listing-priority">Annonce au
top</span><span class="mp-Listing-seller-link"><a class="mp-TextLink undefined"
href="https://admarkt.2dehands.be/buyside/url/RK-f5Gyr8TS9VKWPn06TDHk8zCWeSU5-PsQDuvr5tYpoRXQYzjmhI4E8OX9dXcZb0TEQOFSDMueu3s5kqHSihdgWdlYIhSdweDBq0ckhYm7kU8NzKSx7FWvKA8-ZSJUz6PW439SHCTDUa2er4_kqge-fyr8zJemRXzISpFdvVIzVufagipJY-9jozmgnesM_bfBJxR6r0IvKWR8GYnfgv0bPsg1Ny5CQMsw4LsI33lUP_g6cYuGIcGOeEupRpJtf1sXv11G7BTj3gZAo5fvVk35hdfr5LVSJxJYsDUOxS7pdcFtkVO-0EEbZwLG3FlDYaPqLnComuKbmrSwzIW6EwfWXvr1lvifS5cOPflPSsVE319HKQ06w2vk4-4N9-E-cSXye9Yj_YHhNCJdEynvHV0XWkMkdLE_flG421UIIHVbDZdKHV429Ka7HQQSdpbyU6nQ94UsVzRfi2gEgXM18WuI96qkT8oFtqZwGrrE4wlyLuDJnPWkzaYmEwsSoPslrkv_mY66yEOLYsLolpTF3aTRU3sqv0GvZwnPkR04uZJY8GeL70uz3XaP5mYPxKz-pmCFbnJN_i9oiA_LjEIrEzSmvCEM_jViUfPB4FIib7VEi_gag5qWNYYxfkIyT4mC9Y0EKx0JbNHzyBs1062ETCiFvtPaAgconmyqW2ztnw4it_D10qAEemDppNOXKMmX_Jg-feuFKwq-MdIxiyJK3yoiKPXzMEEBa2WXqchDAPF52YmcVjq8HDORqYFkq5-iLumz6Y8ut-smKs_-vMG7k52nO3RW3RzuO0syMLBlZGiqUnADJtj0hmGmzqHXRqflq4QCTEE2vmG2flfMSIz9XJ7ECg73CP5OSNPg5VlzWfCVgd7o1TYd-rFBFXWM5Xz-ZlCA03LOZtP3BeQR3-TnSL6MNWo46vEtHq5ntcF-TrFTl4h01C5DNF_7R4W36CqQ4"
rel="noopener noreferrer nofollow" target="_blank">Visiter le site internet</a></span></div>
</div>
</div>
</li>
该想法是通过引用获取<span
class="mp-Listing-seller-name"><a class="mp-TextLink">
。像container.div.span ....
答案 0 :(得分:0)
我相信这是您想要的:
from bs4 import BeautifulSoup as bs
target = [your code above - note that it's missing the opening <li>]
page_soup = bs(target, "lxml")
containers = page_soup.find_all('li')
for container in containers:
item = container.find_all("span", class_= "mp-Listing-seller-name")
print(item)
输出:
[<span class="mp-Listing-seller-name"><a class="mp-TextLink" href="/u/catawiki/38096837/">Catawiki</a></span>]