I'm trying to use BeautifulSoup to scrape the product sizes from a website, but I'm stuck. I only need the text:
S, M, L, XL, XXL, XXXL, 4XL, 5XL
Code:
import bs4
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
myurl = 'https://www.aliexpress.com/item/Vfemage-Womens-Elegant-Ruched-Bow-Contrast-Patchwork-3-4-Sleeve-Vintage-Pinup-Work-Office-Party-Fitted/32831085887.html?spm=2114.search0103.3.12.iQlXqu&ws_ab_test=searchweb0_0,searchweb201602_3_10152_10065_10151_10344_10068_10345_10342_10325_10343_51102_10546_10340_10548_10341_10609_10541_10084_10083_10307_10610_10539_10312_10313_10059_10314_10534_100031_10604_10603_10103_10605_10594_10142_10107,searchweb201603_25,ppcSwitch_5&algo_expid=a3e03a67-d922-4c90-aba7-d3cc80101a75-1&algo_pvid=a3e03a67-d922-4c90-aba7-d3cc80101a75&rmStoreLevelAB=0'
uClient = uReq(myurl)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")
size = page_soup.findAll("ul",{"id":"j-sku-list-2"})
print(size)
It returns:
[
<ul class="sku-attr-list util-clearfix" data-sku-prop-id="5" data-sku-show-type="none" id="j-sku-list-2">
<li><a data-role="sku" data-sku-id="100014064" href="javascript:void(0)" id="sku-2-100014064"><span>S</span></a></li>
<li><a data-role="sku" data-sku-id="361386" href="javascript:void(0)" id="sku-2-361386"><span>M</span></a></li>
<li><a data-role="sku" data-sku-id="361385" href="javascript:void(0)" id="sku-2-361385"><span>L</span></a></li>
<li><a data-role="sku" data-sku-id="100014065" href="javascript:void(0)" id="sku-2-100014065"><span>XL</span></a></li>
<li><a data-role="sku" data-sku-id="4182" href="javascript:void(0)" id="sku-2-4182"><span>XXL</span></a></li>
<li><a data-role="sku" data-sku-id="4183" href="javascript:void(0)" id="sku-2-4183"><span>XXXL</span></a></li>
<li><a data-role="sku" data-sku-id="200000990" href="javascript:void(0)" id="sku-2-200000990"><span>4XL</span></a></li>
<li><a data-role="sku" data-sku-id="200000991" href="javascript:void(0)" id="sku-2-200000991"><span>5XL</span></a></li>
</ul>]
Answer 0 (score: 0)
You need to go one level deeper: find the li elements inside the ul and call get_text() on each of them:
sizes = page_soup.find("ul", {"id":"j-sku-list-2"}).find_all("li")
print([size.get_text(strip=True) for size in sizes])
# prints ['S', 'M', 'L', 'XL', 'XXL', 'XXXL', '4XL', '5XL']
Or, more concisely, use a CSS selector:
sizes = page_soup.select("ul#j-sku-list-2 li")
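To get the single comma-separated line the question asks for, the selector approach can be wrapped up roughly as follows. This is only a sketch: it parses a trimmed copy of the markup printed in the question instead of fetching the live page, and the names SAMPLE_HTML and extract_sizes are made up for illustration.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup printed in the question (three sizes only),
# used here in place of the live page. SAMPLE_HTML is a hypothetical name.
SAMPLE_HTML = """
<ul class="sku-attr-list util-clearfix" id="j-sku-list-2">
<li><a data-role="sku" href="javascript:void(0)"><span>S</span></a></li>
<li><a data-role="sku" href="javascript:void(0)"><span>M</span></a></li>
<li><a data-role="sku" href="javascript:void(0)"><span>XL</span></a></li>
</ul>
"""


def extract_sizes(html, list_id="j-sku-list-2"):
    """Return the stripped text of every <li> inside the <ul> with the given id.

    Hypothetical helper, not part of BeautifulSoup itself.
    """
    page_soup = BeautifulSoup(html, "html.parser")
    # select() returns an empty list when nothing matches, so no None check is needed.
    items = page_soup.select(f"ul#{list_id} li")
    return [item.get_text(strip=True) for item in items]


sizes = extract_sizes(SAMPLE_HTML)
print(", ".join(sizes))  # prints S, M, XL
```

If the ul is missing from the HTML you actually download, extract_sizes returns an empty list rather than raising, so an empty result is worth checking for before joining.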