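Python-BeautifulSoup: How do I extract specific text from a variable?

Time: 2018-06-08 19:14:17

Tags: python beautifulsoup

I need to extract the third-level category "Toys & Hobbies" from the breadcrumb. How can I do that?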

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url = 'https://www.aliexpress.com/item/Mini-remote-control-quadcopter-WiFi-FPV-camera-hd-Pocket-Selfie-Drone-JY018-Easy-carry-travel-hd/32807755975.html?spm=2114.search0104.3.32.60a361b94HcXvC&ws_ab_test=searchweb0_0,searchweb201602_2_10152_5722813_10151_10065_10344_10068_5722613_10342_5722913_10343_10340_10341_10696_10084_10083_5722713_10618_10307_10059_100031_10103_10624_10623_10622_10621_10620_5722513-10620,searchweb201603_2,ppcSwitch_5&algo_expid=c18e9af6-7eae-465d-8f23-20bf577602e3-3&algo_pvid=c18e9af6-7eae-465d-8f23-20bf577602e3&transAbTest=ae803_1&priceBeautifyAB=0'  
uClient = uReq(my_url)  
page_html = uClient.read()  
uClient.close()  
page_soup = soup(page_html, "html.parser")  
description = page_soup.findAll("div", {"class": "ui-box-body"})  
print(description)
string4 = str(description)

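1 answer:

Answer 0: (score: 1)

Well, assuming you are only looking for the breadcrumb, which is not where the class in your code points, here is one possible solution.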

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url = 'https://www.aliexpress.com/item/Mini-remote-control-quadcopter-WiFi-FPV-camera-hd-Pocket-Selfie-Drone-JY018-Easy-carry-travel-hd/32807755975.html?spm=2114.search0104.3.32.60a361b94HcXvC&ws_ab_test=searchweb0_0,searchweb201602_2_10152_5722813_10151_10065_10344_10068_5722613_10342_5722913_10343_10340_10341_10696_10084_10083_5722713_10618_10307_10059_100031_10103_10624_10623_10622_10621_10620_5722513-10620,searchweb201603_2,ppcSwitch_5&algo_expid=c18e9af6-7eae-465d-8f23-20bf577602e3-3&algo_pvid=c18e9af6-7eae-465d-8f23-20bf577602e3&transAbTest=ae803_1&priceBeautifyAB=0'  
uClient = uReq(my_url)  
page_html = uClient.read()  
uClient.close()  
page_soup = soup(page_html, "html.parser")  
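# The breadcrumb is in the "ui-breadcrumb" div, not in "ui-box-body".
description = page_soup.find("div", {"class": "ui-breadcrumb"})

# Walk the breadcrumb links; index 2 is the third-level category.
for key, link in enumerate(description.findAll('a')):
    if key == 2:
        print(link.text)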

Note that this prints "Toys & Hobbies", the only occurrence of that keyword on the page.
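
The same idea can be tried without hitting the live site. The following is only a minimal sketch: the SAMPLE_HTML markup and the breadcrumb_level helper are hypothetical and made up for illustration, and the real AliExpress page may be structured differently. It looks up the "ui-breadcrumb" div as in the answer, indexes the links directly instead of looping, and returns None when the div or the requested level is missing.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the live page;
# the real AliExpress HTML may differ.
SAMPLE_HTML = """
<div class="ui-breadcrumb">
  <a href="/">Home</a>
  <a href="/all">All Categories</a>
  <a href="/toys">Toys &amp; Hobbies</a>
  <a href="/rc">Remote Control Toys</a>
</div>
"""


def breadcrumb_level(html, level, css_class="ui-breadcrumb"):
    """Return the text of the breadcrumb link at 1-based `level`, or None.

    Hypothetical helper for illustration, not part of BeautifulSoup.
    """
    page_soup = BeautifulSoup(html, "html.parser")
    container = page_soup.find("div", {"class": css_class})
    if container is None:
        # The breadcrumb div is not present in this document.
        return None
    links = container.find_all("a")
    if not 1 <= level <= len(links):
        # The breadcrumb has fewer links than the requested level.
        return None
    return links[level - 1].get_text(strip=True)


third_level = breadcrumb_level(SAMPLE_HTML, 3)
print(third_level)
```

To use it on the real page, pass page_html from the answer's code in place of SAMPLE_HTML. Checking for None first avoids an AttributeError if the site changes its markup or serves a different page to scripts.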