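I'm trying to scrape HTML data with BeautifulSoup, and I'd like to know how to select an element whose class is nested inside another class (a "double class"). Here is my HTML, to illustrate: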
    <span class="phone-contacts">
        <span class = "phone">
            <span class = "label">
                phone
            </span>
            <span class = "value">
                <a class="tel" href="tel:+41XXXXXX">
                    021 XXX XX XX
                </a>
            </span>
        </span>
        <span class = "mobile">
            <span class = "label">
                Mobile
            </span>
            <span class = "value">
                <a class="tel" href="tel:+41XXXXXX">
                    079 XXX XX XX
                </a>
            </span>
        </span>
    </span>
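As you can see, the innermost class for both the phone and the mobile number is "tel", and that is my problem. I want to collect the phone and the mobile numbers separately, like this: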
    import re

    import requests
    from bs4 import BeautifulSoup

    def bot_get_data(item_url):
        source_code = requests.get(item_url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, "html.parser")
        name_company = soup.find_all("h1")
        phone_number = soup.find_all("a", {"class": "phone"}, {"class": "tel"})
        # My problem is here: I need a way to reach a class that sits inside another class
        mobile_number = soup.find_all("a", {"class": "mobile"}, {"class": "tel"})
        site_name = soup.find_all("a", {"class": "redirect"})
        email_name = soup.find_all("a", href=re.compile('mailto'))
        name_data = []
        phone_data = []
        mobile_data = []
        site_data = []
        mail_data = []
        for item in name_company:
            name_data.append(item.string)
            print(item.string)
        for num in phone_number:
            phone_data.append(num.string)
            print(num.string)
        for mob in mobile_number:
            mobile_data.append(mob.string)
            print(mob.string)
        for site in site_name:
            site_data.append(site.string)
            print(site.string)
        for email in email_name:
            mail_data.append(email.string)
            print(email.string)
Does anyone know how to do this with BeautifulSoup?
Thanks =)
Answer 0 (score: 1)
tel = soup.select(".phone .value .tel")[0].text.strip()
mob = soup.select(".mobile .value .tel")[0].text.strip()
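The idea is to use a CSS descendant selector: a space between two selectors means "an element matching the second, somewhere inside an element matching the first". Here is a minimal, self-contained sketch of that approach run against the markup from the question. The function name `extract_numbers` and the dictionary keys are made up for illustration, and it uses `select_one` so that a missing number yields `None` instead of an `IndexError`.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question (the numbers are placeholders).
html = """
<span class="phone-contacts">
  <span class="phone">
    <span class="label">phone</span>
    <span class="value">
      <a class="tel" href="tel:+41XXXXXX">
        021 XXX XX XX
      </a>
    </span>
  </span>
  <span class="mobile">
    <span class="label">Mobile</span>
    <span class="value">
      <a class="tel" href="tel:+41XXXXXX">
        079 XXX XX XX
      </a>
    </span>
  </span>
</span>
"""


def extract_numbers(markup):
    """Return a dict with 'phone' and 'mobile' numbers (None when absent).

    Hypothetical helper, not part of BeautifulSoup.
    """
    soup = BeautifulSoup(markup, "html.parser")
    numbers = {}
    for kind in ("phone", "mobile"):
        # Descendant selector: an <a class="tel"> anywhere inside
        # a <span> whose class list contains `kind`.
        link = soup.select_one(f"span.{kind} a.tel")
        numbers[kind] = link.get_text(strip=True) if link else None
    return numbers


contacts = extract_numbers(html)
print(contacts)
```

Note that `span.phone` does not match the outer `phone-contacts` span, because CSS class selectors compare whole class names rather than prefixes.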