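How do I extract the text that follows an <i class> tag?

Time: 2019-06-23 06:37:51

Tags: python html web-scraping beautifulsoup lxml

I am trying to use BeautifulSoup to print the text 'Dealer' from inside a div, but I can't figure out how to extract it.

I tried printing the i element, but the text Dealer does not show up.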

import requests
from bs4 import BeautifulSoup

url = 'https://www.carlist.my/used-cars-for-sale/proton/malaysia'
response = requests.get(url, params={'page_number': 1})
soup = BeautifulSoup(response.text, 'lxml')
articles = soup.find_all('article')[:25]
# The div that wraps the seller type, and the icon <i> inside it
seller_type = articles[4].find('div', class_='item push-quarter--ends listing__spec--dealer')
seller_type_text = articles[4].find('i', class_='icon icon--secondary muted valign--top push-quarter--right icon--user-formal')

print(seller_type.prettify())
print()
print(seller_type_text)

This is the output I get:

<div class="item push-quarter--ends listing__spec--dealer">
 <i class="icon icon--secondary muted valign--top push-quarter--right icon--user-formal">
 </i>
 Dealer
 <span class="flyout listing__badge listing__badge--trusted-seller inline--block valign--top push-quarter--left">
  <i class="icon icon--thumb-up">
  </i>
  <span class="flyout__content flyout__content--tip visuallyhidden--portable">
   This 'Trusted Dealer' has a proven track record of upholding the best car selling practices certified by Carlist.my
  </span>
 </span>
 <!-- used car -->
 <!-- BMW -->
</div>


<i class="icon icon--secondary muted valign--top push-quarter--right icon--user-formal"></i>

How can I print the word 'Dealer', which sits after the i element and before the span element?

Can anyone help me?

Thanks a lot!

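3 answers:

Answer 0 (score: 1)

There is a quicker way: select the i element by one of its compound class names, then use next_sibling.

If you inspect the HTML, you can see that 'Dealer' is a text node inside the div that is the parent of the i tag, and that it comes immediately after the i tag. So you can grab the i tag and then read its next_sibling. Note that the text is a sibling of the i tag, not a child, which is why printing the i tag alone shows nothing.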


from bs4 import BeautifulSoup as bs
import requests

r = requests.get('https://www.carlist.my/used-cars-for-sale/proton/malaysia')
soup = bs(r.content, 'lxml')
print(soup.select_one('.icon--user-formal').next_sibling)
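For anyone who wants to try the next_sibling approach without hitting the site, here is a minimal offline sketch. It assumes the markup still looks like the output pasted in the question; the HTML string is a trimmed copy of that output, the names HTML and seller_label are made up for illustration, and the built-in 'html.parser' is swapped in for 'lxml' so nothing extra needs installing.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup shown in the question (not fetched live).
HTML = """
<div class="item push-quarter--ends listing__spec--dealer">
 <i class="icon icon--secondary muted valign--top push-quarter--right icon--user-formal"></i>
 Dealer
 <span class="flyout listing__badge listing__badge--trusted-seller">
  <i class="icon icon--thumb-up"></i>
 </span>
</div>
"""

# Assumption: 'html.parser' is used here only to avoid the lxml dependency.
soup = BeautifulSoup(HTML, "html.parser")

# Select the icon by a single one of its classes.
icon = soup.select_one(".icon--user-formal")

# next_sibling is the text node between </i> and <span>; it still carries
# the surrounding newlines and spaces, so strip them off.
seller_label = icon.next_sibling.strip()
print(seller_label)
```

On the live page, select_one may return None if the class names have changed, so a check for that before touching next_sibling is probably worth adding.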

Answer 1 (score: 0)

Take a look at the contents attribute of your seller_type. You will see that Dealer is at seller_type.contents[2]. In other words,

import requests
from bs4 import BeautifulSoup
url = 'https://www.carlist.my/used-cars-for-sale/proton/malaysia?profile_type=Dealer'
response = requests.get(url, params={'page_number': 1})
soup = BeautifulSoup(response.text, 'lxml')
articles = soup.find_all('article')[:25]
seller_type = articles[4].find('div', class_ = 'item push-quarter--ends listing__spec--dealer')
print(seller_type.contents[2])
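The index 2 works because the whitespace before the i tag counts as its own text node, so it can shift if the markup changes. As a rough sketch of a less position-dependent variant (not part of the answer above), the hypothetical helper first_direct_text below walks contents and returns the first non-blank text node, skipping comments. The HTML string is a trimmed copy of the question's output, and 'html.parser' is used instead of 'lxml' purely to keep the example dependency-free.

```python
from bs4 import BeautifulSoup, Comment, NavigableString

# Trimmed copy of the markup shown in the question (not fetched live).
HTML = """<div class="item push-quarter--ends listing__spec--dealer">
 <i class="icon icon--user-formal"></i>
 Dealer
 <span class="flyout"><i class="icon icon--thumb-up"></i></span>
 <!-- used car -->
</div>"""

soup = BeautifulSoup(HTML, "html.parser")
seller_type = soup.find("div", class_="listing__spec--dealer")


def first_direct_text(tag):
    """Return the first non-blank text node that is a direct child of tag.

    Hypothetical helper for illustration; returns None if nothing is found.
    """
    for child in tag.contents:
        # Comment is a subclass of NavigableString, so exclude it explicitly.
        if isinstance(child, NavigableString) and not isinstance(child, Comment):
            text = child.strip()
            if text:
                return text
    return None


# contents[0] is whitespace, contents[1] is the <i> tag, contents[2] is the text.
print(repr(seller_type.contents[2]))
seller_label = first_direct_text(seller_type)
print(seller_label)
```

Both approaches give the same string for this snippet; the helper simply avoids hard-coding the position.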

Answer 2 (score: 0)

List<MenuItem> menu = new List<Model.MenuItem>();
menu.Add(new Model.MenuItem("hihi", "123123", "1231231", "123123", "123123"));
menu.Add(new Model.MenuItem("hihi", "123123", "1231231", "123123", "123123"));
menu.Add(new Model.MenuItem("hihi", "123123", "1231231", "123123", "123123"));
ListViewMenu.ItemsSource = menu;