BS4 + HTML: extracting the text of a <b> tag

Time: 2019-12-20 00:47:38

Tags: python python-3.x beautifulsoup

This question is about web scraping with bs4.

Here is the code I wrote:

import requests
from bs4 import BeautifulSoup
import json
import csv

page = requests.get('https://www.alibaba.com/product-detail/Portable-Small-USB-Travel-LED-Makeup_60830030133.html?spm=a2700.details.maylikever.2.1fb53cc2uSVPvx')

# Create a BeautifulSoup object
soup = BeautifulSoup(page.text, 'html.parser')

#extract product score **(This is what I want to extract)**
stars = soup.select_one('a[class="score-lite"]', namespaces=None, flags=0)
#score = json.loads(stars)
print('Stars', stars)

My result:

<a class="score-lite" data-spm-click="gostr=/details.index.reviewLevel;locaid=dreviewLevel" href="https://onuliss.en.alibaba.com/company_profile/feedback.html" target="_blank"><b>4.8 </b><img src="//img.alicdn.com/tfs/TB1MJPmiQL0gK0jSZFtXXXQCXXa-8-9.svg"/></a>

The result I want is only the number 4.8 inside the <b> tag. How should I use soup.select_one() to get it?

Thanks a lot :)

3 answers:

Answer 0: (score: 1)

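Try a more specific selector that matches the <b> child directly, then call get_text(strip=True) on the match to remove the trailing whitespace.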

import requests
from bs4 import BeautifulSoup

page = requests.get('https://www.alibaba.com/product-detail/Portable-Small-USB-Travel-LED-Makeup_60830030133.html?spm=a2700.details.maylikever.2.1fb53cc2uSVPvx')

# Create a BeautifulSoup object
soup = BeautifulSoup(page.text, 'html.parser')

# Select the <b> child of the score link and take its text without surrounding whitespace
stars = soup.select_one('a[class="score-lite"] > b').get_text(strip=True)
print('Stars', stars)
  

Output:

Stars 4.8
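To try the selector without hitting the network, here is a minimal sketch that parses the HTML fragment shown in the question's output. The `html` string is copied from that output (with the `data-spm-click` attribute left out for brevity), and the class selector `a.score-lite` is an assumed shorthand for `a[class="score-lite"]`; the live page may no longer serve the same markup.

```python
from bs4 import BeautifulSoup

# HTML fragment taken from the question's printed result, so no request is needed.
html = (
    '<a class="score-lite" '
    'href="https://onuliss.en.alibaba.com/company_profile/feedback.html" '
    'target="_blank"><b>4.8 </b>'
    '<img src="//img.alicdn.com/tfs/TB1MJPmiQL0gK0jSZFtXXXQCXXa-8-9.svg"/></a>'
)

soup = BeautifulSoup(html, 'html.parser')

# select_one returns the first element matching the CSS selector, or None if nothing matches.
tag = soup.select_one('a.score-lite > b')

# get_text(strip=True) returns the tag's text with surrounding whitespace removed.
score_text = tag.get_text(strip=True) if tag is not None else None
print('Stars', score_text)
```

Checking for None before calling get_text avoids an AttributeError when the page layout changes and the selector stops matching.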

Answer 1: (score: 0)

Using SimplifiedDoc:

import requests
from simplified_scrapy.simplified_doc import SimplifiedDoc 
page = requests.get('https://www.alibaba.com/product-detail/Portable-Small-USB-Travel-LED-Makeup_60830030133.html?spm=a2700.details.maylikever.2.1fb53cc2uSVPvx')
# Create a SimplifiedDoc object
doc = SimplifiedDoc(page.text)
# Get the element by tag name and class
stars = doc.getElement('a','class',"score-lite")
print('Stars', stars.text, stars.b.text) # Stars 4.8 4.8

Answer 2: (score: 0)

import requests
from bs4 import BeautifulSoup


r = requests.get(
    'https://www.alibaba.com/product-detail/Portable-Small-USB-Travel-LED-Makeup_60830030133.html?spm=a2700.details.maylikever.2.1fb53cc2uSVPvx')

# Only parse the page if the request succeeded
if r.status_code == 200:
    soup = BeautifulSoup(r.text, 'html.parser')
    # Find the score link, then the <b> tag inside it
    item = soup.find('a', {'class': 'score-lite'}).find('b')
    print(item.get_text(strip=True))

Output:

4.8
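If the score is needed as a number rather than a string, the find-based approach can be wrapped in a small helper. This is only a sketch: `extract_score` is a hypothetical name, not part of BeautifulSoup, and the sample markup is a trimmed version of the fragment from the question.

```python
from bs4 import BeautifulSoup


def extract_score(html):
    """Return the score as a float, or None if the expected markup is missing."""
    soup = BeautifulSoup(html, 'html.parser')

    # find returns None when no matching tag exists, so check each step.
    link = soup.find('a', class_='score-lite')
    if link is None:
        return None
    bold = link.find('b')
    if bold is None:
        return None

    try:
        return float(bold.get_text(strip=True))
    except ValueError:
        # The <b> tag exists but does not contain a number.
        return None


print(extract_score('<a class="score-lite"><b>4.8 </b></a>'))
print(extract_score('<p>no score here</p>'))
```

The first call prints 4.8 and the second prints None, since the second fragment has no matching link.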