Extracting data from a span using BeautifulSoup

Time: 2018-12-13 13:55:47

Tags: python python-3.x beautifulsoup

I am trying to extract the data inside a span with BeautifulSoup, using two different approaches:

import requests
import bs4

url = 'https://www.futbin.com/19/player/477/Jordan%20Henderson/'
page = requests.get(url).content
soup = bs4.BeautifulSoup(page, 'lxml')

price1 = soup.find("div", {"class": "bin_price lbin"}).span.contents
price2 = soup.select('#ps-lowest-1')


print(price1)
print(price2)

It gives me these two results:

[u'\n', <span id="ps-lowest-1">-</span>, u'\n']
[<span id="ps-lowest-1">-</span>]
[Finished in 1.0s]

Now I want to extract the data (the price) from that span, but I can't. Thanks for your help.

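3 answers:

Answer 0: (score: 1)

The HTML you get in the page variable does not contain the actual price. The browser loads the price dynamically through a separate request, which is why the span only holds the placeholder "-".

You can simulate that request in your own code: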

from pprint import pprint
import requests

# Endpoint the player page queries in the background; 183711 is the player's base ID
url = 'https://www.futbin.com/19/playerPrices?player=183711'
page = requests.get(url).json()

pprint(page)

This prints:

{u'183711': {u'prices': {u'pc': {u'LCPrice': u'1,500',
                                 u'LCPrice2': u'1,500',
                                 u'LCPrice3': u'1,500',
                                 u'LCPrice4': u'1,500',
                                 u'LCPrice5': u'1,500',
                                 u'MaxPrice': u'10,000',
                                 u'MinPrice': u'700',
                                 u'PRP': u'8',
                                 u'updated': u'49 mins ago'},
                         u'ps': {u'LCPrice': u'1,300',
                                 u'LCPrice2': u'1,300',
                                 u'LCPrice3': u'1,300',
                                 u'LCPrice4': u'1,300',
                                 u'LCPrice5': u'1,300',
                                 u'MaxPrice': u'10,000',
                                 u'MinPrice': u'700',
                                 u'PRP': u'6',
                                 u'updated': u'25 mins ago'},
                         u'xbox': {u'LCPrice': u'1,500',
                                   u'LCPrice2': u'1,500',
                                   u'LCPrice3': u'1,600',
                                   u'LCPrice4': u'1,600',
                                   u'LCPrice5': u'1,600',
                                   u'MaxPrice': u'10,000',
                                   u'MinPrice': u'700',
                                   u'PRP': u'8',
                                   u'updated': u'30 mins ago'}}}}
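
The prices in this response are strings with thousands separators, so they need converting before any arithmetic or comparison. A minimal sketch, assuming every price is either a comma-grouped integer or a non-numeric placeholder such as "-"; the helper name parse_price and the trimmed sample dict are illustrative and not part of the original answer:

```python
# Trimmed copy of the response printed above (only the LCPrice fields).
sample = {
    "183711": {
        "prices": {
            "pc": {"LCPrice": "1,500"},
            "ps": {"LCPrice": "1,300"},
            "xbox": {"LCPrice": "1,500"},
        }
    }
}


def parse_price(text):
    """Convert a string like '1,500' to 1500; return None for placeholders."""
    cleaned = text.replace(",", "").strip()
    return int(cleaned) if cleaned.isdigit() else None


# Lowest listed price per platform, as integers.
lowest = {
    platform: parse_price(info["LCPrice"])
    for platform, info in sample["183711"]["prices"].items()
}
print(lowest)
```

With the sample above, lowest maps pc to 1500, ps to 1300 and xbox to 1500.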

Answer 1: (score: 1)

The data you want comes from an XHR (Ajax) request. First extract the player ID from the page, then use it to fetch the JSON content.

import requests
from bs4 import BeautifulSoup

url = 'https://www.futbin.com/19/player/477/Jordan%20Henderson/'
page = requests.get(url).text
soup = BeautifulSoup(page, 'html.parser')

# The base ID is stored in a data attribute on the page
playerId = soup.find(id="page-info")['data-baseid'] # 183711

# Build the URL of the JSON endpoint and request it
jsonURL = 'https://www.futbin.com/19/playerPrices?player=' + playerId
jsonObj = requests.get(jsonURL).json()
# print(jsonObj)

# Lowest current price on PlayStation
psLowestPrice = jsonObj[playerId]['prices']['ps']['LCPrice']
print(psLowestPrice)
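
The chained lookup above raises KeyError if the response is missing the player or the platform. One way to guard against that is sketched below; lowest_price is a hypothetical helper name, and the sample payload is a trimmed copy of the response shown in Answer 0 rather than live data:

```python
def lowest_price(payload, player_id, platform="ps"):
    """Return the LCPrice string for a platform, or None if a key is missing."""
    try:
        return payload[player_id]["prices"][platform]["LCPrice"]
    except (KeyError, TypeError):
        # KeyError: unknown player or platform; TypeError: unexpected structure
        return None


# Trimmed sample of the JSON structure (assumed to match the real response).
sample = {"183711": {"prices": {"ps": {"LCPrice": "1,300"}}}}

print(lowest_price(sample, "183711", "ps"))
print(lowest_price(sample, "183711", "switch"))
```

The first call returns the price string and the second returns None instead of raising, so the caller can decide how to handle a missing platform.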

Answer 2: (score: 0)

The bs4 select method gives you a list of matching tags.
Following your example:

price1 = soup.find("div", {"class": "bin_price lbin"}).span.contents
price2 = soup.select('#ps-lowest-1')

To access the text inside the first element of the list:

print(price2[0].text)

Or loop over all of the matches:

for elem in price2:
    print(elem.text)
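
The difference between a list of tags and a single tag can be sketched on a small inline document. The HTML string below is a hypothetical snippet modelled on the output in the question, so no network request is needed; it uses the built-in html.parser rather than lxml to avoid an extra dependency:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet modelled on the output shown in the question.
html = (
    '<div class="bin_price lbin"><span>\n'
    '<span id="ps-lowest-1">-</span>\n'
    '</span></div>'
)
soup = BeautifulSoup(html, "html.parser")

matches = soup.select("#ps-lowest-1")    # always a list of Tag objects
first = soup.select_one("#ps-lowest-1")  # a single Tag, or None if no match

texts = [tag.get_text(strip=True) for tag in matches]
first_text = first.get_text(strip=True) if first is not None else None
print(texts, first_text)
```

On this snippet the extracted text is the placeholder "-", which matches what the question's output shows for the page as downloaded by requests.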