content > div > div.tickerBar.overflown > div > span.instruments.tickerBarSection > span:nth-child(1) > span.price

Question

I want to use bs4 to parse price information from BitMEX.

(The site URL is https://www.bitmex.com/app/trade/XBTUSD.)

So I wrote this code:

from bs4 import BeautifulSoup
import requests

url = 'https://www.bitmex.com/app/trade/XBTUSD'
bitmex = requests.get(url)

if bitmex.status_code == 200:
    print("connected...")
else:
    print("Error...")

bitmex_html = bitmex.text
soup = BeautifulSoup(bitmex_html, 'lxml')
price = soup.find_all("span", {"class": "price"})
print(price)

The result is:

connected...
[]

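Why does it print "[]"? And what should I do to get the price text, such as "6065.5"? The text I want to parse is

<span class="price">6065.5</span>

and the selector is

content > div > div.tickerBar.overflown > div > span.instruments.tickerBarSection > span:nth-child(1) > span.price

I'm just learning Python, so this question may seem strange... sorry.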

Answer 1

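You're very close. The catch is that the HTML returned by requests is not quite what you see in the browser: the price spans aren't in it, but the page's initial data is embedded as JSON inside a <script id="initialData"> tag. Try the following and see whether it meets your needs. Hope this helps.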

from bs4 import BeautifulSoup
import requests
import sys
import json

url = 'https://www.bitmex.com/app/trade/XBTUSD'
bitmex = requests.get(url)

if bitmex.status_code == 200:
    print("connected...")
else:
    print("Error...")
    sys.exit(1)

bitmex_html = bitmex.text
soup = BeautifulSoup(bitmex_html, 'lxml')

# extract the JSON text embedded in the returned page
script = soup.find("script", {"id": "initialData"})
if script is None:
    # the page layout changed or the request was blocked
    print("No <script id='initialData'> found")
    sys.exit(1)

# parse the JSON text (.string is the script tag's raw text content)
d = json.loads(script.string)

# pull out the order book and then each price listed in the order book
order_book = d['orderBook']
prices = [v['price'] for v in order_book]
print(prices)

Sample output:

connected...
[6045, 6044.5, 6044, 6043.5, 6043, 6042.5, 6042, 6041.5, 6041, 6040.5, 6040, 6039.5, 6039, 6038.5, 6038, 6037.5, 6037, 6036.5, 6036, 6035.5, 6035, 6034.5, 6034, 6033.5, 6033, 6032.5, 6032, 6031.5, 6031, 6030.5, 6030, 6029.5, 6029, 6028.5, 6028, 6027.5, 6027, 6026.5, 6026, 6025.5, 6025, 6024.5, 6024, 6023.5, 6023, 6022.5, 6022, 6021.5, 6021, 6020.5]
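If you'd rather not depend on bs4 or lxml for this step, the same idea (find the <script id="initialData"> tag, parse its text as JSON, read "orderBook") can be sketched with only the standard library's html.parser. This is a swapped-in technique, not what the answer above uses. The sample HTML is a hypothetical, trimmed-down stand-in, and the "initialData" id and "orderBook" key come from the answers here and may no longer match the live site.

```python
import json
from html.parser import HTMLParser


class ScriptByIdExtractor(HTMLParser):
    """Collect the raw text of the <script> tag whose id equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == self.target_id:
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        # script contents may arrive in several chunks, so keep them all
        if self._inside:
            self.chunks.append(data)


def extract_order_book_prices(html, script_id="initialData"):
    parser = ScriptByIdExtractor(script_id)
    parser.feed(html)
    parser.close()
    if not parser.chunks:
        raise ValueError("no <script id=%r> found; the page layout may have changed" % script_id)
    data = json.loads("".join(parser.chunks))
    return [row["price"] for row in data["orderBook"]]


# Hypothetical, trimmed-down page; the real embedded JSON is much larger.
sample_html = (
    '<html><body><div id="content"></div>'
    '<script id="initialData" type="application/json">'
    '{"orderBook": [{"symbol": "XBTUSD", "price": 6045, "side": "Sell"},'
    ' {"symbol": "XBTUSD", "price": 6044.5, "side": "Buy"}]}'
    '</script></body></html>'
)
print(extract_order_book_prices(sample_html))  # [6045, 6044.5]
```

In real use you would pass requests.get(url).text instead of sample_html.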

Answer 2

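Your problem is that the page doesn't contain those span elements in the first place. If you compare the raw page source (what requests downloads) with what the browser's developer tools show (press F12 in Firefox), you'll see that the source consists of just a few tags plus JavaScript code which, when executed, creates the elements dynamically.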
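To see why find_all returns [], here is a minimal sketch using the standard library's html.parser instead of BeautifulSoup, so it runs with no extra installs. Like BeautifulSoup, it only sees what is literally in the HTML string it is given and never runs scripts. Both snippets are hypothetical stand-ins for the page before and after the browser runs its JavaScript; the script name bundle.js is made up.

```python
from html.parser import HTMLParser


class PriceSpanFinder(HTMLParser):
    """Collect the text of every <span class="price"> present in the raw HTML."""

    def __init__(self):
        super().__init__()
        self.prices = []
        self._in_price = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "price" in classes:
            self._in_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_price = False

    def handle_data(self, data):
        if self._in_price:
            self.prices.append(data.strip())


def find_price_spans(html):
    finder = PriceSpanFinder()
    finder.feed(html)
    finder.close()
    return finder.prices


# Hypothetical HTML as the server sends it: the span only exists after bundle.js runs.
raw_page = '<div id="content"></div><script src="/bundle.js"></script>'
# Hypothetical DOM after the browser has executed the JavaScript (what F12 shows).
rendered_page = '<div id="content"><span class="price">6065.5</span></div>'

print(find_price_spans(raw_page))       # []
print(find_price_spans(rendered_page))  # ['6065.5']
```

requests only ever gives you the equivalent of raw_page, which is why the span search comes back empty.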

Since BeautifulSoup can't execute JavaScript, you can't use it to extract elements that JavaScript creates. You have two options:

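1. Use something like Selenium, which lets you drive a real browser from Python. The JavaScript will then be executed, because you're using a real browser, but performance suffers.
2. Read the JavaScript code, understand it, and write Python code that emulates it. This is usually harder, but luckily for you it seems quite simple for the page you want.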

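The page embeds its data as JSON inside a <script id="initialData"> tag. Once you load it into a data variable, you can use it to access the information you need: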

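import json
import requests
import lxml.html

# fetch the raw HTML (no JavaScript is executed here)
r = requests.get('https://www.bitmex.com/app/trade/XBTUSD')
doc = lxml.html.fromstring(r.text)
# the initial data is embedded as JSON text in <script id="initialData">
data = json.loads(doc.xpath("//script[@id='initialData']/text()")[0])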

Then, to print the symbol, price and side of each order-book row:

for row in data['orderBook']:
    print(row['symbol'], row['price'], row['side'])
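The order book holds rows for both sides, which is why you get a long list rather than one number. If a single figure is enough, one option is to split the rows by side and take the best bid and ask. This is a sketch under assumptions: that side takes the values "Buy" and "Sell" (the loop above prints side, but the exact values are assumed here), and the sample rows are made up. Best bid/ask is also not necessarily the same as the last traded price shown in the ticker bar.

```python
# Hypothetical rows shaped like the 'orderBook' entries above
# ('symbol', 'price' and 'side' appear in the answers; the values are invented).
order_book = [
    {"symbol": "XBTUSD", "price": 6046.5, "side": "Sell"},
    {"symbol": "XBTUSD", "price": 6046.0, "side": "Sell"},
    {"symbol": "XBTUSD", "price": 6045.5, "side": "Buy"},
    {"symbol": "XBTUSD", "price": 6045.0, "side": "Buy"},
]


def best_bid_ask(rows):
    """Return (highest Buy price, lowest Sell price); None for an empty side."""
    bids = [row["price"] for row in rows if row["side"] == "Buy"]
    asks = [row["price"] for row in rows if row["side"] == "Sell"]
    return (max(bids) if bids else None, min(asks) if asks else None)


bid, ask = best_bid_ask(order_book)
print(bid, ask)  # 6045.5 6046.0
```

With the live page you would call best_bid_ask(data['orderBook']) instead.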
