I am trying to use Python 3 and BeautifulSoup.
import requests
import json
from bs4 import BeautifulSoup
url = "https://www.binance.com/pl"
# Get the data
data = requests.get(url)
soup = BeautifulSoup(data.text,'lxml')
print(soup)
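If I open the HTML code in the browser, I can see the BTC price (screenshot: HTML code in browser).
But in my data, as printed in the console, I cannot see the BTC price (screenshot: the data I can't see in the console).
Can you give me some advice on how to scrape this data?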
Answer 0 (score: 1)
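Use .findAll() to find all the rows; you can then use it again to find all the cells in a given row. You have to look at how the page is structured. These are not standard table rows but a series of divs made to look like a table, so you have to check the role attribute of each div to get to the data you want.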
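I assume you want to look at specific rows, so my example uses the Para column to find them. Since the star sits in its own small cell, the Para column is the second cell, i.e. index 1. From there, it is just a question of which cells you want to export.
If you want everything, you can take out the filter. You could also modify it to check whether a cell's value is above a certain price point.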
# Import necessary libraries
import requests
from bs4 import BeautifulSoup
# Ignore the insecure warning
from requests.packages.urllib3.exceptions import InsecureRequestWarning
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
# Set options and which rows you want to look at
url = "https://www.binance.com/pl"
desired_rows = ['ADA/BTC', 'ADX/BTC']
# Get the page and convert it into beautiful soup
response = requests.get(url, verify=False)
soup = BeautifulSoup(response.text, 'html.parser')
# Find all table rows
rows = soup.findAll('div', {'role':'row'})
# Process all the rows in the table
for row in rows:
    try:
        # Get the cells for the given row
        cells = row.findAll('div', {'role':'gridcell'})
        # Convert them to just the values of the cell, ignoring attributes
        cell_values = [c.text for c in cells]
        # See if the row is one you want
        if cell_values[1] in desired_rows:
            # Output the data however you'd like
            print(cell_values[1], cell_values[-1])
    except IndexError:  # there was a row without cells
        pass
This results in the following output:
ADA/BTC 1,646.39204255
ADX/BTC 35.29384873
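The threshold variation mentioned above (keeping rows whose value exceeds some cutoff instead of matching pair names) could be sketched roughly as follows. This is only an illustration under assumptions: SAMPLE_HTML is made-up markup that imitates the div-based table and is not the page's real HTML, rows_above is a hypothetical helper name, and the last cell is assumed to hold the number of interest, formatted with commas as thousands separators.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating a div-based table; the real page may differ.
SAMPLE_HTML = """
<div role="grid">
  <div role="row">
    <div role="gridcell">*</div>
    <div role="gridcell">ADA/BTC</div>
    <div role="gridcell">1,646.39204255</div>
  </div>
  <div role="row">
    <div role="gridcell">*</div>
    <div role="gridcell">ADX/BTC</div>
    <div role="gridcell">35.29384873</div>
  </div>
  <div role="row"></div>
</div>
"""


def rows_above(html, threshold):
    """Return (pair, value) tuples for rows whose last cell exceeds threshold."""
    soup = BeautifulSoup(html, "html.parser")
    result = []
    for row in soup.find_all("div", {"role": "row"}):
        cells = [c.get_text(strip=True)
                 for c in row.find_all("div", {"role": "gridcell"})]
        # Skip rows without enough cells (e.g. headers or spacer rows)
        if len(cells) < 2:
            continue
        try:
            # Strip thousands separators before converting to a number
            value = float(cells[-1].replace(",", ""))
        except ValueError:
            continue  # the last cell was not numeric
        if value > threshold:
            result.append((cells[1], value))
    return result


print(rows_above(SAMPLE_HTML, 100.0))
```

Checking the cell count up front plays the same role as the try/except IndexError in the answer's loop, and converting the text to a float is what makes the numeric comparison possible.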