Python 3: how to scrape data from a span

Date: 2018-11-07 21:38:36

Tags: python-3.x web-scraping beautifulsoup

I am trying to do this with Python 3 and BeautifulSoup.

import requests
import json
from bs4 import BeautifulSoup

url = "https://www.binance.com/pl"

# Get the data
data = requests.get(url)

soup = BeautifulSoup(data.text,'lxml')

print(soup)

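If I open the HTML in the browser, I can see: [screenshot: HTML code in browser]

But in my data (printed to the console) I cannot see the BTC price: [screenshot: the data I can't see in the console]

Could you give me some advice on how to scrape this data?

1 answer:

Answer 0 (score: 1)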

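Use .findAll() to find all the rows, then use it again to find all the cells in a given row. You have to look at how the page is structured. It is not a standard table; it is a series of divs made to look like a table. So you have to look at the role attribute of each div to get the data you want.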
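The role-based lookup described above can be tried offline on a small fragment before pointing it at the live page. This is only a sketch: the markup in `SAMPLE_HTML` and the helper name `extract_rows` are made up for illustration, and the real page may be structured differently. It uses `find_all`, the current name for the same method that `findAll` refers to.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only; the real page's structure may differ.
SAMPLE_HTML = """
<div role="grid">
  <div role="row">
    <div role="gridcell">*</div>
    <div role="gridcell">ADA/BTC</div>
    <div role="gridcell">0.00001234</div>
  </div>
  <div role="row">
    <div role="gridcell">*</div>
    <div role="gridcell">ADX/BTC</div>
    <div role="gridcell">0.00005678</div>
  </div>
</div>
"""


def extract_rows(html):
    """Return one list of cell texts per div that has role="row"."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for row in soup.find_all("div", {"role": "row"}):
        # Only the divs marked as grid cells belong to the "table"
        cells = row.find_all("div", {"role": "gridcell"})
        rows.append([cell.get_text(strip=True) for cell in cells])
    return rows


table = extract_rows(SAMPLE_HTML)
for cell_values in table:
    print(cell_values)
```

Each inner list corresponds to one visual row, so the pair name sits at index 1 in this hypothetical layout, after the cell holding the star.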

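I am assuming you want to look at specific rows, so my example below uses the Para column to find those rows. Since the star sits in its own little cell, the Para column is the second cell, i.e. index 1. From there, it is just a question of which cells you want to output.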

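If you want to get everything, you can take out the filter. You could also modify it to check whether the value of a cell is above a certain price point.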

# Import necessary libraries
import requests
from bs4 import BeautifulSoup
# Ignore the insecure warning
from requests.packages.urllib3.exceptions import InsecureRequestWarning
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)

# Set options and which rows you want to look at
url = "https://www.binance.com/pl"
desired_rows = ['ADA/BTC', 'ADX/BTC']

# Get the page and convert it into beautiful soup
response = requests.get(url, verify=False)
soup = BeautifulSoup(response.text, 'html.parser')

# Find all table rows
rows = soup.findAll('div', {'role':'row'})

# Process all the rows in the table
for row in rows:
    try:
        # Get the cells for the given row
        cells = row.findAll('div', {'role':'gridcell'})
        # Convert them to just the values of the cell, ignoring attributes
        cell_values = [c.text for c in cells]

        # see if the row is one you want
        if cell_values[1] in desired_rows:
            # Output the data however you'd like
            print(cell_values[1], cell_values[-1])

    except IndexError: # there was a row without cells
        pass

This produces the following output:

ADA/BTC 1,646.39204255
ADX/BTC 35.29384873
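The price-point variation mentioned earlier could look something like the sketch below. It assumes the numeric cell uses commas as thousands separators, as in the output above. The helper names `parse_number` and `rows_above` and the layout of `sample_rows` are hypothetical, not part of the answer's code.

```python
from decimal import Decimal, InvalidOperation


def parse_number(text):
    """Convert a string such as '1,646.39204255' to a Decimal."""
    return Decimal(text.replace(",", ""))


def rows_above(rows, threshold, value_index=-1):
    """Keep rows whose cell at value_index is a number above threshold."""
    kept = []
    for cell_values in rows:
        try:
            value = parse_number(cell_values[value_index])
        except (IndexError, InvalidOperation):
            # Row without cells, or a cell that is not a number
            continue
        if value > threshold:
            kept.append(cell_values)
    return kept


# Hypothetical rows shaped like cell_values in the loop above
sample_rows = [
    ["*", "ADA/BTC", "1,646.39204255"],
    ["*", "ADX/BTC", "35.29384873"],
    [],
]

filtered = rows_above(sample_rows, Decimal("100"))
for cell_values in filtered:
    print(cell_values[1], cell_values[-1])
```

Decimal is used here rather than float so that values with many decimal places are compared without rounding; a float would also do if that precision does not matter.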