Question

I want to scrape the table on this page into a DataFrame with the columns "Contract" and "Funding Rate". (https://www.binance.com/en/futures/funding-history/1)

This is what I have tried so far, but I still can't get it to work. I would appreciate any help with this.

import time

import pandas as pd

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()

options.headless = True
driver = webdriver.Chrome(options=options)

driver.get("https://www.binance.com/cn/futures/funding-history/0")
time.sleep(5)
# headers = driver.find_elements_by_xpath('//*[@class="tablesorter-headerRow"][2]/th/div')
table = driver.find_element_by_xpath('//*[@id="bnc-table-tbody"]')

But it gives me errors.

Answer 1

The XPath selector should match on the class rather than the id: //*[@class="bnc-table-tbody"]


You can then iterate over the table rows and convert them into a DataFrame:

table = driver.find_element_by_xpath('//*[@class="bnc-table-tbody"]')
data = []
for tr in table.find_elements_by_xpath('tr'):
    columns = tr.find_elements_by_xpath('td')
    data.append({
        'Contract': columns[0].text,
        'Funding Rate': columns[2].text
    })
# Convert the list of dictionaries into a DataFrame
df = pd.DataFrame(data)
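
For newer Selenium releases, here is a minimal sketch of the same loop. The find_element_by_* helpers used above were removed in Selenium 4.3, so this version uses By locators, and it swaps the fixed sleep for an explicit wait. The function names rows_to_frame and scrape_rows are hypothetical, and the cell positions (0 and 2) are simply taken from the snippet above; adjust them if the page layout differs.

```python
import pandas as pd


def rows_to_frame(rows):
    """Build a DataFrame from rows of cell text (one list of strings per <tr>)."""
    records = [
        {"Contract": cells[0], "Funding Rate": cells[2]}
        for cells in rows
        if len(cells) > 2  # skip placeholder rows that lack the expected cells
    ]
    return pd.DataFrame(records, columns=["Contract", "Funding Rate"])


def scrape_rows(driver, timeout=10):
    """Wait for the table body to appear, then return the text of every cell."""
    # Imported locally so rows_to_frame can be used without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    tbody = WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located((By.XPATH, '//*[@class="bnc-table-tbody"]'))
    )
    return [
        [td.text for td in tr.find_elements(By.XPATH, "td")]
        for tr in tbody.find_elements(By.XPATH, "tr")
    ]
```

With a driver that has already loaded the page, df = rows_to_frame(scrape_rows(driver)) would produce the two requested columns.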

Answer 2

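Watching the network activity in the developer tools (Ctrl + Shift + I), it appears that the page sends POST requests to an API, which returns the JSON data used to populate the table. This means you don't need selenium; requests can handle the task on its own. We can get all the data in a single call: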

import requests
import pandas as pd

API_URI = 'https://www.binance.com/gateway-api/v1/public/future/common/get-funding-rate-history'

with requests.Session() as session:
    session.headers.update({'User-Agent': 'Just Another Human'})
    payload = {'symbol': "BTCUSDT", 'page': 1, 'rows': 1276}  # There are 1276 rows

    response = session.post(API_URI, json=payload)

# The 'data' key of the JSON body holds the table rows
data = response.json()
df = pd.DataFrame(data['data'])
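
To end up with only the two columns asked for in the question, the decoded JSON can be filtered and renamed. The sketch below assumes the body is a dict whose 'data' key holds a list of records, as in the snippet above. The helper name frame_from_body and the field names symbol and lastFundingRate are assumptions for illustration, not confirmed names from the API; inspect df.columns on a real response to find the actual ones.

```python
import pandas as pd


def frame_from_body(body, rename=None):
    """Turn a decoded JSON body into a DataFrame, optionally keeping and renaming fields."""
    records = body.get("data")
    if not isinstance(records, list):
        # The request may have failed or the response shape may have changed.
        raise ValueError(f"unexpected response body: {body!r}")
    df = pd.DataFrame(records)
    if rename and not df.empty:
        # Keep only the mapped fields, in the mapping's order, under the new names.
        df = df[list(rename)].rename(columns=rename)
    return df


# Hypothetical body; the field names here are assumptions, not the API's real ones.
sample = {"data": [{"symbol": "BTCUSDT", "lastFundingRate": "0.0001", "calcTime": 0}]}
df = frame_from_body(sample, {"symbol": "Contract", "lastFundingRate": "Funding Rate"})
```

Checking the shape of the body before building the DataFrame turns a failed or changed response into a clear error instead of a KeyError further down.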


Answer 3

Use a loop to access the tr and td elements:

(//div[@class='bnc-table-wrapper']//table//tbody/tr)[index]//td//text()

# This gives the text of the td elements; try it while looping over the tr elements.
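
The indexed loop described above might look like the following sketch. Selenium locators have to select elements rather than text nodes, so the trailing //text() step is dropped here and the text is read from each element's .text attribute instead. The names ROW_XPATH, row_cells_xpath and read_rows are hypothetical, and the string "xpath" is the value of Selenium's By.XPATH constant.

```python
ROW_XPATH = "(//div[@class='bnc-table-wrapper']//table//tbody/tr)"


def row_cells_xpath(index):
    """XPath for the <td> cells of the index-th row (XPath positions start at 1)."""
    if index < 1:
        raise ValueError("XPath positions are 1-based")
    return f"{ROW_XPATH}[{index}]//td"


def read_rows(driver):
    """Return the cell text of every table row, fetching one row per index."""
    row_count = len(driver.find_elements("xpath", ROW_XPATH))
    return [
        [td.text for td in driver.find_elements("xpath", row_cells_xpath(i))]
        for i in range(1, row_count + 1)
    ]
```

Each row costs one extra round trip to the browser, so for large tables it may be faster to locate the tbody once and iterate over its child elements.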

Scraping a table from a page with Selenium
