I am trying to scrape a table from a dynamic website (I believe it refreshes its data every 10 seconds) and load it into a pandas DataFrame, but I can't seem to get past the first step of grabbing the first column. Can anyone suggest what I am doing wrong? Thanks.
# import libraries
import urllib.request
from bs4 import BeautifulSoup
from selenium import webdriver
import time
import pandas as pd
urlpage = 'https://new.cryptoxscanner.com/binance/live'
driver = webdriver.Chrome(executable_path=r"C:\Users\xxxxx\Desktop\chrome\chromedriver.exe")
driver.get(urlpage)
time.sleep(10)
ticker = driver.find_element_by_xpath('//*[@id="scroll-source-1"]/table/tbody/tr[2]')
Answer 0 (score: 0)
First, you need to wait until the data has been located. You can do that with .visibility_of_all_elements_located and this locator:
//table[contains(@class, "table-sm")]//a
Once all the data has been found, you can extract the table data. Try the following code:
driver.get('https://new.cryptoxscanner.com/binance/live')
#UPDATED HERE
# Select the "All" option from the dropdown once it becomes clickable.
option = Select(WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, '//select[contains(., "All")]'))))
option.select_by_visible_text('All')
# Wait until the table's links are visible, then read the whole table container.
WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, '//table[contains(@class, "table-sm")]//a')))
data = driver.find_element_by_class_name('table-responsive')
print(data.text)
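Selecting "All" from the dropdown is there, presumably, so the page renders every row rather than a limited subset before the table is read; data.text then returns the visible table as plain text, roughly one row per line.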
Imports:
#UPDATED HERE
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
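To get from the printed text back to the pandas DataFrame the question asked for, one option is to hand the rendered page source to pandas once the wait has completed. This is a minimal sketch, not part of the original answer, assuming the data sits in an ordinary <table> element and that lxml (or bs4 with html5lib) is installed so pandas.read_html can parse it:

import pandas as pd
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait for the table rows to become visible, as in the answer above.
WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located(
    (By.XPATH, '//table[contains(@class, "table-sm")]//a')))

# read_html parses every <table> in the page source into a DataFrame;
# the live ticker table is assumed to be the first one returned.
tables = pd.read_html(driver.page_source)
df = tables[0]
print(df.head())

Because the page refreshes its data, re-running the read_html call in a loop (with a short sleep in between) would give an updated DataFrame each time.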