Web scraping a dynamic table with Selenium

Time: 2020-02-15 17:15:30

Tags: python selenium web-scraping

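I am trying to scrape a table from a dynamic website (I believe it refreshes its data every 10 seconds) and load it into a pandas DataFrame, but I can't get past the first step of retrieving the first column. Can anyone suggest what I'm doing wrong? Thanks.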

# import libraries
import urllib.request
from bs4 import BeautifulSoup
from selenium import webdriver
import time
import pandas as pd

urlpage = 'https://new.cryptoxscanner.com/binance/live'

driver = webdriver.Chrome(executable_path=r"C:\Users\xxxxx\Desktop\chrome\chromedriver.exe")

driver.get(urlpage)
time.sleep(10)
ticker = driver.find_element_by_xpath('//*[@id="scroll-source-1"]/table/tbody/tr[2]')

1 Answer:

Answer 0 (score: 0)

First, you need to wait until the data has actually loaded, using the expected condition visibility_of_all_elements_located. You can wait on this locator:

//table[contains(@class, "table-sm")]//a

Once all the data is visible, you can extract the table data. Try the following code:

driver.get('https://new.cryptoxscanner.com/binance/live')

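# Updated: pick "All" in the dropdown (presumably so that every row is shown)
option = Select(WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, '//select[contains(., "All")]'))))
option.select_by_visible_text('All')

# Wait up to 20 seconds for the links inside the table to become visible
WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, '//table[contains(@class, "table-sm")]//a')))
# find_element(By.CLASS_NAME, ...) also works on Selenium 4, where find_element_by_class_name was removed
data = driver.find_element(By.CLASS_NAME, 'table-responsive')
print(data.text)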

Imports:

#UPDATED HERE
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.by import By

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
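Since the goal in the question is a pandas DataFrame rather than printed text, here is a minimal sketch of one way to turn the table's HTML into rows. It uses only the standard library's html.parser; the names TableRowParser and table_html_to_rows are made up for this illustration, and sample_html is invented markup standing in for what the real page returns. It also assumes the first row of the table is the header, which I have not verified against the site.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, None outside <tr>
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_html_to_rows(html):
    """Return the table in `html` as a list of rows (lists of cell strings)."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Invented sample markup; with Selenium this string would come from
# table_element.get_attribute("outerHTML").
sample_html = """
<table class="table-sm">
  <thead><tr><th>Symbol</th><th>Last</th></tr></thead>
  <tbody>
    <tr><td><a href="#">AAABTC</a></td><td>0.0010</td></tr>
    <tr><td><a href="#">BBBBTC</a></td><td>0.0200</td></tr>
  </tbody>
</table>
"""

rows = table_html_to_rows(sample_html)
header, body = rows[0], rows[1:]  # assumes the first row holds the column names
print(header)
print(body)
```

With the live page, you would pass in driver.find_element(By.XPATH, '//table[contains(@class, "table-sm")]').get_attribute('outerHTML') after the wait has succeeded, and then build the frame with pd.DataFrame(body, columns=header). All cell values come back as strings, so numeric columns would still need converting.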