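I'm trying to get the XPath of an element on https://www.tradingview.com/symbols/BTCUSD/technicals/, specifically the result shown under the Summary gauge, whether it says Buy or Sell.
Using Chrome's "Copy XPath" I got this: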
//*[@id="technicals-root"]/div/div/div[2]/div[2]/span[2]
To fetch that value in Python, I plugged it into the following:
from lxml import html
import requests
page = requests.get('https://www.tradingview.com/symbols/BTCUSD/technicals/')
tree = html.fromstring(page.content)
status = tree.xpath('//*[@id="technicals-root"]/div/div/div[2]/div[2]/span[2]/text()')
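When I print status, I get an empty list. There doesn't seem to be anything wrong with the XPath, though. I've read that Chrome does some shenanigans with badly written HTML tables that can make it output the wrong XPath, but that doesn't seem to be the problem here.
Thanks in advance for any help.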
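Answer 0 (score: 0)
When I run your code, the "technicals-root" div is empty. I assume JavaScript fills it in after the page loads, so the XPath matches nothing in the HTML that requests receives. When you can't get a page statically, you can always turn to Selenium to run a real browser and let it work everything out. You may need to adjust the driver path to make it run in your environment, but this worked for me: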
import time
import contextlib
import selenium
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

option = webdriver.ChromeOptions()
option.add_argument("--incognito")

# Note: executable_path, chrome_options and find_elements_by_xpath are the
# Selenium 3 API. Selenium 4 uses Service(...), options=... and
# find_elements(By.XPATH, ...) instead.
with contextlib.closing(webdriver.Chrome(
        executable_path='/usr/lib/chromium-browser/chromedriver',
        chrome_options=option)) as browser:
    browser.get('https://www.tradingview.com/symbols/BTCUSD/technicals/')
    # wait until js has filled in the element - and a bit longer for js churn
    WebDriverWait(browser, 20).until(EC.visibility_of_element_located(
        (By.XPATH,
         '//*[@id="technicals-root"]/div/div/div[2]/div[2]/span')))
    time.sleep(1)
    # read the second span, which holds the Buy/Sell summary text
    status = browser.find_elements_by_xpath(
        '//*[@id="technicals-root"]/div/div/div[2]/div[2]/span[2]')
    print(status[0].text)
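If you want to confirm the "empty div" diagnosis before reaching for a browser, you can check whether the container has any content in the raw HTML. Below is a minimal sketch of that check using only the standard library's html.parser. The names ContainerProbe and container_is_empty are made up for illustration, and the sample HTML strings are stand-ins, not the real TradingView markup. It also assumes reasonably well-formed HTML (unclosed tags such as a bare <p> would confuse the depth counter).

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ContainerProbe(HTMLParser):
    """Hypothetical helper: records whether the element with a given id
    contains any child tags or non-whitespace text."""

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.depth = 0            # nesting depth inside the target; 0 = outside
        self.found = False
        self.has_content = False

    def handle_starttag(self, tag, attrs):
        if self.depth > 0:
            # Any tag opened inside the target counts as content.
            self.has_content = True
            if tag not in VOID_TAGS:
                self.depth += 1
        elif not self.found and dict(attrs).get("id") == self.element_id:
            self.found = True
            if tag not in VOID_TAGS:
                self.depth = 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.has_content = True


def container_is_empty(html_text, element_id):
    """Return True if the element exists but has no children or text."""
    probe = ContainerProbe(element_id)
    probe.feed(html_text)
    probe.close()
    if not probe.found:
        raise ValueError("no element with id %r" % element_id)
    return not probe.has_content


# Stand-in documents: what a static fetch might return vs. a rendered page.
static_html = '<html><body><div id="technicals-root"></div></body></html>'
rendered_html = '<div id="technicals-root"><div><span>Buy</span></div></div>'

print(container_is_empty(static_html, "technicals-root"))
print(container_is_empty(rendered_html, "technicals-root"))
```

With the question's code, you would pass page.text and "technicals-root". If the check reports an empty container, no XPath into that div can match the static HTML, and a browser-driven approach like the Selenium one above is likely needed.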