Question

I'm trying to get the XPath of an element on https://www.tradingview.com/symbols/BTCUSD/technicals/, specifically the result shown under the Summary speedometer, whether it is Buy or Sell.

Using the XPath copied from Google Chrome, I got:

//*[@id="technicals-root"]/div/div/div[2]/div[2]/span[2]

To fetch that value in Python, I plugged it into the following:

from lxml import html
import requests

page = requests.get('https://www.tradingview.com/symbols/BTCUSD/technicals/')
tree = html.fromstring(page.content)
status = tree.xpath('//*[@id="technicals-root"]/div/div/div[2]/div[2]/span[2]/text()')

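When I print status, I get an empty list, even though nothing seems to be wrong with the XPath. I've read that Chrome plays tricks with badly written HTML tables and can output an incorrect XPath, but that doesn't seem to be the problem here.

Thanks in advance for any help.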

Answer 1

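When I run your code, the "technicals-root" div is empty. I assume JavaScript is filling it in. When you can't get a page statically, you can always turn to Selenium to run a browser and let it work everything out. You may need to adjust the driver path to get it running in your environment, but this works for me: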

import time
import contextlib
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

option = webdriver.ChromeOptions()
option.add_argument("--incognito")

# Selenium 4 takes the driver path through a Service object;
# adjust the path for your environment.
service = Service(executable_path='/usr/lib/chromium-browser/chromedriver')

with contextlib.closing(webdriver.Chrome(
        service=service,
        options=option)) as browser:

    browser.get('https://www.tradingview.com/symbols/BTCUSD/technicals/')

    # wait until js has filled in the element - and a bit longer for js churn
    WebDriverWait(browser, 20).until(EC.visibility_of_element_located(
        (By.XPATH, 
        '//*[@id="technicals-root"]/div/div/div[2]/div[2]/span')))
    time.sleep(1)

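    # find_elements(By.XPATH, ...) replaces find_elements_by_xpath,
    # which newer Selenium releases no longer provide
    status = browser.find_elements(
        By.XPATH,
        '//*[@id="technicals-root"]/div/div/div[2]/div[2]/span[2]')
    print(status[0].text)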
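
If you want to confirm that an empty container really is the cause before reaching for Selenium, a check along these lines may help. It is only a sketch: the two HTML strings are made-up stand-ins for what requests receives and what the browser shows after JavaScript has run (the real TradingView markup is more deeply nested), and container_is_empty is a hypothetical helper name, not part of lxml.

```python
from lxml import html

# Hypothetical stand-ins: the static response (container present but empty)
# and the DOM after JavaScript has filled it in.
STATIC_HTML = '<html><body><div id="technicals-root"></div></body></html>'
RENDERED_HTML = (
    '<html><body><div id="technicals-root">'
    '<div><span>Summary</span><span>Buy</span></div>'
    '</div></body></html>'
)


def container_is_empty(page_source, container_id="technicals-root"):
    """Return True if the element with this id has no children and no text."""
    tree = html.fromstring(page_source)
    matches = tree.xpath('//*[@id=$cid]', cid=container_id)
    if not matches:
        raise ValueError("no element with id %r" % container_id)
    node = matches[0]
    return len(node) == 0 and not (node.text or "").strip()


# Simplified XPath that matches the sample markup above
SAMPLE_XPATH = '//*[@id="technicals-root"]/div/span[2]/text()'

print(container_is_empty(STATIC_HTML))
print(html.fromstring(STATIC_HTML).xpath(SAMPLE_XPATH))
print(container_is_empty(RENDERED_HTML))
print(html.fromstring(RENDERED_HTML).xpath(SAMPLE_XPATH))
```

Against the live page you would pass page.content from your requests call instead of the sample strings. If the helper reports an empty container, the XPath is not at fault and the content has to come from a rendered page, for example browser.page_source in the Selenium code above.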
