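I'm trying to extract data from the following website:
I'm targeting the value in the octagon, "6":
I believe I'm targeting the correct xpath.
Here is my code: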
import sys
import os
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
from selenium import webdriver
os.environ['MOZ_HEADLESS'] = '1'
binary = FirefoxBinary('C:/Program Files/Mozilla Firefox/firefox.exe', log_file=sys.stdout)
browser = webdriver.PhantomJS(service_args=["--load-images=no", '--disk-cache=true'])
url = 'https://www.tipranks.com/stocks/sui/stock-analysis'
xpath = '/html/body/div[1]/div/div/div/div/main/div/div/article/div[2]/div/main/div[1]/div[2]/section[1]/div[1]/div[1]/div/svg/text/tspan'
browser.get(url)
element = browser.find_element_by_xpath(xpath)
print(element)
This is the error I get back:
Traceback (most recent call last):
File "C:/Users/jaspa/PycharmProjects/ig-markets-api-python-library/trader/market_signal_IV_test.py", line 15, in <module>
element = browser.find_element_by_xpath(xpath)
File "C:\Users\jaspa\AppData\Local\Programs\Python\Python36-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 394, in find_element_by_xpath
return self.find_element(by=By.XPATH, value=xpath)
File "C:\Users\jaspa\AppData\Local\Programs\Python\Python36-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 978, in find_element
'value': value})['value']
File "C:\Users\jaspa\AppData\Local\Programs\Python\Python36-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
self.error_handler.check_response(response)
File "C:\Users\jaspa\AppData\Local\Programs\Python\Python36-32\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: {"errorMessage":"Unable to find element with xpath '/html/body/div[1]/div/div/div/div/main/div/div/article/div[2]/div/main/div[1]/div[2]/section[1]/div[1]/div[1]/div/svg/text/tspan'","request":{"headers":{"Accept":"application/json","Accept-Encoding":"identity","Content-Length":"96","Content-Type":"application/json;charset=UTF-8","Host":"127.0.0.1:51786","User-Agent":"selenium/3.141.0 (python windows)"},"httpVersion":"1.1","method":"POST","post":"{\"using\": \"xpath\", \"value\": \"/h3/div/span\", \"sessionId\": \"d8e91c70-9139-11e9-a9c9-21561f67b079\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/d8e91c70-9139-11e9-a9c9-21561f67b079/element"}}
Screenshot: available via screen
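I can see the problem is caused by an incorrect xpath, but I can't work out why.
I should also point out that I chose Selenium to scrape this site, and I intend to extract other values and repeat these queries for different stocks across many pages. If anyone thinks BeautifulSoup, lxml, etc. would be better, I'm happy to hear suggestions!
Thanks!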
Answer 0 (score: 2)
You aren't even declaring the whole path. The octagon sits inside a div with the class client-components-ValueChange-shape__Octagon, so search for that div instead.
For example:
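from selenium.webdriver.common.by import By

# Find every div whose class attribute is exactly this value
octagons = browser.find_elements(By.CSS_SELECTOR, "div[class='client-components-ValueChange-shape__Octagon']")
for octagon in octagons:
    print(octagon.text)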
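The selector above, div[class='...'], compares the entire class attribute as a single string, so it would stop matching if the div ever gained a second class. The following is a small pure-Python sketch (not Selenium or browser code) of how three CSS attribute operators treat a class attribute: "=" as used here, "$=" as used in the pyppeteer answer, and "~=", which is what a plain .classname selector does. The matches helper and the second class string are hypothetical, made up only to show the difference.

```python
"""Sketch of CSS attribute-selector matching rules for a class attribute.

Hypothetical helper for illustration only; browsers implement this natively.
"""


def matches(class_attr: str, operator: str, value: str) -> bool:
    """Return True if class_attr satisfies [class<operator>'value']."""
    if operator == "=":
        # [class='X']: the whole attribute string must equal X exactly
        return class_attr == value
    if operator == "~=":
        # [class~='X']: X must be one of the whitespace-separated tokens
        # (equivalent to the .X class selector in HTML)
        return value in class_attr.split()
    if operator == "$=":
        # [class$='X']: the attribute string must end with X; an empty X never matches
        return value != "" and class_attr.endswith(value)
    raise ValueError(f"unsupported operator: {operator}")


if __name__ == "__main__":
    # Hypothetical attribute values: one class vs. the same class plus another
    single = "client-components-ValueChange-shape__Octagon"
    multi = "client-components-ValueChange-shape__Octagon some-other-class"
    for attr in (single, multi):
        print(
            matches(attr, "=", single),
            matches(attr, "~=", single),
            matches(attr, "$=", "shape__Octagon"),
        )
```

With a single class all three selectors match; once another class is appended, only the token-based "~=" (or .classname) form still finds the div.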
Answer 1 (score: 1)
You seem to have two problems here.
For the xpath, I just did:
xpath = '//div[@class="client-components-ValueChange-shape__Octagon"]'
and then:
print(element.text)
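That gets the value you want. However, your code doesn't actually wait for the browser to finish loading the page before it runs the xpath lookup. For me, using Firefox, this approach only got the value about 40% of the time. There are many ways to handle this with Selenium; the simplest is probably to sleep for a few seconds between the browser.get call and the xpath statement.
You also appear to set up Firefox but then use PhantomJS. I didn't try this with PhantomJS, which may not need the sleep.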
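A fixed sleep either wastes time or is still too short. Selenium's usual alternative is an explicit wait, which re-checks a condition until it holds or a timeout expires. Below is a minimal standard-library sketch of that polling idea; wait_until is a hypothetical helper, not a Selenium function, and the fake finder stands in for a page whose JavaScript has not finished rendering.

```python
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def wait_until(
    condition: Callable[[], Optional[T]],
    timeout: float = 10.0,
    poll_interval: float = 0.5,
    ignored: Tuple[Type[BaseException], ...] = (),
) -> T:
    """Call condition() repeatedly until it returns a truthy value.

    Exceptions listed in `ignored` are treated as "not ready yet" (the role
    NoSuchElementException plays when polling with Selenium); any other
    exception propagates. Raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except ignored:
            pass  # element not present yet; retry after a short pause
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


if __name__ == "__main__":
    attempts = {"n": 0}

    def fake_find():
        # Hypothetical stand-in: the value only "appears" on the third poll
        attempts["n"] += 1
        return "6" if attempts["n"] >= 3 else None

    print(wait_until(fake_find, timeout=2.0, poll_interval=0.01))
```

With Selenium itself, the built-in equivalent is WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.XPATH, xpath))), using from selenium.webdriver.support.ui import WebDriverWait and from selenium.webdriver.support import expected_conditions as EC.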
Answer 2 (score: 1)
You could try the CSS selector [class$='shape__Octagon'] to target the content. If I were going with pyppeteer, I'd do something like the following:
import asyncio
from pyppeteer import launch

async def get_content(url):
    browser = await launch({"headless": True})
    try:
        # launch() opens one blank page; reuse it instead of creating another
        [page] = await browser.pages()
        await page.goto(url)
        # The octagon is rendered by JavaScript, so wait until it exists
        await page.waitForSelector("[class$='shape__Octagon']")
        # Any element whose class attribute ends with "shape__Octagon"
        value = await page.querySelectorEval("[class$='shape__Octagon']", "e => e.innerText")
    finally:
        await browser.close()
    return value

if __name__ == "__main__":
    url = "https://www.tipranks.com/stocks/sui/stock-analysis"
    result = asyncio.run(get_content(url))
    print(result.strip())
Output:
6
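The question mentions repeating this lookup for many stocks. A minimal sketch of fanning a coroutine like get_content out over several tickers with bounded concurrency is below. It assumes, without having verified it, that other stocks use the same URL pattern as the SUI page; stock_url and scrape_many are hypothetical names, and fetch is any async function taking a URL.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, Tuple

# Assumption: every stock page follows the same pattern as the SUI URL
URL_TEMPLATE = "https://www.tipranks.com/stocks/{ticker}/stock-analysis"


def stock_url(ticker: str) -> str:
    """Build the (assumed) stock-analysis URL for a ticker symbol."""
    return URL_TEMPLATE.format(ticker=ticker.lower())


async def scrape_many(
    tickers: Iterable[str],
    fetch: Callable[[str], Awaitable[str]],
    max_concurrent: int = 3,
) -> Dict[str, str]:
    """Run fetch(url) for each ticker, with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def one(ticker: str) -> Tuple[str, str]:
        async with semaphore:  # caps how many pages are loading at once
            return ticker, await fetch(stock_url(ticker))

    pairs = await asyncio.gather(*(one(t) for t in tickers))
    return dict(pairs)


if __name__ == "__main__":
    async def fake_fetch(url: str) -> str:
        # Hypothetical stand-in for a real fetch such as get_content(url)
        await asyncio.sleep(0.01)
        return url.split("/")[-2]

    print(asyncio.run(scrape_many(["SUI", "AAPL"], fake_fetch)))
```

Note that get_content as written launches a fresh browser on every call, so for many tickers it would likely be cheaper to share one browser and open a page per task; the semaphore keeps that from opening too many pages at once.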