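I am trying to scrape the "Full Time Employees" value of 110,000 from the Yahoo Finance website.
The URL is: http://finance.yahoo.com/quote/AAPL/profile?p=AAPL
I tried using Beautiful Soup, but I can't find the value on the page. When I look at the page in the DOM explorer in IE, I can see it. It is in a tag whose parent tag has a parent tag of its own. The actual value is in a custom class, data-react-id.
The code I tried: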
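import requests
from bs4 import BeautifulSoup as bs

html = "http://finance.yahoo.com/quote/AAPL/profile?p=AAPL"
r = requests.get(html).content
# Name the parser explicitly so the result does not depend on what is installed
soup = bs(r, "html.parser")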
I'm not sure where to go from here.
Answer 0 (score: 3)
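The problem is in the "requests" part: the page you download with requests is not the same as the page you see in your browser. The browser has executed all of the JavaScript and made the multiple asynchronous requests needed to load the page. This particular page is also highly dynamic in itself, with much of the work happening on the client side.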
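To check this for yourself, a minimal sketch along these lines can tell you whether a label appears as text in the HTML that requests receives. The RAW_HTML string is a made-up stand-in for a JavaScript-rendered page shell, not the actual Yahoo response, and label_in_static_html is a hypothetical helper; only the standard library is used.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def label_in_static_html(html, label):
    """Return True if label occurs in the text nodes of the raw HTML."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return any(label in chunk for chunk in parser.chunks)


# Made-up example of a page whose content is filled in by JavaScript.
RAW_HTML = """
<html><body>
<div id="app"></div>
<script>render({"label": "Full Time Employees", "value": "110,000"});</script>
</body></html>
"""

print(label_in_static_html(RAW_HTML, "Full Time Employees"))  # False
```

If the same check returns False on the text of requests.get(url), the value is most likely being rendered on the client side, and Beautiful Soup alone will not see it.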
What you can do instead is load the page in a real browser automated by selenium. Working example:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
driver = webdriver.Chrome()
driver.maximize_window()
driver.get("http://finance.yahoo.com/quote/AAPL/profile?p=AAPL")
# wait for the Full Time Employees to be visible
wait = WebDriverWait(driver, 10)
employees = wait.until(EC.visibility_of_element_located((By.XPATH, "//span[. = 'Full Time Employees']/following-sibling::strong")))
print(employees.text)
driver.close()
Prints 110,000.
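The XPath in the example finds the span whose text is exactly "Full Time Employees" and then takes the strong element that follows it under the same parent. A rough standard-library sketch of that lookup, run on a made-up fragment, looks like this. The real page markup may differ, and first_following_strong and SAMPLE are hypothetical names. ElementTree does not support the following-sibling axis, so the sibling step is done by hand.

```python
import xml.etree.ElementTree as ET

# Made-up fragment for illustration; not the actual Yahoo markup.
SAMPLE = (
    "<p>"
    "<span>Sector</span><strong>Technology</strong>"
    "<span>Full Time Employees</span><strong>110,000</strong>"
    "</p>"
)


def first_following_strong(root, label):
    """Return the text of the first <strong> sibling after the <span> whose text equals label."""
    for parent in root.iter():
        children = list(parent)
        for index, child in enumerate(children):
            if child.tag == "span" and "".join(child.itertext()) == label:
                # Walk the siblings that come after the matching span
                for sibling in children[index + 1:]:
                    if sibling.tag == "strong":
                        return sibling.text
    return None


text = first_following_strong(ET.fromstring(SAMPLE), "Full Time Employees")
print(text)  # 110,000
print(int(text.replace(",", "")))  # 110000
```

The last line shows one way to turn the scraped string into a number, if that is what you need downstream.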
Answer 1 (score: 1)
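There are many ways to download financial data, or any other kind of data, from the web. The script below downloads stock prices and saves everything to a CSV file.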
The script below downloads multiple stock tickers into one folder.
from urllib.request import urlopen  # Python 3 replacement for urllib2

listOfStocks = ["AAPL", "MSFT", "GOOG", "FB", "AMZN"]
urls = []
for company in listOfStocks:
    urls.append('http://real-chart.finance.yahoo.com/table.csv?s=' + company + '&d=6&e=28&f=2015&g=m&a=11&b=12&c=1980&ignore=.csv')

Output_File = open('C:/Users/your_path/Historical_Prices.csv', 'w')
New_Format_Data = ''
for counter in range(0, len(urls)):
    # Download each CSV once and decode the bytes to text
    Original_Data = urlopen(urls[counter]).read().decode('utf-8')
    rows = Original_Data.splitlines(True)  # True keeps the line endings
    if counter == 0:
        # Reuse the header row of the first file, prefixed with a Company column
        New_Format_Data = "Company," + rows[0]
    # Skip the header row and prefix every data row with its ticker
    for row in range(1, len(rows)):
        New_Format_Data = New_Format_Data + listOfStocks[counter] + ',' + rows[row]
Output_File.write(New_Format_Data)
Output_File.close()
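The merging step can be tried without any network access. The sketch below applies the same idea to in-memory CSV text: keep the header of the first file, prefix it with a Company column, and prefix every data row with its ticker. The helper name merge_price_csvs and the sample rows are invented for illustration and are not real price data.

```python
def merge_price_csvs(csv_by_symbol):
    """Merge per-symbol CSV texts into one CSV text with a leading Company column."""
    merged = []
    for symbol, text in csv_by_symbol.items():
        rows = text.splitlines()
        if not rows:
            continue  # nothing downloaded for this symbol
        if not merged:
            # The first non-empty file supplies the shared header row
            merged.append("Company," + rows[0])
        for row in rows[1:]:
            merged.append(symbol + "," + row)
    return "\n".join(merged) + "\n" if merged else ""


# Invented sample data in a Date,Close layout (an assumption, not real prices).
SAMPLE = {
    "AAPL": "Date,Close\n2015-07-01,1.0\n2015-06-01,2.0\n",
    "MSFT": "Date,Close\n2015-07-01,3.0\n",
}

print(merge_price_csvs(SAMPLE), end="")
# Company,Date,Close
# AAPL,2015-07-01,1.0
# AAPL,2015-06-01,2.0
# MSFT,2015-07-01,3.0
```

Collecting rows in a list and joining once at the end avoids repeatedly rebuilding one large string, which matters when the history goes back decades.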
Finally, this will download prices for multiple stock tickers:
from urllib.request import urlopen  # Python 3 replacement for urllib.urlopen
import json

# One ticker symbol per line
symbolslist = open("C:/Users/your_path/Desktop/symbols/tickers.txt").read()
symbolslist = symbolslist.split("\n")
for symbol in symbolslist:
    # Create (or empty) the output file for this symbol
    myfile = open("C:/Users/your_path/Desktop/symbols/" + symbol + ".txt", "w+")
    myfile.close()

    htmltext = urlopen("http://www.bloomberg.com/markets/chart/data/1D/" + symbol + ":US")
    data = json.load(htmltext)
    datapoints = data["data_values"]

    # Append one "symbol,x,y" line per data point
    myfile = open("C:/Users/your_path/Desktop/symbols/" + symbol + ".txt", "a")
    for point in datapoints:
        myfile.write(str(symbol + "," + str(point[0]) + "," + str(point[1]) + "\n"))
    myfile.close()
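The loop over data["data_values"] assumes that each point is a pair whose first two items are written out after the symbol. Here is a small offline sketch of that formatting step, using a made-up JSON payload in the same assumed shape. The helper points_to_lines and the numbers are hypothetical, and the real response format is not shown here.

```python
import json


def points_to_lines(symbol, payload_text):
    """Turn a JSON payload with a data_values list into 'symbol,x,y' lines."""
    data = json.loads(payload_text)
    return [
        "{},{},{}\n".format(symbol, point[0], point[1])
        for point in data["data_values"]
    ]


# Made-up payload; the values are placeholders, not market data.
SAMPLE_PAYLOAD = '{"data_values": [[1438090200000, 123.5], [1438090260000, 124.0]]}'

lines = points_to_lines("AAPL", SAMPLE_PAYLOAD)
print("".join(lines), end="")
# AAPL,1438090200000,123.5
# AAPL,1438090260000,124.0
```

Separating the formatting from the download makes it easier to test the output layout before pointing the script at a live URL.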
I wrote a book about these things, and a lot of other things too. You can find it at the URL below.