Python BeautifulSoup: scraping a value from Yahoo Finance

Date: 2016-08-29 03:08:12

Tags: python beautifulsoup

I'm trying to scrape the 'Full Time Employees' value of 110,000 from the Yahoo Finance website.

The URL is: http://finance.yahoo.com/quote/AAPL/profile?p=AAPL

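I tried using Beautiful Soup, but I can't find the value on the page. I can see it when I look at the DOM Explorer in Internet Explorer: it sits in a tag that has a parent tag, which in turn has its own parent tag. The actual value is in an element with a custom data-react-id attribute.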

The code I tried:

import requests
from bs4 import BeautifulSoup as bs

html = 'http://finance.yahoo.com/quote/AAPL/profile?p=AAPL'
r = requests.get(html).content
soup = bs(r, 'html.parser')

I'm not sure where to go from here.

2 answers:

Answer 0 (score: 3)

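The problem is in the "requests" part: the page you download with requests is not the page you see in the browser. The browser has executed all of the JavaScript and made the multiple asynchronous requests needed to load this page. On top of that, this particular page is very dynamic itself; a lot is going on on the "client side".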
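A quick way to confirm this diagnosis is to check whether the value appears in the markup that requests actually receives. The sketch below uses only the standard library's html.parser (not BeautifulSoup); the names VisibleTextParser and visible_text and the sample markup are hypothetical. It collects the text outside <script> and <style> elements.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects text nodes that are not inside <script> or <style> elements."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside scripts and styles
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the markup text (outside scripts/styles) joined by single spaces."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical server response: the number only exists inside a script.
raw_html = """<html><body>
<span>Full Time Employees</span><strong></strong>
<script>var profile = {"employees": "110,000"};</script>
</body></html>"""

text = visible_text(raw_html)
print("Full Time Employees" in text)  # True
print("110,000" in text)              # False
```

If a label is present in this text but its value is not, the value is filled in by JavaScript (or exists only inside script data), so searching the parsed HTML with BeautifulSoup will not find it.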

What you can do instead is load this page in a real browser automated by Selenium. A working example:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

driver = webdriver.Chrome()
driver.maximize_window()
driver.get("http://finance.yahoo.com/quote/AAPL/profile?p=AAPL")

# wait for the Full Time Employees to be visible
wait = WebDriverWait(driver, 10)
employees = wait.until(EC.visibility_of_element_located((By.XPATH, "//span[. = 'Full Time Employees']/following-sibling::strong")))
print(employees.text)

driver.close()

This prints 110,000.
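Selenium returns the text as displayed, so employees.text is the string "110,000" rather than a number. A minimal sketch for converting it, assuming commas are used only as thousands separators (as on the English-language page); parse_count is a hypothetical helper name:

```python
def parse_count(text):
    """Convert a displayed count such as '110,000' to an int.

    Assumes ',' is only a thousands separator (English locale).
    Returns None for placeholders such as 'N/A' or an empty string.
    """
    cleaned = text.strip().replace(",", "")
    return int(cleaned) if cleaned.isdigit() else None


print(parse_count("110,000"))  # 110000
print(parse_count("N/A"))      # None
```

With the Selenium example above, parse_count(employees.text) would give the integer instead of the string.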

Answer 1 (score: 1)

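There are many ways to download financial data, or any other kind of data, from the web. The script below downloads stock prices and saves everything to a CSV file.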


The following script downloads data for several stock tickers into a single CSV file.

import urllib.request

listOfStocks = ["AAPL", "MSFT", "GOOG", "FB", "AMZN"]

# One download URL per ticker (g=m requests monthly data)
urls = ['http://real-chart.finance.yahoo.com/table.csv?s=' + company +
        '&d=6&e=28&f=2015&g=m&a=11&b=12&c=1980&ignore=.csv'
        for company in listOfStocks]

New_Format_Data = ''

for counter, url in enumerate(urls):
    # Download each CSV once and split it into lines, keeping line endings
    rows = urllib.request.urlopen(url).read().decode('utf-8').splitlines(True)

    # Take the header from the first file only, with an extra Company column
    if counter == 0:
        New_Format_Data = 'Company,' + rows[0]

    # Prefix every data row (skipping the header) with its ticker symbol
    for row in rows[1:]:
        New_Format_Data += listOfStocks[counter] + ',' + row

with open('C:/Users/your_path/Historical_Prices.csv', 'w') as Output_File:
    Output_File.write(New_Format_Data)
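Splitting lines and joining with ',' works for simple files, but it breaks if a field ever contains a quoted comma. A sketch of the same merge step using the standard csv module, assuming each ticker's CSV text has already been downloaded; merge_ticker_csvs and the sample data are hypothetical placeholders:

```python
import csv
import io


def merge_ticker_csvs(csv_by_ticker):
    """Merge per-ticker CSV texts into one CSV text with a leading Company column.

    csv_by_ticker maps a ticker symbol to the raw CSV text downloaded for it.
    The header row is taken from the first non-empty file; later headers are skipped.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header_written = False
    for ticker, text in csv_by_ticker.items():
        rows = list(csv.reader(io.StringIO(text)))
        if not rows:
            continue  # empty download: nothing to merge
        if not header_written:
            writer.writerow(["Company"] + rows[0])
            header_written = True
        for row in rows[1:]:
            writer.writerow([ticker] + row)
    return out.getvalue()


# Hypothetical sample data with placeholder values.
sample = {
    "AAPL": "Date,Close\n2015-07-01,10.0\n",
    "MSFT": "Date,Close\n2015-07-01,20.0\n",
}
print(merge_ticker_csvs(sample), end="")
# Company,Date,Close
# AAPL,2015-07-01,10.0
# MSFT,2015-07-01,20.0
```

Keeping the merge separate from the download also makes it testable without network access.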

Finally, this script downloads prices for several stock tickers, writing one file per symbol:

import json
import urllib.request

# Read one ticker symbol per line, skipping blank lines
with open("C:/Users/your_path/Desktop/symbols/tickers.txt") as f:
    symbolslist = [line.strip() for line in f if line.strip()]

for symbol in symbolslist:
    # Fetch the 1D chart data for this symbol as JSON
    url = "http://www.bloomberg.com/markets/chart/data/1D/" + symbol + ":US"
    with urllib.request.urlopen(url) as response:
        data = json.load(response)
    datapoints = data["data_values"]

    # Write one "symbol,x,y" line per data point, replacing any older file
    with open("C:/Users/your_path/Desktop/symbols/" + symbol + ".txt", "w") as myfile:
        for point in datapoints:
            myfile.write(symbol + "," + str(point[0]) + "," + str(point[1]) + "\n")
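Separating the JSON parsing from the download makes it testable offline. A minimal sketch, assuming the response is JSON with a "data_values" list of [x, y] pairs, as the script above expects; datapoints_to_lines and the sample payload are hypothetical:

```python
import json


def datapoints_to_lines(symbol, payload):
    """Convert a JSON payload containing a "data_values" list of [x, y] pairs
    into "symbol,x,y" lines, the same layout the script above writes."""
    data = json.loads(payload)
    return [symbol + "," + str(point[0]) + "," + str(point[1])
            for point in data["data_values"]]


# Hypothetical payload with placeholder values.
sample = '{"data_values": [[1, 10.5], [2, 10.75]]}'
for line in datapoints_to_lines("AAPL", sample):
    print(line)
# AAPL,1,10.5
# AAPL,2,10.75
```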
I wrote a book about this kind of thing, and a lot more. You can find it at the URL below.

https://www.amazon.com/Automating-Business-Processes-Reducing-Increasing-ebook/dp/B01DJJKVZC/ref=sr_1_1