Searching for a specific class with bs4

Time: 2017-03-28 07:40:15

Tags: python web-scraping beautifulsoup google-finance

I am trying to scrape this Google Finance link. The page contains an element with the class SP_arrow_last_off. So, if I do something like this:

import requests
from bs4 import BeautifulSoup

url = "https://www.google.com/finance/historical?cid=4899364&startdate=Jan%201%2C%202000&enddate=Mar%2023%2C%202017&start=2000&num=200&ei=cRHaWNj3FISougTSg6moCw"

headers={'Host': 'www.google.com',
'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language': 'en-US,en;q=0.5',
}

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')
last = soup.find_all(class_= "SP_arrow_last_off")
if(last):
    print("HI")

It prints nothing. I checked further, and what I get back for last is an empty list or None. How can I get True if a class exists and False if it does not?

3 answers:

Answer 0 (score: 3)

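'SP_arrow_last_off' is present in the page source, but the data is populated into it by a JavaScript function.

If you need the data, you have to work out which data is actually present in the source that the server returns.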
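One way to see the difference between "the name occurs somewhere in the source" and "an element with that class exists" is to compare a plain substring search with a parsed search. The sketch below is only an illustration: SAMPLE_HTML is a hypothetical stand-in for the downloaded page (assuming the class name occurs only inside a script), and describe_class is a made-up helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page: the class name occurs only
# inside JavaScript, not as the class attribute of any element.
SAMPLE_HTML = """
<html><body>
<div id="prices"></div>
<script>
  var el = document.createElement("div");
  el.className = "SP_arrow_last_off";
  document.getElementById("prices").appendChild(el);
</script>
</body></html>
"""


def describe_class(html, class_name):
    """Report where class_name shows up in the given HTML text."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        # True if the string occurs anywhere, including inside scripts.
        "in_raw_text": class_name in html,
        # True only if a parsed element carries the class.
        "as_element": soup.find(class_=class_name) is not None,
    }


report = describe_class(SAMPLE_HTML, "SP_arrow_last_off")
print(report)
```

If in_raw_text is True while as_element is False, the element is most likely created in the browser, and a search over the response from requests cannot find it.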

To get the data, you can do something like this with the lxml module, which can be an order of magnitude faster than BeautifulSoup (if written properly):

import requests
from lxml import html

url = "https://www.google.com/finance/historical?cid=4899364&startdate=Jan%201%2C%202000&enddate=Mar%2023%2C%202017&start=2000&num=200&ei=cRHaWNj3FISougTSg6moCw"

headers={'Host': 'www.google.com',
'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language': 'en-US,en;q=0.5',
}

response = requests.get(url, headers=headers)
# Parse the raw response bytes into an lxml element tree.
soup = html.fromstring(response.content)
result_list = []
# Walk the rows of the historical price table.
for row in soup.xpath('//table[@class="gf-table historical_price"]/tr'):
    data = row.xpath('.//td/text()')
    # Rows without <td> text (such as a header row) give an empty list and are skipped.
    if data:
        result_list.append({'date': data[0].strip(), 'open': data[1].strip(),
            'high': data[2].strip(), 'low': data[3].strip(),
            'close': data[4].strip(), 'volume': data[5].strip()})

print(result_list)

This produces something like the following:

[{'volume': '9,253', 'high': '15.70', 'low': '14.15', 'date': 'Jan 22, 2009', 'close': '14.35', 'open': '15.70'}, {'volume': '10,091', 'high': '14.95', 'low': '14.30', 'date': 'Jan 21, 2009', 'close': '14.65', 'open': '14.50'}, {'volume': '9,459', 'high': '15.00', 'low': '14.20', 'date': 'Jan 20, 2009', 'close': '14.90', 'open': '15.00'}, {'volume': '3,768', 'high': '14.90', 'low': '14.30', 'date': 'Jan 19, 2009', 'close': '14.35', 'open': '14.50'}, {'volume': '9,720', 'high': '15.00', 'low': '14.35', 'date': 'Jan 16, 2009', 'close': '14.50', 'open': '14.80'}, {'volume': '5,863', 'high': '15.00', 'low': '14.00', 'date': 'Jan 15, 2009', 'close': '15.00', 'open': '14.75'}, {'volume': '7,952', 'high': '15.50', 'low': '14.25', 'date': 'Jan 14, 2009', 'close': '14.80', 'open': '14.25'}, {'volume': '8,359', 'high': '15.05', 'low': '14.20', 'date': 'Jan 13, 2009', 'close': '14.65', 'open': '14.55'}, {'volume': '12,854', 'high': '15.85', 'low': '14.40', 'date': 'Jan 12, 2009', 'close': '14.90', 'open': '15.00'}, {'volume': '35,580', 'high': '15.45', 'low': '13.20', 'date': 'Jan 9, 2009', 'close': '15.25', 'open': '15.10'}, {'volume': '29,063', 'high': '17.85', 'low': '15.15', 'date': 'Jan 7, 2009', 'close': '15.85', 'open': '17.50'}, {'volume': '16,543', 'high': '18.30', 'low': '17.55', 'date': 'Jan 6, 2009', 'close': '17.90', 'open': '17.70'}, {'volume': '36,993', 'high': '19.50', 'low': '18.00', 'date': 'Jan 5, 2009', 'close': '18.30', 'open': '18.90'}, {'volume': '120,522', 'high': '19.70', 'low': '17.30', 'date': 'Jan 2, 2009', 'close': '18.30', 'open': '17.30'}, {'volume': '16,329', 'high': '16.00', 'low': '15.10', 'date': 'Dec 31, 2008', 'close': '15.70', 'open': '15.85'}, {'volume': '53,500', 'high': '16.30', 'low': '14.90', 'date': 'Dec 30, 2008', 'close': '15.10', 'open': '16.00'}, {'volume': '14,006', 'high': '16.30', 'low': '15.10', 'date': 'Dec 29, 2008', 'close': '15.40', 'open': '15.50'}, {'volume': '5,025', 'high': '16.50', 'low': '15.50', 'date': 'Dec 26, 
2008', 'close': '15.60', 'open': '16.30'}, {'volume': '17,318', 'high': '16.35', 'low': '15.50', 'date': 'Dec 24, 2008', 'close': '16.05', 'open': '16.35'}, {'volume': '11,175', 'high': '16.55', 'low': '16.00', 'date': 'Dec 23, 2008', 'close': '16.15', 'open': '16.25'}, {'volume': '13,192', 'high': '17.20', 'low': '16.35', 'date': 'Dec 22, 2008', 'close': '16.80', 'open': '16.90'}, {'volume': '37,826', 'high': '17.45', 'low': '16.25', 'date': 'Dec 19, 2008', 'close': '16.60', 'open': '16.95'}, {'volume': '10,818', 'high': '17.00', 'low': '16.25', 'date': 'Dec 18, 2008', 'close': '16.60', 'open': '16.50'}, {'volume': '26,070', 'high': '18.50', 'low': '16.70', 'date': 'Dec 17, 2008', 'close': '16.70', 'open': '17.95'}, {'volume': '15,573', 'high': '18.00', 'low': '17.05', 'date': 'Dec 16, 2008', 'close': '17.55', 'open': '17.45'}, {'volume': '18,849', 'high': '17.65', 'low': '16.75', 'date': 'Dec 15, 2008', 'close': '17.10', 'open': '17.65'}, {'volume': '37,383', 'high': '18.45', 'low': '16.05', 'date': 'Dec 12, 2008', 'close': '16.50', 'open': '17.25'}, {'volume': '57,272', 'high': '18.80', 'low': '16.50', 'date': 'Dec 11, 2008', 'close': '18.15', 'open': '16.75'}, {'volume': '34,212', 'high': '17.95', 'low': '16.05', 'date': 'Dec 10, 2008', 'close': '17.95', 'open': '16.50'}, {'volume': '11,611', 'high': '18.00', 'low': '16.00', 'date': 'Dec 8, 2008', 'close': '16.10', 'open': '18.00'}, {'volume': '20,052', 'high': '17.50', 'low': '15.65', 'date': 'Dec 5, 2008', 'close': '16.40', 'open': '16.60'}, {'volume': '9,132', 'high': '17.00', 'low': '14.75', 'date': 'Dec 4, 2008', 'close': '16.15', 'open': '14.75'}, {'volume': '6,023', 'high': '16.45', 'low': '15.70', 'date': 'Dec 3, 2008', 'close': '16.00', 'open': '16.00'}, {'volume': '13,567', 'high': '16.30', 'low': '15.10', 'date': 'Dec 2, 2008', 'close': '15.55', 'open': '16.30'}, {'volume': '15,421', 'high': '17.15', 'low': '15.05', 'date': 'Dec 1, 2008', 'close': '16.70', 'open': '15.05'}, {'volume': '3,543', 
'high': '17.35', 'low': '16.25', 'date': 'Nov 28, 2008', 'close': '16.35', 'open': '16.25'}, {'volume': '11,130', 'high': '17.65', 'low': '16.55', 'date': 'Nov 26, 2008', 'close': '16.90', 'open': '17.25'}, {'volume': '126,113', 'high': '19.90', 'low': '16.25', 'date': 'Nov 25, 2008', 'close': '17.00', 'open': '16.80'}, {'volume': '17,069', 'high': '17.55', 'low': '15.75', 'date': 'Nov 24, 2008', 'close': '16.50', 'open': '15.75'}, {'volume': '10,550', 'high': '16.35', 'low': '15.30', 'date': 'Nov 21, 2008', 'close': '16.00', 'open': '15.80'}, {'volume': '9,892', 'high': '17.00', 'low': '16.00', 'date': 'Nov 20, 2008', 'close': '16.25', 'open': '16.00'}, {'volume': '16,597', 'high': '17.65', 'low': '16.50', 'date': 'Nov 19, 2008', 'close': '16.55', 'open': '17.15'}, {'volume': '13,041', 'high': '18.00', 'low': '16.70', 'date': 'Nov 18, 2008', 'close': '17.10', 'open': '17.70'}, {'volume': '13,403', 'high': '18.45', 'low': '17.30', 'date': 'Nov 17, 2008', 'close': '18.00', 'open': '18.20'}, {'volume': '24,101', 'high': '19.20', 'low': '18.15', 'date': 'Nov 14, 2008', 'close': '18.45', 'open': '19.00'}, {'volume': '68,975', 'high': '18.85', 'low': '17.55', 'date': 'Nov 12, 2008', 'close': '18.60', 'open': '18.40'}, {'volume': '35,525', 'high': '20.05', 'low': '18.25', 'date': 'Nov 11, 2008', 'close': '18.30', 'open': '20.05'}, {'volume': '152,431', 'high': '22.35', 'low': '19.65', 'date': 'Nov 10, 2008', 'close': '20.00', 'open': '21.20'}, {'volume': '245,444', 'high': '21.60', 'low': '17.60', 'date': 'Nov 7, 2008', 'close': '20.00', 'open': '17.60'}, {'volume': '40,649', 'high': '18.80', 'low': '17.10', 'date': 'Nov 6, 2008', 'close': '18.30', 'open': '17.15'}, {'volume': '116,608', 'high': '19.45', 'low': '14.90', 'date': 'Nov 5, 2008', 'close': '18.55', 'open': '18.30'}, {'volume': '113,707', 'high': '19.50', 'low': '16.50', 'date': 'Nov 4, 2008', 'close': '18.05', 'open': '17.00'}, {'volume': '54,681', 'high': '18.00', 'low': '16.75', 'date': 'Nov 3, 2008', 
'close': '17.10', 'open': '17.80'}, {'volume': '70,763', 'high': '18.60', 'low': '16.70', 'date': 'Oct 31, 2008', 'close': '17.05', 'open': '17.20'}, {'volume': '60,138', 'high': '19.10', 'low': '16.00', 'date': 'Oct 29, 2008', 'close': '16.45', 'open': '19.10'}, {'volume': '70,725', 'high': '16.95', 'low': '13.50', 'date': 'Oct 27, 2008', 'close': '14.60', 'open': '15.25'}, {'volume': '61,150', 'high': '19.90', 'low': '16.05', 'date': 'Oct 24, 2008', 'close': '16.50', 'open': '17.25'}, {'volume': '54,468', 'high': '20.25', 'low': '16.30', 'date': 'Oct 23, 2008', 'close': '18.75', 'open': '18.30'}, {'volume': '164,349', 'high': '22.20', 'low': '20.00', 'date': 'Oct 22, 2008', 'close': '20.25', 'open': '21.85'}, {'volume': '88,705', 'high': '22.95', 'low': '21.10', 'date': 'Oct 21, 2008', 'close': '21.40', 'open': '22.80'}, {'volume': '361,409', 'high': '23.25', 'low': '19.70', 'date': 'Oct 20, 2008', 'close': '21.80', 'open': '22.50'}, {'volume': '903,134', 'high': '28.70', 'low': '21.95', 'date': 'Oct 17, 2008', 'close': '21.95', 'open': '28.05'}, {'volume': '972,087', 'high': '29.25', 'low': '21.60', 'date': 'Oct 16, 2008', 'close': '26.50', 'open': '22.05'}, {'volume': '563,418', 'high': '25.55', 'low': '20.05', 'date': 'Oct 15, 2008', 'close': '24.55', 'open': '21.30'}, {'volume': '336,544', 'high': '26.00', 'low': '21.50', 'date': 'Oct 14, 2008', 'close': '22.10', 'open': '25.65'}, {'volume': '449,346', 'high': '26.60', 'low': '23.30', 'date': 'Oct 13, 2008', 'close': '24.70', 'open': '24.30'}, {'volume': '603,964', 'high': '24.90', 'low': '21.65', 'date': 'Oct 10, 2008', 'close': '23.65', 'open': '24.90'}, {'volume': '1,232,192', 'high': '29.20', 'low': '25.10', 'date': 'Oct 8, 2008', 'close': '26.40', 'open': '28.00'}, {'volume': '4,556,711', 'high': '38.00', 'low': '27.85', 'date': 'Oct 7, 2008', 'close': '30.05', 'open': '32.00'}, {'volume': '11,750,865', 'high': '80.00', 'low': '31.60', 'date': 'Oct 6, 2008', 'close': '33.55', 'open': '80.00'}]
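Every value in the output above is a string. If numbers and dates are needed, each row can be converted afterwards. The following is a minimal sketch, assuming every field looks like the values shown above (month abbreviations in English, commas as thousands separators, no placeholder such as "-" for missing values); parse_row is a hypothetical helper name.

```python
from datetime import datetime


def parse_row(row):
    """Convert one scraped row of strings into typed values."""
    def to_float(text):
        # Drop thousands separators before converting.
        return float(text.replace(",", ""))

    return {
        # 'Jan 22, 2009' -> datetime.date(2009, 1, 22)
        "date": datetime.strptime(row["date"], "%b %d, %Y").date(),
        "open": to_float(row["open"]),
        "high": to_float(row["high"]),
        "low": to_float(row["low"]),
        "close": to_float(row["close"]),
        "volume": int(row["volume"].replace(",", "")),
    }


# First row of the output shown above.
sample = {'volume': '9,253', 'high': '15.70', 'low': '14.15',
          'date': 'Jan 22, 2009', 'close': '14.35', 'open': '15.70'}
parsed = parse_row(sample)
print(parsed)
```

Applied to the whole result, this would be [parse_row(r) for r in result_list].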

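Answer 1 (score: 1)

You can do this with the help of PhantomJS (http://phantomjs.org/download.html) and Selenium.

Steps:
1. In a terminal or cmd, run: pip install selenium
2. Download PhantomJS and unzip it, then put "phantomjs.exe" on the Python path, for example C:\Python27 on Windows.

Use this code; it will give you the desired result: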

from selenium import webdriver
from bs4 import BeautifulSoup


url = "https://www.google.com/finance/historical?cid=4899364&startdate=Jan%201%2C%202000&enddate=Mar%2023%2C%202017&start=2000&num=200&ei=cRHaWNj3FISougTSg6moCw"
driver = webdriver.PhantomJS()
driver.get(url)

data = driver.page_source

soup = BeautifulSoup(data, 'html.parser')

last = soup.find_all(class_= "SP_arrow_last_off")

if(last):
    print("HI")

This code will give you the value of last and print HI.
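Selenium 4 no longer ships a PhantomJS driver, so webdriver.PhantomJS() is unavailable on current installs. The sketch below swaps in headless Chrome for the same idea. It assumes Chrome and a matching driver are available on the machine; fetch_rendered_html, rendered_page_has_class, and the fetch parameter are names invented for this illustration.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url):
    """Load url in headless Chrome and return the DOM after scripts have run."""
    # Imported here so the rest of the module works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always release the browser process


def rendered_page_has_class(url, class_name, fetch=fetch_rendered_html):
    """Return True if the rendered page has an element with class_name."""
    soup = BeautifulSoup(fetch(url), "html.parser")
    return soup.find(class_=class_name) is not None
```

Calling rendered_page_has_class(url, "SP_arrow_last_off") would start the browser. Passing a different fetch function makes the check usable without one, for example on saved HTML. If the element is added some time after the page loads, an explicit wait before reading page_source may also be needed.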

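Answer 2 (score: 0)

It seems you first need to load the page in a browser with a module such as Selenium. There is no element with the class SP_arrow_last_off in the page source. It is probably generated by some JS code, so you cannot get it with the requests module.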
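As for the True/False part of the question: once the HTML in hand really contains the element, the check itself is short. A minimal sketch, with has_class as a hypothetical helper name and inline HTML strings standing in for a real page:

```python
from bs4 import BeautifulSoup


def has_class(html, class_name):
    """Return True if any element in html carries class_name, else False."""
    soup = BeautifulSoup(html, "html.parser")
    # find() returns None when nothing matches, so compare against None.
    return soup.find(class_=class_name) is not None


present = has_class('<div class="nav SP_arrow_last_off">last</div>', "SP_arrow_last_off")
absent = has_class('<div class="nav">last</div>', "SP_arrow_last_off")
print(present, absent)
```

The find_all call in the question works as well: it returns an empty result when nothing matches, so bool(soup.find_all(class_=...)) gives the same answer.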