Web scraping U.S. News (stocks under $10)

Time: 2018-06-23 13:46:07

Tags: python web-scraping beautifulsoup stocks

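I'm using Python and trying to scrape a table of stocks priced under $10 from U.S. Today Money Stocks Under $10, then add each element to a list (so that I can iterate over each stock). At the moment, I have the following code: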

import bs4 as bs
import requests

resp = requests.get('https://money.usnews.com/investing/stocks/stocks-under-10')
soup = bs.BeautifulSoup(resp.text, "lxml")
table = soup.find('table', {'class': 'table stock full-row search-content'})
tickers = []
for row in table.findAll('tr')[1:]:
    ticker = str(row.findAll('td')[0].text)
    tickers.append(ticker)

I keep getting this error message:

Traceback (most recent call last):
  File "sandp.py", line 98, in <module>
    sandp(0)
  File "sandp.py", line 40, in sandp
    for row in table.findAll('tr')[1:]:
AttributeError: 'NoneType' object has no attribute 'findAll'

1 answer:

Answer 0 (score: 2)

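The site is dynamic: the stock data is loaded by JavaScript after the initial page load, so the HTML returned by requests most likely does not contain the table, which is why soup.find returns None. You can therefore use selenium to render the page first: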

from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.common.by import By
import collections
from bs4 import BeautifulSoup as soup
import re

# Selenium 3 style constructor; Selenium 4 takes a Service object instead
d = webdriver.Chrome('/path/to/chromedriver')
d.get('https://money.usnews.com/investing/stocks/stocks-under-10')
while True:
  try:
    d.find_element(By.LINK_TEXT, "Load More").click()  # keep clicking to load all data
  except WebDriverException:
    break  # no "Load More" link left, or it can no longer be clicked
company = collections.namedtuple('company', ['name', 'abbreviation', 'description', 'stats'])
# (tag, attributes) pairs for the fields inside each search result
headers = [['a', {'class':'search-result-link'}], ['a', {'class':'text-muted'}], ['p', {'class':'text-small show-for-medium-up ellipsis'}], ['dl', {'class':'inline-dl'}], ['span', {'class':'stock-trend'}], ['div', {'class':'flex-row'}]]
# text of each field (None when the element is missing) for every result row
final_data = [[getattr(i.find(a, b), 'text', None) for a, b in headers] for i in soup(d.page_source, 'html.parser').find_all('div', {'class':'search-result flex-row'})]
# strip layout whitespace from the description and split the stat fields into tokens
new_data = [[i[0], i[1], re.sub(r'\n+\s{2,}', '', i[2] or ''), [re.findall(r'[\$\w\.%/]+', stat or '') for stat in i[3:]]] for i in final_data]
# keep only the tokens that contain a digit and label them
final_results = [i[:3]+[dict(zip(['Price', 'Daily Change', 'Percent Change'], filter(lambda x: re.search(r'\d', x), i[-1][0])))] for i in new_data]
new_results = [company(*i) for i in final_results]

Output (first company):

company(name=u'Aileron Therapeutics Inc', abbreviation=u'ALRN', description=u'Aileron Therapeutics, Inc. is a clinical stage biopharmaceutical company, which focuses on developing and commercializing stapled peptides. Its ALRN-6924 product targets the tumor suppressor p53 for the treatment of a wide variety of cancers. It also offers the MDMX and MDM2. The company was founded by Gregory L. Verdine, Rosana Kapeller, Huw M. Nash, Joseph A. Yanchik III, and Loren David Walensky in June 2005 and is headquartered in Cambridge, MA.more\n', stats={'Daily Change': u'$0.02', 'Price': u'$6.04', 'Percent Change': u'0.33%'})
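
The stats values come back as display strings such as $6.04 and 0.33%. If you need numbers for comparisons, a minimal sketch along these lines may help. Note that to_number is a hypothetical helper name, not part of the code above, and it assumes the values look like the ones in the sample output (a $ prefix or a % suffix around a plain decimal number).

```python
import re

def to_number(text):
    """Convert a display string such as '$6.04' or '0.33%' to a float.

    Hypothetical helper; assumes the formats seen in the sample output.
    Returns None when the string contains no number.
    """
    cleaned = text.replace(',', '').replace('$', '')
    match = re.search(r'-?\d+(?:\.\d+)?', cleaned)
    return float(match.group()) if match else None

# Stats dict copied from the first company in the output above
stats = {'Daily Change': '$0.02', 'Price': '$6.04', 'Percent Change': '0.33%'}
numeric = {key: to_number(value) for key, value in stats.items()}
print(numeric)
```

With numeric values you could, for example, keep only the companies whose converted Price is below some threshold while iterating over new_results.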

Edit:

All abbreviations:

abbrevs = [i.abbreviation for i in new_results]

Output:

[u'ALRN', u'HAIR', u'ONCY', u'EAST', u'CERC', u'ENPH', u'CASI', u'AMBO', u'CWBR', u'TRXC', u'NIHD', u'LGCY', u'MRNS', u'RFIL', u'AUTO', u'NEPT', u'ARQL', u'ITUS', u'SRAX', u'APTO']
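
As for the AttributeError in the question: BeautifulSoup's find returns None when nothing matches, so calling findAll on the result fails. A minimal sketch of guarding against that is shown here. The function name extract_tickers and both HTML snippets are made up for illustration; they are not the real page markup, and only the class string is taken from the question.

```python
from bs4 import BeautifulSoup

def extract_tickers(html):
    """Return the first-column text of each data row, or [] if the table is absent."""
    page = BeautifulSoup(html, "html.parser")
    table = page.find('table', {'class': 'table stock full-row search-content'})
    if table is None:
        # Nothing matched: typical when the table is rendered by JavaScript
        return []
    tickers = []
    for row in table.find_all('tr')[1:]:  # skip the header row
        cells = row.find_all('td')
        if cells:
            tickers.append(cells[0].get_text(strip=True))
    return tickers

# Invented stand-in for what a plain HTTP request might receive from a dynamic page
static_html = "<html><body><div id='app'></div></body></html>"
# Invented stand-in for a page where the table is present in the HTML
rendered_html = (
    "<table class='table stock full-row search-content'>"
    "<tr><th>Ticker</th></tr>"
    "<tr><td> ALRN </td></tr>"
    "<tr><td>HAIR</td></tr>"
    "</table>"
)

print(extract_tickers(static_html))
print(extract_tickers(rendered_html))
```

An empty result from the first call is the hint that the content is not in the static HTML and that a browser-driven approach such as the selenium code above is needed.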