Python code to scrape ticker symbols from Yahoo Finance

Time: 2021-04-10 17:57:40

Tags: web-scraping yahoo-finance

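I have a list of more than 1,000 companies that I could invest in, and I need the ticker symbol for each of them. I am running into trouble in two places: stripping the ticker out of the soup output, and looping over all the company names.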

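Here is an example of the page: https://finance.yahoo.com/lookup?s=asml. The idea is to replace asml with each company name, i.e. build 'https://finance.yahoo.com/lookup?s=' + company, so that I can loop through all the companies.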
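A minimal sketch of that URL construction, assuming the names come from the `Company name` column of the DataFrame. `build_lookup_url` is a hypothetical helper name. `quote_plus` from the standard library encodes spaces and other special characters, which plain string concatenation does not do. Note also that iterating over a DataFrame directly yields its column labels, so the names have to be taken from the column itself.

```python
from urllib.parse import quote_plus

LOOKUP_URL = "https://finance.yahoo.com/lookup?s="


def build_lookup_url(company_name):
    """Return the lookup URL for one company name, URL-encoding spaces etc."""
    return LOOKUP_URL + quote_plus(company_name)


# Stand-in for df["Company name"].tolist()
company_names = ["Abbott Laboratories", "ABBVIE", "Accenture Plc"]
urls = [build_lookup_url(name) for name in company_names]

for url in urls:
    print(url)
```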

companies=df    
        Company name
    0   Abbott Laboratories
    1   ABBVIE
    2   Abercrombie
    3   Abiomed
    4   Accenture Plc

This is the code I have so far. The stripping step does not work, and neither does the loop over all companies.

#Create a function to scrape the data
def scrape_stock_symbols():
  Companies=df
  url= 'https://finance.yahoo.com/lookup?s='+ Companies
  page= requests.get(url)

  soup = BeautifulSoup(page.text, "html.parser")
  Company_Symbol=Soup.find_all('td',attrs ={'class':'data-col0 Ta(start) Pstart(6px) Pend(15px)'})

  for i in company_symbol:
       try:
       row = i.find_all('td')
       company_symbol.append(row[0].text.strip())
    
     except Exception: 
      if company not in company_symbol:
        next(Company)

  return (company_symbol)

#Loop through every company in companies to get all of the tickers from the website
for Company in companies:
  try:
    (temp_company_symbol) = scrape_stock_symbols(company)

  except Exception: 
    if company not in companies:
        next(Company)
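Two structural problems in the attempt above are independent of Yahoo's markup: the function is defined without parameters but called with one, and the whole DataFrame is concatenated onto the URL. The sketch below fixes only those two points. It rests on several assumptions: the ticker is taken to be the first `<td>` of each result row (mirroring `data-col0` and `row[0]` in the attempt), the standard-library `html.parser` is swapped in for BeautifulSoup to keep the example self-contained, and `fetch` is a hypothetical callable that returns the page HTML for a URL (with requests it could be `lambda url: requests.get(url).text`). Whether the live page still serves this table layout is not something the sketch can guarantee.

```python
from html.parser import HTMLParser
from urllib.parse import quote_plus

LOOKUP_URL = "https://finance.yahoo.com/lookup?s="


class FirstCellParser(HTMLParser):
    """Collect the text of the first <td> in every table row."""

    def __init__(self):
        super().__init__()
        self.symbols = []
        self._cell_index = -1        # position of the current <td> in its row
        self._in_first_cell = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cell_index = -1    # new row: restart the cell counter
        elif tag == "td":
            self._cell_index += 1
            if self._cell_index == 0:
                self._in_first_cell = True
                self._buffer = []

    def handle_data(self, data):
        if self._in_first_cell:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._in_first_cell:
            self._in_first_cell = False
            text = "".join(self._buffer).strip()
            if text:
                self.symbols.append(text)


def scrape_stock_symbols(company, fetch):
    """Return every symbol listed on the lookup page for one company name."""
    parser = FirstCellParser()
    parser.feed(fetch(LOOKUP_URL + quote_plus(company)))
    return parser.symbols


def scrape_all(companies, fetch):
    """Look up each company; a failed lookup yields an empty list."""
    results = {}
    for company in companies:
        try:
            results[company] = scrape_stock_symbols(company, fetch)
        except Exception as exc:
            print(f"lookup failed for {company!r}: {exc}")
            results[company] = []
    return results
```

Passing `fetch` in as an argument keeps the parsing and looping logic testable against a saved HTML snippet, without sending a request for every one of the 1,000 names during development.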

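Another difficulty is that a symbol lookup on Yahoo Finance returns many company names, so I will have to clean the data afterwards. I want to use the AMS exchange as the default: if a company is listed on multiple exchanges, I am only interested in its AMS ticker. The end goal is to create a new dataframe: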

    Company name           Company_symbol
0   Abbott Laboratories    ABT
1   ABBVIE                 ABBV
2   Abercrombie            ANF

1 Answer:

Answer 0: (score: 1)

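Here is a solution that does not require any scraping. It uses a package called yahooquery (disclaimer: I am the author), which uses an API endpoint that returns symbols for a user's query. You can do something like this: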

import pandas as pd
import yahooquery as yq


def get_symbol(query, preferred_exchange='AMS'):
    """Return the ticker for `query`, preferring `preferred_exchange`."""
    try:
        data = yq.search(query)
    except ValueError:  # Also catches JSONDecodeError, a ValueError subclass
        print(query)  # Report the query that failed
        return None

    quotes = data['quotes']
    if len(quotes) == 0:
        return 'No Symbol Found'

    # Fall back to the first hit if nothing is listed on the preferred exchange
    symbol = quotes[0]['symbol']
    for quote in quotes:
        if quote['exchange'] == preferred_exchange:
            symbol = quote['symbol']
            break
    return symbol


companies = ['Abbott Laboratories', 'ABBVIE', 'Abercrombie', 'Abiomed', 'Accenture Plc']
df = pd.DataFrame({'Company name': companies})
df['Company symbol'] = df['Company name'].apply(get_symbol)

          Company name Company symbol
0  Abbott Laboratories            ABT
1               ABBVIE           ABBV
2          Abercrombie            ANF
3              Abiomed           ABMD
4        Accenture Plc            ACN
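For a list of more than 1,000 names it may help to separate the selection rule from the network call, so the rule can be checked without contacting Yahoo and repeated names trigger only one request. The following is a sketch under stated assumptions, not part of yahooquery: `choose_symbol` and `lookup_symbols` are hypothetical helper names, and `search_fn` stands in for any callable that, like `yq.search` above, returns a dict with a `quotes` list whose items carry `symbol` and `exchange` keys. Unlike `get_symbol`, it returns `None` when nothing is found.

```python
def choose_symbol(quotes, preferred_exchange="AMS"):
    """Pick the symbol on the preferred exchange, else the first hit, else None."""
    if not quotes:
        return None
    for quote in quotes:
        if quote.get("exchange") == preferred_exchange:
            return quote["symbol"]
    return quotes[0]["symbol"]


def lookup_symbols(names, search_fn, preferred_exchange="AMS"):
    """Map each distinct name to a symbol, calling search_fn once per name."""
    symbols = {}
    for name in names:
        if name in symbols:
            continue  # already looked up; skip the duplicate request
        try:
            quotes = search_fn(name).get("quotes", [])
        except ValueError:
            quotes = []  # treat an undecodable response as "no result"
        symbols[name] = choose_symbol(quotes, preferred_exchange)
    return symbols
```

With yahooquery installed, this could be used as `mapping = lookup_symbols(df['Company name'], yq.search)` followed by `df['Company symbol'] = df['Company name'].map(mapping)`.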