Question

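Given a company ticker or name, I would like to get its sector using Python.

I have already tried several potential solutions, but none of them has worked.

The two most promising are:

1) Using this script: https://gist.github.com/pratapvardhan/9b57634d57f21cf3874c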

from urllib import urlopen
from lxml.html import parse

'''
Returns a tuple (Sector, Industry)
Usage: GFinSectorIndustry('IBM')
'''
def GFinSectorIndustry(name):
  tree = parse(urlopen('http://www.google.com/finance?&q='+name))
  return tree.xpath("//a[@id='sector']")[0].text, tree.xpath("//a[@id='sector']")[0].getnext().text

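However, I am using Python 3.8 (per python --version).

I have been able to adapt this solution, but the last line does not work, and I am completely new to scraping web pages, so any suggestions would be much appreciated.

This is my current code: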

from urllib.request import Request, urlopen
from lxml.html import parse

name="IBM"
req = Request('http://www.google.com/finance?&q='+name, headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req)

tree = parse(webpage)

But the last part does not work, and I am very new to this xpath syntax:

tree.xpath("//a[@id='sector']")[0].text, tree.xpath("//a[@id='sector']")[0].getnext().text

2) Another option is embedding an R package, as shown in: Find which sector a stock belongs to

However, I want to run it in a Jupyter notebook, and running TTN just takes too long.

Answer 1

Following your comment, for marketwatch.com/investing/stock, an xpath that may work is "//div[@class='intraday__sector']/span[@class='label']", so that doing

tree.xpath("//div[@class='intraday__sector']/span[@class='label']")[0].text

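should return the desired information.

"I am completely new to scraping web pages [...]"

A few clarifications:

This xpath depends entirely on the website you are looking at, which explains why searching for "//a[@id='sector']" in the page mentioned in the comments had no chance of working: that xpath (now outdated) was written for Google Finance. In other words, you first need to "study" the page you are interested in to find out where the information you want is located.
To do this kind of "study", I use Chrome DevTools and check any xpath in the console by running $x(<your-xpath-of-interest>); the $x function is documented, with examples, in the Chrome DevTools documentation.
Luckily, the information you want from marketwatch.com/investing/stock (the name of the sector) is statically generated (i.e. not generated dynamically at page load, in which case other scraping techniques would have been required, relying on other Python libraries such as Selenium, but that is another question).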
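To make the point about site-specific xpaths concrete, here is a minimal sketch that runs both expressions against a small page fragment. The HTML below is made up for illustration and is not what MarketWatch actually serves, and first_text is a hypothetical helper name, not part of lxml. It only assumes that lxml is installed.

```python
from lxml.html import fromstring

# Hypothetical page fragment, made up for illustration; real pages differ.
SAMPLE_HTML = """
<html><body>
  <div class="intraday__sector"><span class="label">Healthcare</span></div>
</body></html>
"""


def first_text(doc, xpath):
    """Return the text of the first node matching xpath, or None if nothing matches."""
    matches = doc.xpath(xpath)
    if not matches:
        return None
    text = matches[0].text
    return text.strip() if text else None


doc = fromstring(SAMPLE_HTML)

# This expression matches the structure of the sample fragment.
sector = first_text(doc, "//div[@class='intraday__sector']/span[@class='label']")

# The old Google Finance expression finds nothing here, so indexing [0]
# directly would raise IndexError; the helper returns None instead.
missing = first_text(doc, "//a[@id='sector']")

print(sector, missing)
```

Checking for an empty result before indexing makes a stale xpath show up as a None value rather than as an IndexError in the middle of a script.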

Answer 2

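To answer the question:

How do I get the stock-market sector of a company from its ticker or name in Python?

After reading some material and getting some good suggestions from @keepAlive, I had to find a workaround.

The following works the other way round, i.e. it gets the companies that belong to a given sector. There are 10 sectors, so fetching the information for all of them is not much work: https://www.stockmonitor.com/sectors/

Since marketwatch.com/investing/stock throws a 405 error, I decided to use https://www.stockmonitor.com/sectors/, for example:

https://www.stockmonitor.com/sector/healthcare/

The code is as follows: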

import pandas as pd
from lxml.html import parse
from urllib.request import Request, urlopen

# Browser-like User-Agent string sent with the request
user_agent = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/35.0.1916.47 Safari/537.36"
)

url = 'https://www.stockmonitor.com/sector/healthcare/'

req = Request(url, headers={'User-Agent': user_agent})
webpage = urlopen(req)

tree = parse(webpage)

# Each ticker is the link text inside a left-aligned table cell
healthcare_tickers = []
for element in tree.xpath("//tbody/tr/td[@class='text-left']/a"):
    healthcare_tickers.append(element.text)

pd.Series(healthcare_tickers)

Thus, healthcare_tickers holds the tickers of the companies in the healthcare sector.
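Since the original question asks for ticker to sector rather than sector to tickers, the per-sector lists can be inverted once they have all been collected. The sketch below assumes the scraping step has been repeated for each sector and gathered into a dictionary; the sector names and tickers shown are placeholder sample data, not scraped output, and invert_sector_map is a hypothetical helper name.

```python
# Hypothetical sample data standing in for scraped results; not real output.
sector_to_tickers = {
    "healthcare": ["JNJ", "PFE"],
    "technology": ["IBM", "MSFT"],
}


def invert_sector_map(sector_to_tickers):
    """Build a ticker -> sector dict from a sector -> [tickers] dict."""
    ticker_to_sector = {}
    for sector, tickers in sector_to_tickers.items():
        for ticker in tickers:
            # Store tickers upper-cased so lookups can use one consistent form.
            ticker_to_sector[ticker.upper()] = sector
    return ticker_to_sector


ticker_to_sector = invert_sector_map(sector_to_tickers)
print(ticker_to_sector.get("IBM"))  # technology
print(ticker_to_sector.get("XYZ"))  # None: ticker not in any collected sector
```

Using .get for the lookup means an unknown ticker yields None instead of raising KeyError.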

Answer 3

You can easily get the sector of any given company/ticker using Yahoo Finance:

import yfinance as yf

tickerdata = yf.Ticker('TSLA')  # the ticker symbol for Tesla
print(tickerdata.info['sector'])

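The code returns: "Consumer Cyclical"

If you want other information about the company/ticker, just print(tickerdata.info) to see all the other available dictionary keys and their corresponding values, such as the ['sector'] key used in the code above.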
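Not every info dictionary necessarily has a 'sector' key, in which case info['sector'] raises KeyError. A minimal sketch of a more forgiving lookup follows; sector_from_info is a hypothetical helper, and the dictionaries passed to it here are made up for illustration rather than real Yahoo Finance responses.

```python
def sector_from_info(info, default=None):
    """Return the 'sector' entry of a ticker info mapping, or default if absent."""
    sector = info.get("sector")
    # Treat a missing key and an empty value the same way.
    return sector if sector else default


# Hypothetical info dictionaries, made up for illustration.
print(sector_from_info({"sector": "Consumer Cyclical"}))
print(sector_from_info({"shortName": "Some Fund"}, default="Unknown"))
```

With yfinance, and assuming info behaves like the dictionary described above, this would be called as sector_from_info(yf.Ticker('TSLA').info).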
