Question

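I have a URL from which I want to extract the line containing the following data: "Underlying Stock: NCC 96.70 As on Jun 06, 2019 10:12:20 IST". From that line I want to pull the symbol "NCC" and the underlying price "96.70" into a list.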

url = "https://nseindia.com/live_market/dynaContent/live_watch/option_chain/optionKeys.jsp?symbolCode=917&symbol=NCC&symbol=ncc&instrument=OPTSTK&date=-&segmentLink=17&segmentLink=17"

Answer 1

You can make a request to the site and then parse the result with Beautiful Soup.

Try this:

from bs4 import BeautifulSoup
import requests

url = "https://nseindia.com/live_market/dynaContent/live_watch/option_chain/optionKeys.jsp?symbolCode=917&symbol=NCC&symbol=ncc&instrument=OPTSTK&date=-&segmentLink=17&segmentLink=17"
res = requests.get(url)
soup = BeautifulSoup(res.text)

# Hacky: split on the label, skip the ": " that follows it, and keep a fixed 8-character slice
print(soup.get_text().split("Underlying Stock")[1][2:10].split(" "))

Output:

['NCC', '96.9']

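PS: If you get a warning about the parser (lxml is the default parser when it is installed), change this line to: soup = BeautifulSoup(res.text, features="lxml"). This requires lxml to be installed, e.g. with pip install lxml in your environment.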
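
The fixed-width slice above depends on the symbol and price always having the same length. A minimal sketch of a more tolerant alternative, assuming the page text contains a headline shaped like the one quoted in the question; the regular expression and the helper name parse_underlying are illustrative, not part of the original answer:

```python
import re

# Assumed headline format, taken from the question text; the live page may differ.
HEADLINE_RE = re.compile(r"Underlying Stock:\s*(\S+)\s+([\d,]+(?:\.\d+)?)")

def parse_underlying(text):
    """Return [symbol, price] as strings, or None if no headline is found."""
    match = HEADLINE_RE.search(text)
    if match is None:
        return None
    symbol, price = match.groups()
    # Drop thousands separators so the price can be passed to float() later.
    return [symbol, price.replace(",", "")]

sample = "Underlying Stock: NCC 96.70 As on Jun 06, 2019 10:12:20 IST"
print(parse_underlying(sample))  # ['NCC', '96.70']
```

With the code from this answer you would call parse_underlying(soup.get_text()). The price stays a string so that "96.70" is preserved exactly; convert it with float() if you need a number.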

Answer 2

Another version, less hacky.

from bs4 import BeautifulSoup
import requests

url = "https://nseindia.com/live_market/dynaContent/live_watch/option_chain/optionKeys.jsp?symbolCode=917&symbol=NCC&symbol=ncc&instrument=OPTSTK&date=-&segmentLink=17&segmentLink=17"

page_html = requests.get(url).text
page_soup = BeautifulSoup(page_html, "html.parser")
# Take the node parsed immediately after the first <b> tag and split it on spaces
print(page_soup.find("b").next_element.split(' '))

Answer 3

A concise approach is to select the first right-aligned table cell (td[align=right]); you can actually simplify this to just the attribute selector [align=right]:

from bs4 import BeautifulSoup as bs
import requests

r = requests.get('https://nseindia.com/live_market/dynaContent/live_watch/option_chain/optionKeys.jsp?symbolCode=917&symbol=NCC&symbol=ncc&instrument=OPTSTK&date=-&segmentLink=17&segmentLink=17')
soup = bs(r.content, 'lxml')
headline = soup.select_one('[align=right]').text.strip().replace('\xa0\n',' ')
print(headline)

You can also use the first row of the first table:

from bs4 import BeautifulSoup as bs
import requests

r = requests.get('https://nseindia.com/live_market/dynaContent/live_watch/option_chain/optionKeys.jsp?symbolCode=917&symbol=NCC&symbol=ncc&instrument=OPTSTK&date=-&segmentLink=17&segmentLink=17')
soup = bs(r.content, 'lxml')
table = soup.select_one('table')
headline = table.select_one('tr:nth-of-type(1)').text.replace('\n',' ').replace('\xa0', ' ').strip()
print(headline)
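
Both snippets above return the whole headline as one string. A small sketch of how that string might be cleaned up and reduced to the symbol and price the question asks for; the raw sample below is a made-up stand-in for the cell text (the real whitespace on the page may differ), and the helper names are hypothetical:

```python
def normalize_headline(raw):
    """Collapse every whitespace run (including non-breaking spaces) to one space."""
    return " ".join(raw.split())

def symbol_and_price(headline):
    """Return [symbol, price] from 'Underlying Stock: SYMBOL PRICE As on ...'."""
    # Everything after the first colon starts with the symbol, then the price.
    _, _, rest = headline.partition(":")
    fields = rest.split()
    return fields[:2]

# Hypothetical raw cell text with the kind of padding the replace() calls strip.
raw = "\nUnderlying Stock: NCC\xa0\n96.70 As on Jun 06, 2019 10:12:20 IST\xa0\n"
headline = normalize_headline(raw)
print(headline)                    # Underlying Stock: NCC 96.70 As on Jun 06, 2019 10:12:20 IST
print(symbol_and_price(headline))  # ['NCC', '96.70']
```

Using split() with no arguments avoids chaining one replace() call per whitespace character, since it treats '\n' and '\xa0' alike.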
