Getting a value from a web link

Time: 2019-06-06 04:53:08

Tags: python python-3.x pandas dataframe web-scraping

I have a URL from which I want to extract the row containing the data "Underlying Stock: NCC 96.70 As on Jun 06,2019 10:12:20 IST", pulling out "NCC" for my list of symbols and "96.70" as the underlying stock price.

url = "https://nseindia.com/live_market/dynaContent/live_watch/option_chain/optionKeys.jsp?symbolCode=917&symbol=NCC&symbol=ncc&instrument=OPTSTK&date=-&segmentLink=17&segmentLink=17"

[Screenshot: the marked area on the page shows the value required]

3 Answers:

Answer 0 (score: 1)

You can make a request to the site and then parse the result with Beautiful Soup.

Try this:

from bs4 import BeautifulSoup
import requests

url = "https://nseindia.com/live_market/dynaContent/live_watch/option_chain/optionKeys.jsp?symbolCode=917&symbol=NCC&symbol=ncc&instrument=OPTSTK&date=-&segmentLink=17&segmentLink=17"
res = requests.get(url)
soup = BeautifulSoup(res.text)

# hacky way of finding and parsing the stock data
soup.get_text().split("Underlying Stock")[1][2:10].split(" ")

Prints:

['NCC', '96.9']

PS: If you get a warning about lxml not being explicitly specified as the parser, change the parsing line to soup = BeautifulSoup(res.text, features="lxml"). You need lxml installed, e.g. with pip install lxml in your environment.
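The slice-and-split approach above is brittle (it breaks if the price has a different number of digits). A regex is one less fragile alternative; this is a minimal sketch, assuming the page text keeps the "Underlying Stock: SYMBOL PRICE ..." shape shown in the question:

```python
import re

# Sample headline text in the shape quoted in the question
text = "Underlying Stock: NCC 96.70 As on Jun 06,2019 10:12:20 IST"

# Capture the symbol (word characters) and the price (decimal number)
match = re.search(r"Underlying Stock:\s*(\w+)\s+([\d.]+)", text)
if match:
    symbol = match.group(1)        # "NCC"
    price = float(match.group(2))  # 96.70
    print(symbol, price)
```

In practice you would run this against soup.get_text() rather than a hard-coded string.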

Answer 1 (score: 1)

Another version, a bit less hacky.

from bs4 import BeautifulSoup
import requests

url = "https://nseindia.com/live_market/dynaContent/live_watch/option_chain/optionKeys.jsp?symbolCode=917&symbol=NCC&symbol=ncc&instrument=OPTSTK&date=-&segmentLink=17&segmentLink=17"

page_html = requests.get(url).text
page_soup = BeautifulSoup(page_html, "html.parser")
# The headline sits in the first <b> tag; its next text node holds the data
page_soup.find("b").next.split(' ')

Answer 2 (score: 0)

A concise way is to select for the first right-aligned table cell (td[align=right]); you can actually shorten that to the attribute selector [align=right]:

from bs4 import BeautifulSoup as bs
import requests

r = requests.get('https://nseindia.com/live_market/dynaContent/live_watch/option_chain/optionKeys.jsp?symbolCode=917&symbol=NCC&symbol=ncc&instrument=OPTSTK&date=-&segmentLink=17&segmentLink=17')
soup = bs(r.content, 'lxml')
headline = soup.select_one('[align=right]').text.strip().replace('\xa0\n',' ')
print(headline)

You can also use the first row of the first table:

from bs4 import BeautifulSoup as bs
import requests

r = requests.get('https://nseindia.com/live_market/dynaContent/live_watch/option_chain/optionKeys.jsp?symbolCode=917&symbol=NCC&symbol=ncc&instrument=OPTSTK&date=-&segmentLink=17&segmentLink=17')
soup = bs(r.content, 'lxml')
table = soup.select_one('table')
headline = table.select_one('tr:nth-of-type(1)').text.replace('\n',' ').replace('\xa0', ' ').strip()
print(headline)
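From the headline string, the symbol and price can then be split out positionally. A minimal sketch, using the sample text from the question as a stand-in since the live page's exact whitespace may differ:

```python
# Hypothetical headline as produced by the selectors above
headline = "Underlying Stock: NCC 96.70 As on Jun 06,2019 10:12:20 IST"

parts = headline.split()
symbol = parts[2]        # "NCC"
price = float(parts[3])  # 96.7
print(symbol, price)
```

Positional splitting assumes the first two tokens are always "Underlying Stock:"; if the site changes that label, the indices would need adjusting.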