How to scrape a website that requires posting information

Time: 2019-04-22 01:11:23

Tags: python web-scraping beautifulsoup python-requests

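I want to scrape announcement information from https://nseindia.com/corporates/corporateHome.html?id=allAnnouncements. Specifically, I want to go to the "Corporate Information" tab on the left side of the site and then open the Corporate Announcements link under Equities. After that, I want to post certain equity symbols in the text box and download the output through the export-to-CSV link on the left side of the page.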

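Since all the pages share the same URL, https://nseindia.com/corporates/corporateHome.html?id=allAnnouncements, I am having trouble working out how to navigate to that specific page in the first place. I have been using the Network tab in Chrome's developer tools to understand how to get from the link above to the specific page. After some research in the Network tab, this is what I found:

Actual Webpage where to navigate

Inspecting the Network to figure out the link

I need to know how to make the request.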

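I expect the script to navigate to the specific page and then post the symbol information so that the announcements CSV link can be downloaded.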

1 answer:

Answer 0 (score: 0)

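You found a good URL. It returns the data as JSON. However, this JSON contains some errors, so the standard json module cannot read it. With the dirtyjson module, I was able to read it.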

import requests
import dirtyjson  # third-party package: pip install dirtyjson

# Endpoint found in the browser's Network tab; it returns the announcements as JSON-like text.
url = 'https://nseindia.com/corporates/corpInfo/equities/getAnnouncements.jsp?period=Latest%20Announced'

r = requests.get(url)

# r.json() and json.loads(r.text.strip()) both fail here because the
# JSON data has some mistakes, so parse the raw text with dirtyjson instead.
data = dirtyjson.loads(r.text)
#print(data)

for item in data['rows']:
    print(item.keys())
    print(item['sym'])
    print(item['desc'])
    print(item['name'])
    # The date string starts with DDMMYYYY: print day, month, year.
    print(item['date'][:2], item['date'][2:4], item['date'][4:8])

Some results:

['sym', 'desc', 'Ind', 'ISIN', 'name', 'date', 'seqId']
MOTOGENFIN
Updates
The Motor & General Finance Limited
21 04 2019
['sym', 'desc', 'Ind', 'ISIN', 'name', 'date', 'seqId']
KANORICHEM
Address Change
Kanoria Chemicals & Industries Limited
21 04 2019
['sym', 'desc', 'Ind', 'ISIN', 'name', 'date', 'seqId']
JAIHINDPRO
Updates
Jaihind Projects Limited
21 04 2019
['sym', 'desc', 'Ind', 'ISIN', 'name', 'date', 'seqId']
BLUECHIP
Appointment
Blue Chip India Limited
21 04 2019
['sym', 'desc', 'Ind', 'ISIN', 'name', 'date', 'seqId']
BLUECHIP
Resignation
Blue Chip India Limited
21 04 2019
['sym', 'desc', 'Ind', 'ISIN', 'name', 'date', 'seqId']
JAIHINDPRO
Corporate Insolvency Resolution Process
Jaihind Projects Limited
21 04 2019
['sym', 'desc', 'Ind', 'ISIN', 'name', 'date', 'seqId']
PAEL
Updates
PAE Limited
20 04 2019
['sym', 'desc', 'Ind', 'ISIN', 'name', 'date', 'seqId']
BANDHANBNK
Updates
Bandhan Bank Limited
20 04 2019
['sym', 'desc', 'Ind', 'ISIN', 'name', 'date', 'seqId']
ALICON
Updates
Alicon Castalloy Limited
20 04 2019
['sym', 'desc', 'Ind', 'ISIN', 'name', 'date', 'seqId']
ADANIENT
Acquisition
Adani Enterprises Limited
20 04 2019
['sym', 'desc', 'Ind', 'ISIN', 'name', 'date', 'seqId']
NLCINDIA
Updates
NLC India Limited
20 04 2019
['sym', 'desc', 'Ind', 'ISIN', 'name', 'date', 'seqId']
SHILPAMED
Updates
Shilpa Medicare Limited
20 04 2019
['sym', 'desc', 'Ind', 'ISIN', 'name', 'date', 'seqId']
KOTHARIPRO
Code of Conduct under SEBI(PIT) Reg., 2015
Kothari Products Limited
20 04 2019
['sym', 'desc', 'Ind', 'ISIN', 'name', 'date', 'seqId']
LEEL
Updates
LEEL Electricals Limited
20 04 2019
['sym', 'desc', 'Ind', 'ISIN', 'name', 'date', 'seqId']
SHILPAMED
Updates
Shilpa Medicare Limited
20 04 2019
['sym', 'desc', 'Ind', 'ISIN', 'name', 'date', 'seqId']
ATULAUTO
Updates
Atul Auto Limited
20 04 2019
['sym', 'desc', 'Ind', 'ISIN', 'name', 'date', 'seqId']
PDPL
Resignation
Parenteral Drugs (India) Limited
20 04 2019
['sym', 'desc', 'Ind', 'ISIN', 'name', 'date', 'seqId']
IDBI
Updates
IDBI Bank Limited
20 04 2019
['sym', 'desc', 'Ind', 'ISIN', 'name', 'date', 'seqId']
BLBLIMITED
Updates
BLB Limited
20 04 2019
['sym', 'desc', 'Ind', 'ISIN', 'name', 'date', 'seqId']
BLBLIMITED
Shareholders meeting
BLB Limited
20 04 2019
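
The question also asks for announcements of specific symbols as a CSV. Once the rows are parsed, that can be done locally instead of through the site's export link. The following is only a minimal sketch under some assumptions: `rows` is a hand-made sample shaped like `data['rows']` above, the helper names `filter_rows` and `rows_to_csv` are hypothetical, and the `date` field is assumed to begin with DDMMYYYY (which is all the output above confirms).

```python
import csv
import io
from datetime import datetime

# Sample rows shaped like data['rows'] above (assumption: only the fields
# used here are included, and 'date' holds just the leading DDMMYYYY part).
rows = [
    {'sym': 'BLUECHIP', 'desc': 'Appointment', 'name': 'Blue Chip India Limited', 'date': '21042019'},
    {'sym': 'ADANIENT', 'desc': 'Acquisition', 'name': 'Adani Enterprises Limited', 'date': '20042019'},
    {'sym': 'IDBI', 'desc': 'Updates', 'name': 'IDBI Bank Limited', 'date': '20042019'},
]


def filter_rows(rows, symbols):
    """Keep only the announcements whose 'sym' is in symbols (case-insensitive)."""
    wanted = {s.upper() for s in symbols}
    return [row for row in rows if row['sym'].upper() in wanted]


def rows_to_csv(rows, fields=('sym', 'name', 'desc', 'date')):
    """Return the rows as CSV text, with the date rewritten as YYYY-MM-DD."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields,
                            extrasaction='ignore', lineterminator='\n')
    writer.writeheader()
    for row in rows:
        out = dict(row)
        # Only the first eight characters (DDMMYYYY) are parsed.
        out['date'] = datetime.strptime(row['date'][:8], '%d%m%Y').date().isoformat()
        writer.writerow(out)
    return buffer.getvalue()


selected = filter_rows(rows, ['bluechip', 'IDBI'])
csv_text = rows_to_csv(selected)
print(csv_text)
```

With the real response, `rows` would be replaced by `data['rows']`, and `csv_text` could be written to a file opened with `newline=''`.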