从具有多个表的网页获取python中的数据

时间:2019-02-25 08:57:42

标签: python selenium beautifulsoup

我正试图在下面的网页上进行分析,以获取在交易所中一直处于高位或低位的股票名称。

https://www.bseindia.com/markets/equity/EQReports/HighLow.html?Flag=H#

但是,当我使用漂亮的汤下载网页并检查数据时,仅显示一半库存,这是因为  该页面有2页,因此使用上述方法,一页上有25张库存,另一页上有25张库存,我只能解析第一页,  如果我单击第二页,URL也相同,请帮助我如何解决此问题?

1 个答案:

答案 0 :(得分:1)

该站点具有api端点,该端点以一种不错的json格式将数据返回给您。您可以获取json格式的响应,然后对其进行规范化以创建表。现在,当它执行此操作时,它将返回2个表,所以我不确定是否要第二个表。如果没有,我将它们分别存储,然后附加它们以将它们放在一起。

import requests    
from pandas.io.json import json_normalize

url = 'https://api.bseindia.com/BseIndiaAPI/api/MktHighLowData/w?Grpcode=&HLflag=H&indexcode=&scripcode='

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36'}

payload = {
'Grpcode':'', 
'HLflag': 'H',
'indexcode':'' ,
'scripcode':'' }

jsonObj = requests.get(url, headers=headers, params=payload).json()

df_table = json_normalize(jsonObj['Table'])
df_table1 = json_normalize(jsonObj['Table1'])

df = df_table.append(df_table1)

输出:

print (df)
     ALLTimeHigh         ...                         dt_tm
0        1019.95         ...           2019-02-25T16:00:03
1         263.00         ...           2019-02-25T16:00:03
2          24.00         ...           2019-02-25T16:00:03
3          35.90         ...           2019-02-25T16:00:03
4          29.75         ...           2019-02-25T16:00:03
5          43.00         ...           2019-02-25T16:00:03
6         140.40         ...           2019-02-25T16:00:03
7          15.39         ...           2019-02-25T16:00:03
8         724.00         ...           2019-02-25T16:00:03
9        1495.00         ...           2019-02-25T16:00:03
10        123.15         ...           2019-02-25T16:00:03
11        121.00         ...           2019-02-25T16:00:03
12        238.50         ...           2019-02-25T16:00:03
13         89.00         ...           2019-02-25T16:00:03
14        819.95         ...           2019-02-25T16:00:03
15        112.40         ...           2019-02-25T16:00:03
16         49.95         ...           2019-02-25T16:00:03
17        330.85         ...           2019-02-25T16:00:03
18        167.45         ...           2019-02-25T16:00:03
19         25.10         ...           2019-02-25T16:00:03
20        940.00         ...           2019-02-25T16:00:03
21        165.00         ...           2019-02-25T16:00:03
22           NaN         ...           2019-02-25T16:00:03
23        239.00         ...           2019-02-25T16:00:03
24        151.55         ...           2019-02-25T16:00:03
25         34.35         ...           2019-02-25T16:00:03
26        256.15         ...           2019-02-25T16:00:03
27         49.75         ...           2019-02-25T16:00:03
28        103.25         ...           2019-02-25T16:00:03
29         50.50         ...           2019-02-25T16:00:03
..           ...         ...                           ...
87        135.00         ...           2019-02-25T16:00:03
88        219.80         ...           2019-02-25T16:00:03
89         58.00         ...           2019-02-25T16:00:03
90        494.00         ...           2019-02-25T16:00:03
91        285.30         ...           2019-02-25T16:00:03
92         55.65         ...           2019-02-25T16:00:03
93          4.45         ...           2019-02-25T16:00:03
94         50.00         ...           2019-02-25T16:00:03
95         50.00         ...           2019-02-25T16:00:03
96         92.50         ...           2019-02-25T16:00:03
97        154.80         ...           2019-02-25T16:00:03
98         82.40         ...           2019-02-25T16:00:03
99        293.85         ...           2019-02-25T16:00:03
100       396.00         ...           2019-02-25T16:00:03
101        98.00         ...           2019-02-25T16:00:03
102       144.60         ...           2019-02-25T16:00:03
103        11.50         ...           2019-02-25T16:00:03
104        42.95         ...           2019-02-25T16:00:03
105       313.00         ...           2019-02-25T16:00:03
106      1120.00         ...           2019-02-25T16:00:03
107        87.00         ...           2019-02-25T16:00:03
108        82.00         ...           2019-02-25T16:00:03
109       214.00         ...           2019-02-25T16:00:03
110       505.00         ...           2019-02-25T16:00:03
111      1525.00         ...           2019-02-25T16:00:03
112       220.00         ...           2019-02-25T16:00:03
113        36.00         ...           2019-02-25T16:00:03
114       170.00         ...           2019-02-25T16:00:03
115       549.50         ...           2019-02-25T16:00:03
116      4990.00         ...           2019-02-25T16:00:03

[168 rows x 19 columns]