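I parsed the Hong Kong Stock Exchange web page with Python 3 and Beautiful Soup 4, but I cannot extract the table under "Hong Kong and Mainland Market Highlights" (i.e. No. of listed companies ... No. of listed H shares ...). Here is the link: "https://www.hkex.com.hk/Mutual-Market/Stock-Connect/Statistics/Hong-Kong-and-Mainland-Market-Highlights?sc_lang=en#select3=0&select2=10&select1=0". Please advise.
My code: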
import requests
from bs4 import BeautifulSoup
import csv
import sys
import os
result = requests.get("https://www.hkex.com.hk/Mutual-Market/Stock-Connect/Statistics/Hong-Kong-and-Mainland-Market-Highlights?sc_lang=en#select3=0&select2=10&select1=3")
result.raise_for_status()
result.encoding = "utf-8"
src = result.content
soup = BeautifulSoup(src, 'lxml')
print(soup.prettify())
print(" ")
print("soup.pretty() printed")
print(" ")
wait = input("PRESS ENTER TO CONTINUE.")
table = soup.find_all('table')
print(table)
print(" ")
print("TABLE printed")
print(" ")
wait2 = input("PRESS ENTER TO CONTINUE.")
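Answer 0 (score: 0)
There is no need to render the page first, because you can fetch the data in JSON format. The tricky part is that the JSON mirrors how the table is rendered (with td tags, colspan values, and so on), so it takes a bit of work to iterate through it, but it is not impossible: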
import requests
import pandas as pd

url = 'https://www.hkex.com.hk/eng/csm/ws/Highlightsearch.asmx/GetData'
payload = {
    'LangCode': 'en',
    'TDD': '1',
    'TMM': '11',
    'TYYYY': '2019'}
jsonData = requests.get(url, params=payload).json()

# Each row carries parallel 'td' and 'colspan' lists; repeat every cell
# colspan times so that all rows line up on the same columns.
rows = []
for row in jsonData['data']:
    data_row = []
    for idx, colspan in enumerate(row['colspan']):
        colspan_int = int(colspan[0])
        data_row.append(row['td'][idx] * colspan_int)
    flat_list = [item for sublist in data_row for item in sublist]
    rows.append(flat_list)
# DataFrame.append was removed in pandas 2.0, so build the frame in one step.
final_df = pd.DataFrame(rows)

# Keep only the "Total market capitalisation" row and its first two columns.
mask = final_df[0].str.contains(r'Total market capitalisation(?!$)', na=False)
df = final_df[mask].iloc[:, :2].copy()
# `date` was undefined in the original snippet; assumed here to be the query date.
date = '{TDD}/{TMM}/{TYYYY}'.format(**payload)
df['date'] = date
df.to_csv('file.csv', index=False)
Output:
print (final_df.to_string())
    0                                                  1                                     2                                     3                                         4                                         5                                         6
 0                                                     Hong Kong <br>Exchange (01/11/2019 )  Hong Kong <br>Exchange (01/11/2019 )  Shanghai Stock<br>Exchange (01/11/2019 )  Shanghai Stock<br>Exchange (01/11/2019 )  Shenzhen Stock<br>Exchange (01/11/2019 )  Shenzhen Stock<br>Exchange (01/11/2019 )
 1                                                     Main Board                            GEM                                   A Share                                   B Share                                   A Share                                   B Share
 2  No. of listed companies                            2,031                                 383                                   1,488                                     50                                        2,178                                     47
 3  No. of listed H shares                             256                                   22                                    n.a.                                      n.a.                                      n.a.                                      n.a.
 4  No. of listed red-chips stocks                     170                                   5                                     n.a.                                      n.a.                                      n.a.                                      n.a.
 5  Total no. of listed securities                     12,573                                384                                   n.a.                                      n.a.                                      n.a.                                      n.a.
 6  Total market capitalisation<br>(Bil. dollars)      HKD 31,956                            HKD 109                               RMB 32,945                                RMB 81                                    RMB 22,237                                RMB 50
 7  Total negotiable <br>capitalisation (Bil. doll...  n.a.                                  n.a.                                  RMB 28,756                                RMB 81                                    RMB 16,938                                RMB 49
 8  Average P/E ratio (Times)                          11.16                                 19.76                                 13.90                                     9.18                                      24.70                                     9.55
 9  Total turnover <br>(Mil. shares)                   196,082                               560                                   15,881                                    15                                        22,655                                    14
10  Total turnover <br>(Mil. dollars)                  HKD 79,397                            HKD 160                               RMB 169,934                               RMB 85                                    RMB 260,208                               RMB 57
11  Total market turnover<br>(Mil. dollars)            HKD 79,557                            HKD 79,557                            RMB 176,232                               RMB 176,232                               RMB 260,264                               RMB 260,264
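The labels in the output still carry `<br>` tags, and the header cells are repeated according to their colspan. As a rough offline sketch of both steps, the snippet below expands the colspans and tidies the labels on a small hand-written sample. The sample rows and the helper names `expand_row` and `clean_label` are hypothetical; the row shape (one-item lists under `td` and `colspan`) is only inferred from the loop above, and the real response may differ.

```python
import re

# Hypothetical sample, shaped like the rows the loop above iterates over:
# every 'td' entry is a one-item list, every 'colspan' entry a one-item list of strings.
sample_rows = [
    {"td": [[""], ["Hong Kong <br>Exchange (01/11/2019 )"]],
     "colspan": [["1"], ["2"]]},
    {"td": [["No. of listed companies"], ["2,031"], ["383"]],
     "colspan": [["1"], ["1"], ["1"]]},
]


def expand_row(row):
    """Repeat each cell colspan times so every row has one value per column."""
    cells = []
    for cell, span in zip(row["td"], row["colspan"]):
        cells.extend(cell * int(span[0]))
    return cells


def clean_label(text):
    """Replace <br> tags with a space and collapse runs of whitespace."""
    text = re.sub(r"\s*<br\s*/?>\s*", " ", text)
    text = re.sub(r"\s+", " ", text)
    # Remove the stray spaces just inside parentheses, e.g. "(01/11/2019 )".
    return text.replace("( ", "(").replace(" )", ")").strip()


table = [[clean_label(cell) for cell in expand_row(row)] for row in sample_rows]
for line in table:
    print(line)
```

The same `clean_label` could be applied to the first column of `final_df` before filtering, for example with `final_df[0].map(clean_label)`, provided that column contains only strings.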