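Extract "Table1" using BeautifulSoup after selecting buttons?

Time: 2019-11-30 12:11:15

Tags: python html-table beautifulsoup multipleselection

I am trying to download the table at "https://www.bsgroup.com.hk/BrightSmart/MarginRatio/StockMarginRatioEnquiry.aspx?Lang=eng" after selecting the "HK Stock" and "Show All" buttons. I checked Chrome's Inspect / Network panel and saw no request to the server for new data, so I suspect the data is already in the original page. I confirmed that it appears in "Table1" once the "Show All" button is pressed. I tried the following code, but nothing happens. Please advise: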

import requests
from bs4 import BeautifulSoup

url = "https://www.bsgroup.com.hk/BrightSmart/MarginRatio/StockMarginRatioEnquiry.aspx?Lang=eng"
result = requests.get(url)
result.raise_for_status()
result.encoding = "utf-8"

src = result.content
soup = BeautifulSoup(src, 'lxml')
table = soup.findAll("Table1")
output_rows = []
for table_row in table.findAll('tr'):
    columns = table_row.findAll('td')
    output_row = []
    for column in columns:
        output_row.append(column.text)
    output_rows.append(output_row)

print(output_rows)

1 Answer:

Answer 0 (score: 1)

To get the data, you have to make a POST request with the correct parameters.

For example:

import requests
from bs4 import BeautifulSoup

url = 'https://www.bsgroup.com.hk/BrightSmart/MarginRatio/StockMarginRatioEnquiry.aspx?Lang=eng'

with requests.Session() as s:
    # Load the page first to collect the form fields the server expects back
    soup = BeautifulSoup(s.get(url).text, 'html.parser')

    # Copy every named <input> into the POST payload ('' when it has no value)
    data = {i['name']: i.get('value', '') for i in soup.select('input[name]')}
    # Remove the btnFind entry and set the exchange to HKEX
    del data['StockMarginRatioGrid$btnFind']
    data['StockMarginRatioGrid$txtExchange'] = 'HKEX'

    # Post the form back and parse the returned page
    soup = BeautifulSoup(s.post(url, data=data).text, 'html.parser')

    for tr in soup.select('#StockMarginRatioGrid_gridResult tr'):
        print(''.join('{:^21}'.format(td.text) for td in tr.select('td')))

Prints:

 Stock Code              Name          Stock Margin Ratio      Deposit Ratio                              Stock Code              Name          Stock Margin Ratio      Deposit Ratio    
      1               CKHHOLDINGS              85%                  15%                                        2               CLPHOLDINGS              85%                  15%         
      3               HK&CHINAGAS              85%                  15%                                        4              WHARFHOLDINGS             82%                  18%         
      5              HSBCHOLDINGS              85%                  15%                                        6               POWERASSETS              85%                  15%         
      8                  PCCW                  75%                  25%                                       10              HANGLUNGGROUP             75%                  25%         
     11              HANGSENGBANK              85%                  15%                                       12              HENDERSONLAND             85%                  15%         
     14                HYSANDEV                75%                  25%                                       16                 SHKPPT                 85%                  15%         
     17               NEWWORLDDEV              85%                  15%                                       18              ORIENTALPRESS             20%                  80%         
     19              SWIREPACIFICA             85%                  15%                                       20                WHEELOCK                82%                  18%         
     23               BANKOFEASIA              75%                  25%                                       25             CHEVALIERINT'L             40%                  60%         

... and so on.
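The payload-building step in the example above can be tried offline. The following is only a minimal sketch: `SAMPLE_FORM`, its field names and values, and the helper `build_payload` are hypothetical stand-ins, not the real page's form. It shows how every named `<input>` ends up in the dictionary that is later sent with the POST request.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page's form; names and values are made up.
SAMPLE_FORM = """
<form method="post">
  <input type="hidden" name="__VIEWSTATE" value="abc123">
  <input type="text" name="Grid$txtExchange">
  <input type="submit" name="Grid$btnFind" value="Find">
  <input type="text" id="no_name_attribute" value="ignored">
</form>
"""


def build_payload(html):
    """Collect every named <input> into a dict, using '' when it has no value."""
    soup = BeautifulSoup(html, "html.parser")
    return {tag["name"]: tag.get("value", "") for tag in soup.select("input[name]")}


payload = build_payload(SAMPLE_FORM)
payload.pop("Grid$btnFind", None)      # drop the button entry, as the answer does
payload["Grid$txtExchange"] = "HKEX"   # fill in the exchange field
print(payload)
# {'__VIEWSTATE': 'abc123', 'Grid$txtExchange': 'HKEX'}
```

Inputs without a `name` attribute are skipped by the `input[name]` selector, so they never reach the payload.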

Edit: To write to a CSV file, you can use this example:

import csv
import requests
from bs4 import BeautifulSoup

url = 'https://www.bsgroup.com.hk/BrightSmart/MarginRatio/StockMarginRatioEnquiry.aspx?Lang=eng'

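# newline='' lets the csv module control line endings (avoids blank rows on Windows)
with requests.Session() as s, open('output.csv', 'w', newline='') as f_out:
    writer = csv.writer(f_out)

    # Load the page first to collect the form fields the server expects back
    soup = BeautifulSoup(s.get(url).text, 'html.parser')

    # Copy every named <input> into the POST payload ('' when it has no value)
    data = {i['name']: i.get('value', '') for i in soup.select('input[name]')}
    del data['StockMarginRatioGrid$btnFind']
    data['StockMarginRatioGrid$txtExchange'] = 'HKEX'

    soup = BeautifulSoup(s.post(url, data=data).text, 'html.parser')

    # One CSV row per table row, with surrounding whitespace stripped from each cell
    for tr in soup.select('#StockMarginRatioGrid_gridResult tr'):
        writer.writerow([td.text.strip() for td in tr.select('td')])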
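The printed output shows two stocks side by side in each table row, so the CSV rows will carry two records each. If one record per line is preferred, something like the sketch below might help. It assumes the two halves are separated by empty spacer cells and that no real cell is empty; the helper `split_row` and the in-memory `buffer` are illustrative, and the sample rows are taken from the output shown earlier.

```python
import csv
import io


def split_row(cells, width=4):
    """Split one scraped row into records of `width` cells.

    Assumption: empty cells are only spacers between the two halves,
    so they can be dropped before chunking.
    """
    values = [c.strip() for c in cells if c.strip()]
    return [values[i:i + width] for i in range(0, len(values), width)]


# Sample rows in the assumed layout: 4 cells, a spacer, 4 more cells
rows = [
    ["1", "CKHHOLDINGS", "85%", "15%", "", "2", "CLPHOLDINGS", "85%", "15%"],
    ["3", "HK&CHINAGAS", "85%", "15%", "", "4", "WHARFHOLDINGS", "82%", "18%"],
]
records = [rec for row in rows for rec in split_row(row)]

# Write to an in-memory buffer here; a real file would be opened with newline=''
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(records)
print(buffer.getvalue())
```

In the scraping loop, `writer.writerows(split_row([td.text for td in tr.select('td')]))` would take the place of the single `writer.writerow(...)` call, provided the spacer-cell assumption holds for the real page.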