Question

My code successfully scrapes the table class tags from https://www.hsi.com.hk/HSI-Net/HSI-Net?cmd=tab&pageId=en.indexes.hscis.hsci.constituents&expire=false&lang=en&tabs.current=en.indexes.hscis.hsci.overview_des%5Een.indexes.hscis.hsci.constituents&retry=false

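However, the site above has multiple pages, and I would like to scrape all the stock codes on every page (the first column of the table on each page).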

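For example, with the URL above, when I click the link to "2" the overall URL does not change. I also could not find a hidden link for each page; however, I can see the tables for every page in the page source.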
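If the tables for all pages really are in the source, the page links are probably just showing and hiding tables in the browser, so one request may already contain everything. A minimal sketch of why the current code only returns page 1, assuming (not verified against the live site) that each page is rendered as its own `<table class="greygeneraltxt">`: `find` stops at the first matching table, while `find_all` returns every one. `SAMPLE_HTML` is a made-up stand-in for the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page: two "pages", each its own table.
SAMPLE_HTML = """
<table class="greygeneraltxt"><tr><td>1</td><td>00001</td></tr></table>
<table class="greygeneraltxt"><tr><td>2</td><td>00002</td></tr></table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find() returns only the first matching table, i.e. "page 1".
first_only = soup.find("table", {"class": "greygeneraltxt"})
# find_all() returns every matching table, i.e. all "pages".
all_tables = soup.find_all("table", {"class": "greygeneraltxt"})

print([row.find_all("td")[1].get_text() for row in first_only.find_all("tr")])  # ['00001']
print(len(all_tables))  # 2
```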

It seems very similar to: Scrape multiple pages with BeautifulSoup and Python

However, I cannot find where the page numbers come from in the Network tab.

How can I change my code to scrape the data from all available pages of the list?

My code only works for page 1:

import bs4 as bs
import pickle
import requests

def save_hkex_tickers():
    resp = requests.get('https://www.hsi.com.hk/HSI-Net/HSI-Net?cmd=tab&pageId=en.indexes.hscis.hsci.constituents&expire=false&lang=en&tabs.current=en.indexes.hscis.hsci.overview_des%5Een.indexes.hscis.hsci.constituents&retry=false')
    resp.raise_for_status()  # stop on HTTP errors instead of parsing an error page
    soup = bs.BeautifulSoup(resp.text, "lxml")
    # find() returns only the FIRST matching table, i.e. page 1
    table = soup.find('table', {'class': 'greygeneraltxt'})
    tickers = []
    # skip the two header rows; the code is in the second <td> of each row
    for row in table.find_all('tr')[2:]:
        ticker = row.find_all('td')[1].text.strip()
        tickers.append(ticker)

    print(tickers)
    return tickers

save_hkex_tickers()
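A minimal sketch of how the function might be changed, assuming (as the observation about the page source suggests, but not verified against the live site) that the raw HTML returned by `requests` already contains one `greygeneraltxt` table per page. It loops over every matching table with `find_all` instead of stopping at the first one with `find`. The helper `extract_tickers`, its `header_rows` parameter and the de-duplication step are hypothetical additions for illustration, not part of the original code; Python's built-in `html.parser` is used so `lxml` is not required.

```python
import requests
from bs4 import BeautifulSoup

URL = ('https://www.hsi.com.hk/HSI-Net/HSI-Net?cmd=tab&pageId=en.indexes.hscis.hsci.constituents'
       '&expire=false&lang=en&tabs.current=en.indexes.hscis.hsci.overview_des%5Een.indexes.hscis.hsci.constituents'
       '&retry=false')


def extract_tickers(html, header_rows=2):
    """Collect the code cell from every data row of every greygeneraltxt table."""
    soup = BeautifulSoup(html, "html.parser")
    tickers = []
    seen = set()
    for table in soup.find_all("table", {"class": "greygeneraltxt"}):
        # Assumption: every per-page table has the same two header rows as page 1.
        for row in table.find_all("tr")[header_rows:]:
            cells = row.find_all("td")
            if len(cells) < 2:
                continue  # skip spacer or malformed rows
            ticker = cells[1].get_text(strip=True)
            # Guard against the same table appearing twice in the source.
            if ticker and ticker not in seen:
                seen.add(ticker)
                tickers.append(ticker)
    return tickers


def save_hkex_tickers():
    resp = requests.get(URL, timeout=30)
    resp.raise_for_status()
    tickers = extract_tickers(resp.text)
    print(tickers)
    return tickers
```

Call `save_hkex_tickers()` as before. If the list still stops after page 1, the later tables are probably added by JavaScript after the page loads (visible in the browser's DevTools Elements panel but not in the raw response), and `requests` alone will not see them.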

Scrape multiple page IDs with BeautifulSoup and Python

0 answers