Web scraping several hrefs

Time: 2021-04-05 00:24:53

Tags: python html web-scraping

I want to scrape this page with Python: https://statusinvest.com.br/acoes/proventos/ibovespa

I am using this code:

import requests
from bs4 import BeautifulSoup as bs

URL = "https://statusinvest.com.br/acoes/proventos/ibovespa"

page = 1
req = requests.get(URL+str(page))
soup = bs(req.text, 'html.parser')
container = soup.find('div', attrs={'class','list'})
dividends = container.find('a')

for dividend in dividends:
  links = dividend.find_all('a')
 
 print(links)

But it returns nothing.

Can anyone help me?

1 Answer:

Answer 0: (score: 0)

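Edited: the updated code below shows how to access any of the data you mentioned in the comments. Adapt it as needed; all the data on that page is in the data variable.

Updated code: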

import json
import requests
from bs4 import BeautifulSoup as bs

url = "https://statusinvest.com.br"

# Fields printed for every record, in column order
fields = ["code", "companyName", "companyNameClean", "companyId",
          "resultAbsoluteValue", "dateCom", "paymentDividend",
          "earningType", "dy", "recentEvents", "uRLClear"]

req = requests.get(f"{url}/acoes/proventos/ibovespa")
soup = bs(req.content, 'html.parser')
# The page embeds its data as JSON in the value attribute of <input id="result">
data = json.loads(soup.find('input', attrs={'id': 'result'})["value"])

for title, key in [("Date Com Data", "dateCom"),
                   ("Date Payment Data", "datePayment"),
                   ("Provisioned Data", "provisioned")]:
    print(title)
    for item in data[key]:
        # One tab-separated row per record
        print("\t".join(str(item[field]) for field in fields))
    print()
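
As one example of adapting this, the records in the data dictionary could be written out as CSV instead of printed. The following is only a sketch: records_to_csv, FIELDS and the sample record are hypothetical names and made-up values, and it assumes each record is a flat dictionary using the same keys the code above prints.

```python
import csv
import io

# Columns to export; these keys are among the ones printed by the code above
FIELDS = ["code", "companyName", "dateCom", "paymentDividend",
          "earningType", "dy", "uRLClear"]


def records_to_csv(data, sections=("dateCom", "datePayment", "provisioned")):
    """Flatten the selected sections of `data` into one CSV string."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["section"] + FIELDS,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for section in sections:
        # .get() tolerates a section that is missing from the payload
        for record in data.get(section, []):
            # Tag each row with the section it came from
            writer.writerow({"section": section, **record})
    return buffer.getvalue()


# Made-up record, only to show the output shape; real values come from the page
sample_data = {
    "dateCom": [{"code": "ABCD3", "companyName": "Example SA",
                 "dateCom": "01/04/2021", "paymentDividend": "15/04/2021",
                 "earningType": "Dividendo", "dy": "0.50",
                 "uRLClear": "/acoes/abcd3", "companyId": 1}],
    "datePayment": [],
    "provisioned": [],
}
print(records_to_csv(sample_data))
```

With the real page, the dictionary returned by json.loads would be passed in place of sample_data. Keys not listed in FIELDS (companyId here) are skipped because of extrasaction="ignore".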

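If you look at the site's page source, the data is embedded there as JSON, so you can read it directly and build the links you need with the following code.

Code: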

import json
import requests
from bs4 import BeautifulSoup as bs

url = "https://statusinvest.com.br"

links=[]
req = requests.get(f"{url}/acoes/proventos/ibovespa")
soup = bs(req.content, 'html.parser')
data = json.loads(soup.find('input', attrs={'id': 'result'})["value"])
for datecom in data["dateCom"]:
    links.append(f"{url}{datecom['uRLClear']}")
for datePayment in data["datePayment"]:
    links.append(f"{url}{datePayment['uRLClear']}")
for provisioned in data["provisioned"]:
    links.append(f"{url}{provisioned['uRLClear']}")

print(links)
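
The code above assumes the input element with id "result" is always present; if the page layout changes, soup.find returns None and the subscript fails. Below is a rough sketch of a more defensive variant. It swaps BeautifulSoup for the standard library's html.parser so it runs without extra packages, and it works on an HTML string so it can be tried offline. ResultInputParser, extract_links and the sample page are hypothetical and made up for illustration; the real page may differ.

```python
import json
from html.parser import HTMLParser

BASE_URL = "https://statusinvest.com.br"
SECTIONS = ("dateCom", "datePayment", "provisioned")


class ResultInputParser(HTMLParser):
    """Remember the value attribute of the first <input id="result">."""

    def __init__(self):
        super().__init__()
        self.value = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "input" and attributes.get("id") == "result" and self.value is None:
            self.value = attributes.get("value")


def extract_links(html, base_url=BASE_URL):
    """Return absolute links for every record found in the embedded JSON."""
    parser = ResultInputParser()
    parser.feed(html)
    if parser.value is None:
        # Element not found (layout changed or request blocked): nothing to read
        return []
    data = json.loads(parser.value)
    links = []
    for section in SECTIONS:
        # .get() tolerates a section that is absent from the payload
        for record in data.get(section, []):
            links.append(f"{base_url}{record['uRLClear']}")
    return links


# Made-up sample page that mimics the structure assumed above
sample_html = (
    '<html><body><input type="hidden" id="result" value="'
    '{&quot;dateCom&quot;: [{&quot;uRLClear&quot;: &quot;/acoes/abcd3&quot;}],'
    ' &quot;provisioned&quot;: [{&quot;uRLClear&quot;: &quot;/acoes/wxyz4&quot;}]}'
    '"></body></html>'
)
print(extract_links(sample_html))
```

For the live page, the text of the requests response (req.text) would be passed to extract_links in place of sample_html.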

Let me know if you have any questions :)