I need to scrape this website:
https://sec.report/Ticker/AAPL
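I need the CIK number 0000320193 from it.
When I call soup.prettify(), the output just says that the page requires JavaScript. I also don't want to open a web browser, because this needs to be automated.
I need to use Python with the Beautiful Soup and Requests libraries.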
Answer (score: 1)
To get a proper response from the server, set a browser-like User-Agent HTTP header:
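import requests
from bs4 import BeautifulSoup

url = 'https://sec.report/Ticker/AAPL'
# Send a browser-like User-Agent; the default one sent by requests got the "needs JavaScript" page
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0'}

response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
soup = BeautifulSoup(response.content, 'html.parser')
print(soup.h2.text)  # or print(soup.h2.text.split()[-1]) for just "0000320193"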
Prints:
SEC CIK 0000320193
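Building on the answer, here is a minimal sketch that separates fetching from parsing, so the CIK extraction can be checked without a network call. It assumes the CIK stays in the page's first <h2> as a 10-digit, zero-padded number, as in the output above. The names `HEADERS`, `CIK_RE`, `extract_cik` and `fetch_cik` are hypothetical helpers for illustration, not part of any library.

```python
import re

import requests
from bs4 import BeautifulSoup

# Browser-like User-Agent, same value as in the answer above
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) '
                  'Gecko/20100101 Firefox/78.0'
}

# Assumption: the CIK appears as a 10-digit, zero-padded number
CIK_RE = re.compile(r'\b(\d{10})\b')


def extract_cik(html):
    """Return the first 10-digit number in the page's first <h2>, or None."""
    soup = BeautifulSoup(html, 'html.parser')
    if soup.h2 is None:
        return None
    match = CIK_RE.search(soup.h2.get_text())
    return match.group(1) if match else None


def fetch_cik(ticker):
    """Hypothetical helper: download sec.report/Ticker/<ticker> and extract the CIK."""
    url = f'https://sec.report/Ticker/{ticker}'
    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()
    return extract_cik(response.text)


# Offline check against markup shaped like the output shown above (assumed structure)
print(extract_cik('<html><body><h2>SEC CIK 0000320193</h2></body></html>'))
```

Calling fetch_cik('AAPL') makes one real request; whether it still works depends on the site continuing to serve the CIK in its first <h2> to non-browser clients.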