Scraping request blocked by the website

Posted: 2021-07-02 06:07:38

Tags: python web-scraping beautifulsoup

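I'm trying to get all the indexes from Nasdaq, but when I run the script it just hangs until I press Ctrl+C. Does anyone know how to fix this? (Another page where I can get the indexes would also help.)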

from bs4 import BeautifulSoup
import urllib.request as ur
url = "https://www.nasdaq.com/market-activity/quotes/nasdaq-ndx-index"

# The script hangs on this line and never returns a response
read_data = ur.urlopen(url).read()
soup_data = BeautifulSoup(read_data,"lxml")

print(soup_data.prettify())

Thanks.

1 answer:

Answer 0 (score: 1)

To get a response, add a User-Agent header to the request. Here is an example using the requests module:

import requests
from bs4 import BeautifulSoup

url = "https://www.nasdaq.com/market-activity/quotes/nasdaq-ndx-index"

# Send a browser-like User-Agent; without it the server does not answer
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36"
}
# timeout makes the call raise an exception instead of waiting forever
response = requests.get(url, headers=HEADERS, timeout=10).content
soup_data = BeautifulSoup(response, "lxml")

print(soup_data.prettify())
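The same fix can also be applied to the question's original urllib-based script without switching to requests. This is a minimal sketch, assuming (as the answer suggests) that the User-Agent header is what the server checks. The helper names `build_request` and `fetch_html` and the 10-second timeout are hypothetical choices, not part of the original answer.

```python
import urllib.request as ur

URL = "https://www.nasdaq.com/market-activity/quotes/nasdaq-ndx-index"

# Browser-like User-Agent, same string as in the requests example.
# Assumption: this header alone is enough for the server to respond.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36"
    )
}


def build_request(url: str) -> ur.Request:
    """Hypothetical helper: wrap the URL in a Request carrying HEADERS."""
    return ur.Request(url, headers=HEADERS)


def fetch_html(url: str, timeout: float = 10.0) -> bytes:
    """Hypothetical helper: download the page body.

    The timeout (an arbitrary 10 s default) makes urlopen raise an
    OSError subclass (e.g. URLError or TimeoutError) instead of hanging.
    """
    with ur.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read()
```

Usage would mirror the question's code: `soup_data = BeautifulSoup(fetch_html(URL), "lxml")`. Note that urllib stores header names via `str.capitalize()`, so the header is looked up afterwards as `"User-agent"`. Whether the site still responds to this header today is not guaranteed.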