Question

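I am scraping a website with Python's requests.get. It worked fine yesterday, but today something went wrong: I can no longer get the page content from the response's .text as I did before. Instead, requests now returns "This process is automatic. Your browser will redirect to your requested content shortly. Please allow up to 5 seconds".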

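To work around this, I tried putting time.sleep() inside the requests.get call. It did not help; I still receive the same "Please allow up to 5 seconds" response as before.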

Here is my code:

import time

import requests
from bs4 import BeautifulSoup


def web_scraping_corporationwiki(df):
    # Scrape the website, one search per row of the DataFrame
    df_scrap = df.reset_index()
    for i in df_scrap.index:
        name = df_scrap.loc[i, 'Name']
        state = df_scrap.loc[i, 'State']
        origin_row = df_scrap.loc[i]
        url = 'https://www.corporationwiki.com/search/withfacets?term=' + name + '&stateFacet=' + state
        for page_num in range(8):
            # Wait before each request (time.sleep(50) was originally passed as an argument to requests.get)
            time.sleep(50)
            req = requests.get(url, params=dict(query="wiki", page=page_num), timeout=100)
            soup = BeautifulSoup(req.text, "html.parser")
            page = soup.find_all('div', class_='list-group-item')
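
To make the failure easier to spot, a check like the one below could flag the "please wait" page instead of handing it to BeautifulSoup. This is only a rough sketch under assumptions: the marker string is copied from the response quoted above, while the helper names `is_interstitial` and `fetch_page`, the retry count, and the delay are hypothetical and not part of my original code. `session` is expected to be a `requests.Session` (or any object with a compatible `.get`).

```python
import time

# Marker text copied from the response quoted in the question.
INTERSTITIAL_MARKER = "Your browser will redirect to your requested content shortly"


def is_interstitial(html):
    """Return True if the body looks like the 'please wait' page, not search results."""
    return INTERSTITIAL_MARKER in html


def fetch_page(session, url, params, retries=3, delay=5.0):
    """Fetch url, pausing and retrying while the 'please wait' page comes back.

    Returns the HTML text, or None if every attempt returned the interstitial.
    """
    for attempt in range(retries):
        resp = session.get(url, params=params, timeout=30)
        if not is_interstitial(resp.text):
            return resp.text
        # Pause between requests; sleeping does not run any JavaScript on the page.
        if attempt < retries - 1:
            time.sleep(delay)
    return None
```

It could be called as `fetch_page(requests.Session(), url, dict(query="wiki", page=page_num))`. If it keeps returning `None`, that would suggest the site expects a real browser to complete the redirect, in which case waiting longer on the client side is unlikely to help.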

Can anyone help me solve this problem?

Python web scraping (requests.get) cannot fetch the target page

0 answers: