Python web scraping: requests.get cannot fetch the target page

Time: 2020-01-07 16:07:04

Tags: python get request

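I am scraping a website with Python's requests.get. It worked yesterday, but today something went wrong: I can no longer get the page content from the response's .text attribute as before. Instead, the response now contains "This process is automatic. Your browser will redirect to your requested content shortly. Please allow up to 5 seconds".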

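To work around this, I tried adding a time.sleep() delay to the requests.get call. It did not help: I still receive the same "Please allow up to 5 seconds" response as before.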
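
The quoted message suggests the server is returning a browser-check interstitial page rather than the search results. Pages of this kind typically expect a browser to run a script and then redirect. requests does not execute JavaScript, so sleeping in Python only delays the next request and the same page comes back. As a minimal sketch, assuming the interstitial always contains the phrases quoted above, the scraper could at least detect this case instead of parsing the wrong HTML. The helper name `looks_like_browser_check` and the `CHALLENGE_MARKERS` tuple are hypothetical and not part of requests.

```python
# Hypothetical marker phrases, copied from the message quoted in the question.
CHALLENGE_MARKERS = (
    "This process is automatic",
    "Please allow up to 5 seconds",
)


def looks_like_browser_check(html):
    """Return True if the HTML contains any of the interstitial phrases."""
    return any(marker in html for marker in CHALLENGE_MARKERS)
```

Inside the scraping loop, a check such as `if looks_like_browser_check(req.text): break` would stop early rather than silently collecting empty results.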

Here is my code:

import time

import requests
from bs4 import BeautifulSoup


def web_scraping_corporationwiki(df):
    # Scrape the website: one search per (Name, State) row
    df_scrap = df.reset_index()
    for i in df_scrap.index:
        name = df_scrap.loc[i, 'Name']
        state = df_scrap.loc[i, 'State']
        origin_row = df_scrap.loc[i]
        url = 'https://www.corporationwiki.com/search/withfacets?term=' + name + '&stateFacet=' + state
        for page_num in range(8):
            # Pause before each request; time.sleep() must be called on its own,
            # not passed as an argument to requests.get()
            time.sleep(50)
            req = requests.get(url, params=dict(query="wiki", page=page_num), timeout=100)
            soup = BeautifulSoup(req.text, "html.parser")
            page = soup.find_all('div', class_='list-group-item')
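
The URL above is built by string concatenation, so a company name containing a space or an `&` would produce a malformed query string. A minimal sketch of one way to avoid that, using the standard library's `urllib.parse.urlencode`, is shown here. The function name `build_search_url` and the constant `BASE_URL` are hypothetical. The parameter names are the ones used in the code above.

```python
from urllib.parse import urlencode

# Search endpoint taken from the question's code.
BASE_URL = "https://www.corporationwiki.com/search/withfacets"


def build_search_url(name, state, page_num):
    """Build the search URL with every query value percent-encoded."""
    query = {
        "term": name,
        "stateFacet": state,
        "query": "wiki",
        "page": page_num,
    }
    return BASE_URL + "?" + urlencode(query)
```

Passing the same dictionary as `params=` to `requests.get(BASE_URL, ...)` should have an equivalent effect, since requests encodes `params` values itself.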

Can anyone help me solve this problem?

0 answers:

No answers yet