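I am scraping a website with Python's requests.get. It worked fine yesterday, but today something changed: I can no longer get the page content from the response text (req.text) as before. Instead, requests now returns "This process is automatic. Your browser will redirect to your requested content shortly. Please allow up to 5 seconds".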
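To make the symptom easy to check, here is a minimal sketch of how the redirect notice could be detected in a response body. The helper name `looks_like_interstitial` and the marker constant are hypothetical names of my own, and the marker string is simply copied from the message I received; nothing here is part of requests itself.

```python
# Hypothetical marker: copied from the message the site returned to me.
INTERSTITIAL_MARKER = "Your browser will redirect to your requested content shortly"


def looks_like_interstitial(html: str) -> bool:
    """Return True if the HTML appears to be the redirect notice rather than real content."""
    # Compare case-insensitively in case the site changes capitalization.
    return INTERSTITIAL_MARKER.lower() in html.lower()
```

With this, `looks_like_interstitial(req.text)` would tell me whether a given response is the notice or an actual results page.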
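To work around this, I tried adding a time.sleep() pause to the requests.get call. That did not help: I still receive the same "Please allow up to 5 seconds" response as before.
Here is my code: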
import time

import requests
from bs4 import BeautifulSoup


def web_scraping_corporationwiki(df):
    # Scrape the website for every (Name, State) row
    df_scrap = df.reset_index()
    for i in df_scrap.index:
        name = df_scrap.loc[i, 'Name']
        state = df_scrap.loc[i, 'State']
        origin_row = df_scrap.loc[i]
        url = 'https://www.corporationwiki.com/search/withfacets?term=' + name + '&stateFacet=' + state
        for page_num in range(8):
            # Pause before each request. time.sleep() returns None, so it is called
            # on its own line rather than passed as an argument to requests.get.
            time.sleep(50)
            req = requests.get(url, params=dict(query="wiki", page=page_num), timeout=100)
            soup = BeautifulSoup(req.text, "html.parser")
            page = soup.find_all('div', class_='list-group-item')
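What I was aiming for with the delay is roughly the pattern below: retry the request a few times and pause between attempts while the redirect notice keeps coming back. This is only a sketch under my own assumptions; `fetch_with_pause` and all of its parameters are hypothetical names, and so far waiting has not changed the response for me, so a delay alone may not be enough.

```python
import time
from typing import Callable, Optional


def fetch_with_pause(fetch: Callable[[], str],
                     is_blocked: Callable[[str], bool],
                     retries: int = 3,
                     delay: float = 5.0,
                     sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Call fetch() up to `retries` times, pausing `delay` seconds between attempts.

    Returns the first body that is_blocked() does not flag, or None if every
    attempt was flagged.
    """
    for attempt in range(retries):
        body = fetch()
        if not is_blocked(body):
            return body
        # Only wait if another attempt will follow.
        if attempt < retries - 1:
            sleep(delay)
    return None
```

In my loop this would be called as fetch_with_pause(lambda: requests.get(url, timeout=100).text, lambda body: "Please allow up to 5 seconds" in body, delay=50).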
Can anyone help me solve this problem?