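I'm new to web scraping and decided to build a car-price scraper as my first project. I quickly ran into a problem: when I print the soup object, it only prints a few lines saying "Service Unavailable" and "the server is temporarily unable to service my request". Why is this happening, and how can I fix it? Thanks in advance!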
Here is my code:
import requests
from bs4 import BeautifulSoup
url = 'https://www.olx.com.eg/en/vehicles/cars-for-sale/'
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
print(soup)
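The text you see is most likely the server's own error page: it probably answered with an HTTP error status (a 503, judging by the "Service Unavailable" wording), and BeautifulSoup simply parsed whatever HTML came back. A minimal sketch of checking the status before parsing is shown below; `fetch_html` is a hypothetical helper name, and `raise_for_status()` is the standard requests method that raises `requests.HTTPError` for 4xx/5xx responses.

```python
from typing import Optional

import requests


def fetch_html(url: str, headers: Optional[dict] = None, timeout: float = 10.0) -> str:
    """Download a page, raising requests.HTTPError on a 4xx/5xx status.

    fetch_html is a hypothetical helper, not part of requests or bs4.
    """
    response = requests.get(url, headers=headers, timeout=timeout)
    # For a 503 this raises an HTTPError whose message looks like
    # "503 Server Error: Service Unavailable for url: ..."
    response.raise_for_status()
    # Only reached for non-error statuses, so the body is the real page
    return response.text
```

Calling `fetch_html(url)` without headers should then fail with an explicit `HTTPError` instead of silently handing the error page to BeautifulSoup, assuming the site still responds the same way.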
Answer:
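Set the User-Agent HTTP header to get the proper response. By default requests identifies itself as python-requests/<version>, which this server appears to reject: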
import requests
from bs4 import BeautifulSoup

url = 'https://www.olx.com.eg/en/vehicles/cars-for-sale/'
# Identify as a regular Firefox browser instead of the default python-requests client
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:79.0) Gecko/20100101 Firefox/79.0'}
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')
print(soup)
Output:
<!DOCTYPE html>
<html lang="en" xmlns:fb="http://www.facebook.com/2008/fbml" xmlns:og="http://ogp.me/ns#">
<head>
<meta content="H1XIF2PCYRBVJS6NaAD2pcbbm2oqGGCj7KenRQGyVGI" name="google-site-verification"/>
<meta content="5A57304F35F5718EAD39F1A8E1EE8D3C" name="msvalidate.01"/>
...and so on.
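If the scraper will request several listing pages, a `requests.Session` lets you set the header once and reuse it (and the pooled connections) for every request. This is a sketch, not the answerer's code: `make_session` and `BROWSER_UA` are hypothetical names, and the User-Agent string is the one from the answer above. A browser-like User-Agent is not guaranteed to be enough on every site.

```python
import requests

# Browser-like User-Agent copied from the answer above (hypothetical constant name)
BROWSER_UA = (
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:79.0) "
    "Gecko/20100101 Firefox/79.0"
)


def make_session(user_agent: str = BROWSER_UA) -> requests.Session:
    """Return a Session whose requests all carry the given User-Agent."""
    session = requests.Session()
    # A fresh Session sends requests' default "python-requests/<version>" UA;
    # update() replaces just that header and keeps the other defaults.
    session.headers.update({"User-Agent": user_agent})
    return session
```

Usage would look like `page = make_session().get(url, timeout=10)`, followed by `BeautifulSoup(page.content, 'html.parser')` as before; `requests.utils.default_user_agent()` shows the value that was being sent without the header.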