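My question is how to get the complete "view page source" HTML of a website.
With my script, the output stops at an empty div in the body.
Code: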
from bs4 import BeautifulSoup
import requests

URL = 'https://www.manta.com/mb_43_A0_02/advertising_and_marketing/alaska?fbclid=IwAR3gfnW_bma08cITjmctgdcS5hLRau0vwl6WJHXdbwL9U3FkxIgrLkOG5rs'
# Send a browser-like User-Agent with the request.
headers = {
    "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36'
}
# Fetch the page and parse the returned HTML.
page = requests.get(URL, headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')
# Print the parsed document with indentation.
print(soup.prettify())
Output:
<!DOCTYPE html>
<html>
<head>
<meta content="NOINDEX, NOFOLLOW" name="ROBOTS"/>
<meta content="max-age=0" http-equiv="cache-control">
<meta content="no-cache" http-equiv="cache-control"/>
<meta content="0" http-equiv="expires"/>
<meta content="Tue, 01 Jan 1980 1:00:00 GMT" http-equiv="expires"/>
<meta content="no-cache" http-equiv="pragma"/>
<meta content="10; url=/mb_43_A0_02/advertising_and_marketing/alaska?fbclid=IwAR3gfnW_bma08cITjmctgdcS5hLRau0vwl6WJHXdbwL9U3FkxIgrLkOG5rs" http-equiv="refresh"/>
<script type="text/javascript">
(function(window){
try {
if (typeof sessionStorage !== 'undefined'){
sessionStorage.setItem('distil_referrer', document.referrer);
}
} catch (e){}
})(window);
</script>
<script defer="" src="/ser-cudqxfzurtxqzqbfst.js" type="text/javascript">
</script>
<style type="text/css">
#d__fFH{position:absolute;top:-5000px;left:-5000px}#d__fF{font-family:serif;font-size:200px;visibility:hidden}#swtewadbdy{display:none!important}
</style>
</meta>
</head>
<body>
<div id="distilIdentificationBlock">
</div>
</body>
</html>
This script does not fetch all of the page's code.
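
One thing I noticed in the output: it contains a meta refresh tag and an empty div with id "distilIdentificationBlock", so what I get back may be a placeholder page rather than the real listing. Below is a minimal sketch for checking that from a script. It assumes those two markers are what identify the placeholder, which I have not confirmed. The names PlaceholderDetector and looks_like_placeholder are my own, and it uses only the standard library html.parser so it can be run without a network request.

```python
from html.parser import HTMLParser


class PlaceholderDetector(HTMLParser):
    """Collects hints that a page is a placeholder (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.refresh_target = None      # content of <meta http-equiv="refresh">, if any
        self.has_distil_block = False   # True if the marker div was seen

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Attribute values can be None for valueless attributes, hence the "or".
        if tag == "meta" and (attrs.get("http-equiv") or "").lower() == "refresh":
            self.refresh_target = attrs.get("content")
        # Assumption: this id marks the placeholder page, as in the output above.
        if tag == "div" and attrs.get("id") == "distilIdentificationBlock":
            self.has_distil_block = True


def looks_like_placeholder(html):
    """Return True if the HTML has a meta refresh or the marker div."""
    detector = PlaceholderDetector()
    detector.feed(html)
    return detector.refresh_target is not None or detector.has_distil_block


# Shortened stand-in for the response shown above (the path is made up).
sample = (
    '<html><head>'
    '<meta content="10; url=/some/path" http-equiv="refresh"/>'
    '</head><body><div id="distilIdentificationBlock"></div></body></html>'
)
print(looks_like_placeholder(sample))
```

With the script above, this could be called as looks_like_placeholder(page.text) before parsing, to tell a placeholder response apart from the real page.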