I want to scrape some data from a website, e.g. image URLs and page titles.
But the response is not what I expected.
Code:
import urllib2
from bs4 import BeautifulSoup

url_list = [
    "https://www.nfm.com/DetailsPage.aspx?productid=43382514"
]
# Image URL: https://www.nfm.com/GetPhoto.ashx?ProductID=43382514&Size=L

def get_data(url):
    # Browser-like User-Agent (no stray quotes inside the string)
    user_agent = 'Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11'
    headers = {'User-Agent': user_agent}
    request = urllib2.Request(url, None, headers)
    response = urllib2.urlopen(request)
    # Parse the response body and dump it for inspection
    soup = BeautifulSoup(response, 'html.parser')
    print soup.prettify('latin-1')
    # img_url = https://www.nfm.com/GetPhoto.ashx?ProductID=43382514&Size=L

for url in url_list:
    get_data(url)
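The comments in the script suggest the image URL can be derived from the product id in the page URL. A minimal Python 3 sketch, assuming the GetPhoto.ashx?ProductID=...&Size=... pattern seen for this one product holds for others; image_url_for is a hypothetical helper, not part of any site API.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

def image_url_for(product_page_url, size="L"):
    """Build the (assumed) GetPhoto.ashx image URL from a DetailsPage.aspx URL."""
    # Pull productid=... out of the product page's query string
    query = parse_qs(urlsplit(product_page_url).query)
    product_id = query["productid"][0]
    # Assumption: every product image follows the same URL pattern
    params = urlencode({"ProductID": product_id, "Size": size})
    return "https://www.nfm.com/GetPhoto.ashx?" + params

print(image_url_for("https://www.nfm.com/DetailsPage.aspx?productid=43382514"))
```

This avoids parsing the product page at all for the image, but only if the pattern really generalizes.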
The result is:
<html>
<body>
<script type="text/javascript">
document.cookie="ns_cls="+"w:"+screen.width+",h:"+screen.height+",ua:"+escape(navigator.userAgent)
window.location.href = "https://www.nfm.com/DetailsPage.aspx?productid=43382514"
</script>
</body>
</html>
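The page above is a JavaScript check: it sets a cookie named ns_cls (screen width, screen height and the escaped user agent) and then reloads the same URL. urllib2 and requests do not execute JavaScript, so presumably the cookie is never set and the server keeps answering with this script. The sketch below (Python 3) reads the cookie name and redirect target from such a page and builds a value in the same format. It assumes the server only checks that the cookie is present and well-formed; the 1920x1080 screen size is arbitrary, and parse_challenge / emulate_cookie_value are hypothetical helpers.

```python
import re
from urllib.parse import quote

# The challenge page as returned to the script
SAMPLE_CHALLENGE = '''<html><body><script type="text/javascript">
document.cookie="ns_cls="+"w:"+screen.width+",h:"+screen.height+",ua:"+escape(navigator.userAgent)
window.location.href = "https://www.nfm.com/DetailsPage.aspx?productid=43382514"
</script></body></html>'''

def parse_challenge(html):
    """Return (cookie_name, redirect_url) from the challenge script, or None."""
    cookie = re.search(r'document\.cookie\s*=\s*"([^"=]+)="', html)
    redirect = re.search(r'window\.location\.href\s*=\s*"([^"]+)"', html)
    if not (cookie and redirect):
        return None
    return cookie.group(1), redirect.group(1)

def emulate_cookie_value(user_agent, width=1920, height=1080):
    """Mimic "w:"+width+",h:"+height+",ua:"+escape(userAgent)."""
    # JS escape() leaves A-Z a-z 0-9 @*_+-./ as-is; for typical ASCII
    # user agents quote() with this safe set gives the same result
    return "w:%d,h:%d,ua:%s" % (width, height, quote(user_agent, safe="@*_+-./"))

name, target = parse_challenge(SAMPLE_CHALLENGE)
value = emulate_cookie_value("Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11")
print(name + "=" + value, "->", target)
```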
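So I keep getting this HTML page, which just points back to the same URL I requested from my Python script (with the urllib2 module).
Python's requests module behaves exactly the same way!
I don't know how to get the real page. Please help!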
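With a cookie string in hand, one option is to request the page again with it in the Cookie header, sending the same User-Agent that was encoded into the cookie. A Python 3 sketch using urllib.request (the successor to urllib2); build_request and fetch_html are hypothetical names. It assumes the server accepts a manually built ns_cls cookie; if it still returns the script, the check probably depends on more than this cookie, and driving a real browser (e.g. Selenium) may be necessary.

```python
import urllib.request

USER_AGENT = "Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"

def build_request(url, cookie_header, user_agent=USER_AGENT):
    """Prepare a GET carrying the cookie the challenge script would have set."""
    return urllib.request.Request(
        url, headers={"User-Agent": user_agent, "Cookie": cookie_header}
    )

def fetch_html(url, cookie_header, timeout=30):
    """Fetch the page with the cookie and return the decoded HTML (needs network)."""
    with urllib.request.urlopen(build_request(url, cookie_header), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

If the server accepts the cookie, BeautifulSoup(fetch_html(url, "ns_cls=..."), "html.parser").title should then give the real page title instead of the redirect script.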