Question

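I want to get some data from a website, for example image URLs, the page title, and so on.

But the response I get back is not what I expect.

Code: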

import urllib2
from bs4 import BeautifulSoup

url_list = [
    "https://www.nfm.com/DetailsPage.aspx?productid=43382514"
]

# Image URL: https://www.nfm.com/GetPhoto.ashx?ProductID=43382514&Size=L


def get_data(url):
    user_agent = 'Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11'
    headers = {'User-Agent': user_agent}
    page = urllib2.Request(url, None, headers)
    page2 = urllib2.urlopen(page)
    soup = BeautifulSoup(page2, 'html.parser')
    print soup.prettify('latin-1')
    # img_url = https://www.nfm.com/GetPhoto.ashx?ProductID=43382514&Size=L

for i in url_list:
    get_data(i)

The result is:

<html>
 <body>
  <script type="text/javascript">
   document.cookie="ns_cls="+"w:"+screen.width+",h:"+screen.height+",ua:"+escape(navigator.userAgent)
window.location.href = "https://www.nfm.com/DetailsPage.aspx?productid=43382514"
  </script>
 </body>
</html>

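So this HTML page is what I get back, and it contains the same URL that I requested from the Python script (urllib2 module).

Even Python's requests module gives the same response!

I don't know how to get a proper response! Please help!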
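
The script in the returned page appears to set a cookie named ns_cls from the screen size and the user agent, and then reload the same URL. One thing that might be worth trying is sending that cookie by hand. Below is a minimal sketch in Python 3 (urllib.request is the Python 3 counterpart of urllib2), assuming the server only checks that this cookie is present. The helper names build_ns_cls_cookie and fetch_with_cookie and the 1920x1080 screen size are made up for illustration, and quote() only approximates JavaScript's escape().

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

USER_AGENT = "Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"


def build_ns_cls_cookie(user_agent, width=1920, height=1080):
    """Build the cookie string that the returned script would set in a browser.

    The width and height defaults are arbitrary placeholder values (assumption).
    """
    # JavaScript's escape() leaves @*_+-./ unescaped; quote() approximates that.
    escaped_ua = quote(user_agent, safe="@*_+-./")
    return "ns_cls=w:{},h:{},ua:{}".format(width, height, escaped_ua)


def fetch_with_cookie(url, user_agent=USER_AGENT):
    """Request the page with the cookie already set (hypothetical helper)."""
    headers = {
        "User-Agent": user_agent,
        "Cookie": build_ns_cls_cookie(user_agent),
    }
    request = Request(url, None, headers)
    with urlopen(request) as response:
        return response.read()
```

Calling fetch_with_cookie(url_list[0]) would return the raw page bytes, which can be passed to BeautifulSoup as in the code above. If the site checks anything beyond this cookie, this will not be enough on its own.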

Python urllib2: not getting a proper response

0 answers: