Question

I want to send a Referer header while retrieving data from the web, but it does not work in my Python 2 code: request.add_header('Referer', 'https://www.python.org').

Contents of my Url.txt:

https://www.python.org/about/
  https://stackoverflow.com/questions
  https://docs.python.org/2.7/

This is my code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import re
import urllib2
import threading
import time
import requests

max_thread = 5
urllist = open("Url.txt").readlines()

def url_connect(url):
    try :
        request = urllib2.Request(url)
        request.add_header('Referer', 'https://www.python.org')
        request.add_header('User-agent', 'Mozilla/5.0')  
        goo = re.findall('<title>(.*?)</title>', urllib2.urlopen(url.replace(' ','')).read())[0]
        print '\n' + goo.decode("utf-8")
        with open('SaveMyDataFile.txt', 'ab') as f:
            f.write(goo + "\n")

    except Exception as Errors:
        pass

for i in urllist:
    i = i.strip()    

    if i.startswith("http"):        

        while threading.activeCount() >= max_thread:
            time.sleep(0.1)

        threading.Thread(target=url_connect, args=(i,)).start()

Answer 1

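It looks to me like the problem is in your urlopen call. You call it with the URL string (url.replace(' ','')) rather than with the request object, so the headers added to request are never sent.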
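One way to see the difference without touching a real site is to run a throwaway local server that echoes back the Referer it receives. The sketch below is only an illustration under a few assumptions: the EchoReferer handler and compare_calls function are hypothetical names, the import fallback lets it run on Python 3 as well, and it uses an opener with proxies disabled so the loopback request is not routed through a system proxy. The opener's open() accepts either a URL string or a Request object, just like urllib2.urlopen().

```python
import threading

try:  # Python 2, as in the question
    import urllib2
    from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer
except ImportError:  # Python 3 equivalents
    import urllib.request as urllib2
    from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoReferer(BaseHTTPRequestHandler):
    """Hypothetical test handler: replies with the Referer it received."""

    def do_GET(self):
        referer = self.headers.get('Referer') or 'none'
        body = referer.encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def compare_calls():
    """Return (referer seen when opening the URL, referer seen when opening the Request)."""
    server = HTTPServer(('127.0.0.1', 0), EchoReferer)  # port 0: pick a free port
    worker = threading.Thread(target=server.serve_forever)
    worker.daemon = True
    worker.start()
    url = 'http://127.0.0.1:%d/' % server.server_address[1]
    # Opener with no proxies, so the loopback request goes straight to the server.
    opener = urllib2.build_opener(urllib2.ProxyHandler({}))
    try:
        request = urllib2.Request(url)
        request.add_header('Referer', 'https://www.python.org')
        with_url = opener.open(url).read().decode('utf-8')          # headers on request are ignored
        with_request = opener.open(request).read().decode('utf-8')  # headers are sent
    finally:
        server.shutdown()
        server.server_close()
    return with_url, with_request
```

Calling compare_calls() should return 'none' for the plain URL and 'https://www.python.org' for the Request object, which matches the behaviour described in this answer.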

Answer 2

From https://docs.python.org/2/library/urllib2.html#urllib2.urlopen:

Open the URL url, which can be either a string or a Request object.

You need to pass urllib2.urlopen() the Request object you just built. At the moment you build it and add headers to it, but never use it.
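A minimal sketch of that change, assuming the same headers and title regex as in the question. The helper names build_request and fetch_title are made up for this example, and the import fallback is only there so the snippet also runs on Python 3, where the same API lives in urllib.request.

```python
import re

try:  # Python 2, as in the question
    import urllib2
except ImportError:  # Python 3 moved the same API to urllib.request
    import urllib.request as urllib2


def build_request(url):
    """Return a Request for url carrying the headers from the question."""
    request = urllib2.Request(url.replace(' ', ''))
    request.add_header('Referer', 'https://www.python.org')
    request.add_header('User-agent', 'Mozilla/5.0')
    return request


def fetch_title(url):
    """Open the Request (not the bare URL) and return the first <title>, or None."""
    html = urllib2.urlopen(build_request(url)).read()
    match = re.search(b'<title>(.*?)</title>', html)
    return match.group(1) if match else None
```

In the question's url_connect, this amounts to replacing urllib2.urlopen(url.replace(' ','')) with urllib2.urlopen(request), after stripping the spaces from the URL before the Request is constructed. Using re.search instead of indexing findall(...)[0] also avoids an IndexError on pages without a title.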

Python 2 Referer not working
