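I have code for proxy IP rotation and user-agent spoofing to use while scraping. The code was provided as an example, though, so now that I have added it to my own code I don't know whether it actually works.
I am a Python beginner. I simply added it to my .py file (after the code that does the scraping). With it added, the scraper runs and fetches all the data, but I can't tell whether the rotation itself is working.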
Proxy rotation:
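from itertools import cycle

import requests

# Hard-coded example proxies; free public proxies go stale quickly, so expect failures.
proxies = ['121.129.127.209:80', '124.41.215.238:45169', '185.93.3.123:8080', '194.182.64.67:3128', '106.0.38.174:8080', '163.172.175.210:3128', '13.92.196.150:8080']
# get_proxies() is not defined in this snippet; uncomment only if you define it yourself.
# proxies = get_proxies()
proxy_pool = cycle(proxies)

url = 'https://httpbin.org/ip'
for i in range(1, 11):
    # Take the next proxy from the pool, wrapping around at the end of the list
    proxy = next(proxy_pool)
    print("Request #%d" % i)
    try:
        response = requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=10)
        # httpbin echoes the IP it saw; it should be the proxy's, not your own
        print(response.json())
    except (requests.exceptions.RequestException, ValueError):
        print("Skipping. Connection error")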
User-agent spoofing:
import random

import requests

user_agent_list = [
    # Chrome
    'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36',
    'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36',
    'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36',
    # Internet Explorer
    'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0)',
    'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0)',
    'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)',
]
url = 'https://httpbin.org/user-agent'
# Make 5 requests and see which user agents are used
# Using Requests
for i in range(1, 6):
    # Pick a random user agent
    user_agent = random.choice(user_agent_list)
    # Set the headers
    headers = {'User-Agent': user_agent}
    # Make the request
    response = requests.get(url, headers=headers)
    print("Request #%d\nUser-Agent Sent: %s\nUser-Agent Received by HTTPBin:" % (i, user_agent))
    # response.text is the decoded body; httpbin echoes the User-Agent it received
    print(response.text)
    print("-------------------\n\n")
Answer 0 (score: 0)
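If you want to check whether the proxy and the user agent are actually rotating, go to a request bin site, activate an endpoint, and use that endpoint in your Python code in place of the URL you were requesting before.
Then, after running the Python code, open the request bin and read the user agent and IP address recorded for each GET request it lists.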
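The check described in this answer could also be scripted. The following is only a minimal sketch under assumptions: a request bin displays captured requests on its web page, whereas this sketch assumes an endpoint that echoes the request back as JSON with `origin` and `headers` keys, the way https://httpbin.org/get does. The names `fetch_echo` and `probe_rotation` are hypothetical and do not come from the question.

```python
import random
from itertools import cycle


def fetch_echo(url, proxy, user_agent, timeout=10):
    """Send one GET via `proxy` with `user_agent`; return the echoed JSON as a dict."""
    import requests  # third-party; imported lazily so probe_rotation can be tested offline

    response = requests.get(
        url,
        proxies={"http": proxy, "https": proxy},
        headers={"User-Agent": user_agent},
        timeout=timeout,
    )
    response.raise_for_status()
    return response.json()


def probe_rotation(url, proxies, user_agents, attempts=5, fetch=fetch_echo):
    """Return one (proxy, sent_ua, seen_ip, seen_ua) tuple per successful request."""
    proxy_pool = cycle(proxies)
    results = []
    for _ in range(attempts):
        proxy = next(proxy_pool)
        sent_ua = random.choice(user_agents)
        try:
            echoed = fetch(url, proxy, sent_ua)
        except (OSError, ValueError):
            # requests' network errors derive from OSError; bad JSON raises ValueError.
            continue
        # Assumed httpbin-style keys: "origin" is the caller's IP as the server saw it.
        seen_ip = echoed.get("origin")
        seen_ua = echoed.get("headers", {}).get("User-Agent")
        results.append((proxy, sent_ua, seen_ip, seen_ua))
    return results
```

Calling `probe_rotation("https://httpbin.org/get", proxies, user_agent_list)` with the lists from the question returns one row per request that got through. If `seen_ip` changes along with the proxy and `seen_ua` equals `sent_ua` in every row, both rotations are reaching the server.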
Answer 1 (score: 0)
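I suggest running a large number of requests and then looking at the distribution of the IPs you get back. You can do this easily from the console with a for loop and background curl commands; see
https://weautomate.org/articles/load-testing-ip-rotation-proxy/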
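The same idea can be kept in Python instead of the shell. This is a rough sketch, not the linked article's method: threads stand in for background curl commands, and the endpoint is assumed to return the caller's IP in an `origin` JSON field, as https://httpbin.org/ip does. The names `fetch_ip` and `tally_ips` are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle, islice


def fetch_ip(url, proxy, timeout=10):
    """Return the IP the server saw for one GET sent through `proxy`."""
    import requests  # third-party; imported lazily so tally_ips can be tested offline

    response = requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=timeout)
    response.raise_for_status()
    return response.json()["origin"]  # assumed httpbin-style response


def tally_ips(url, proxies, total=100, workers=10, fetch=fetch_ip):
    """Send `total` requests round-robin over `proxies`; count the IPs that came back."""

    def attempt(proxy):
        try:
            return fetch(url, proxy)
        except (OSError, ValueError, KeyError):
            return None  # dead proxy, bad JSON, or missing "origin" key

    # Pre-assign a proxy to every request so the worker threads share no iterator.
    assigned = list(islice(cycle(proxies), total))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        seen = list(pool.map(attempt, assigned))
    return Counter(ip for ip in seen if ip is not None)
```

Printing `tally_ips("https://httpbin.org/ip", proxies).most_common()` gives the per-IP counts. A spread over several addresses suggests the rotation is working, while a single address (especially your own) suggests requests are not going through the proxies.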