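I want to use Tor from Python to automate requests. I tested it against a page that reports my IP address, and it works.
Then I pointed it at the site I actually want, and apparently they block Tor endpoints, because (see the stack trace below) - but the same site works from the Tor Browser.
What is a better way to debug the responses the browser gets? (e.g. connection refused)
What am I missing to query the site from Python instead of from the browser?
I am trying to use something like this: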
#! /usr/bin/python3
import numpy as np
np.set_printoptions(formatter={'int':hex})
# Assumes a has length 2*n and b has length n (each pair of bytes in a becomes one 16-bit value in b)
a = np.array([0x31, 0x41, 0x59, 0x26, 0x53, 0x58, 0x97, 0x93]).astype(np.uint8)
# Widen the high bytes to uint16 before shifting so they do not overflow uint8
b = (a[0::2].astype(np.uint16) << 8) | a[1::2]
print("a: {}".format(a)) # a: [0x31 0x41 0x59 0x26 0x53 0x58 0x97 0x93]
print("b: {}".format(b)) # b: [0x3141 0x5926 0x5358 0x9793]
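Answer 0 (score: 1)
Even with this setup, some websites look for a User-Agent. Try sending a User-Agent header with your request. I ran into the same problem.
This worked for me: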
from urllib.request import Request, urlopen
import socket
import time

import socks  # PySocks
from fake_useragent import UserAgent
from stem import Signal
from stem.control import Controller

url = "https://example.com/"  # placeholder: replace with the site you want to query
ua = UserAgent()

with Controller.from_port(port=9051) as controller:
    controller.authenticate(password='YourPasswordHere')
    # Route every new socket through the local Tor SOCKS5 proxy
    socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, "127.0.0.1", 9050)
    socket.socket = socks.socksocket
    # Ask Tor for a new circuit (and therefore a new exit IP)
    controller.signal(Signal.NEWNYM)
    if not controller.is_newnym_available():
        print("Waiting time for Tor to change IP: " + str(controller.get_newnym_wait()) + " seconds")
        time.sleep(controller.get_newnym_wait())
    req = Request(url)
    req.add_header('User-Agent', ua.random)  # random browser-like User-Agent
    req_doc = urlopen(req)  # call .read().decode('utf8') to get the page body
    print(req_doc)
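
On the debugging part of the question, a minimal sketch using only the standard library is shown here. It separates "the server answered with an error status" (HTTPError) from "the connection itself failed" (URLError), which is usually the first thing to find out when a site works in the browser but not from a script. The helper names build_request, describe_failure and fetch are made up for this illustration, and the sketch assumes the SOCKS proxy has already been installed as in the code above.

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


def build_request(url, user_agent):
    """Return a Request that carries an explicit User-Agent header."""
    req = Request(url)
    req.add_header("User-Agent", user_agent)
    return req


def describe_failure(exc):
    """Turn a urllib exception into a one-line diagnostic string."""
    # HTTPError is a subclass of URLError, so it has to be checked first.
    if isinstance(exc, HTTPError):
        # The server answered, but with an error status (e.g. 403).
        return "HTTP {} {}".format(exc.code, exc.reason)
    if isinstance(exc, URLError):
        # No HTTP answer at all (e.g. connection refused, DNS failure).
        return "connection failed: {}".format(exc.reason)
    return "unexpected error: {!r}".format(exc)


def fetch(url, user_agent, timeout=30):
    """Fetch url; return (status, body), or (None, b'') after printing a diagnostic."""
    try:
        with urlopen(build_request(url, user_agent), timeout=timeout) as resp:
            return resp.status, resp.read()
    except (HTTPError, URLError) as exc:
        print(describe_failure(exc))
        return None, b""
```

Calling fetch(url, ua.random) inside the with block above would print a line such as an HTTP status or a connection error instead of a bare stack trace, which makes it easier to tell a block by the site from a proxy problem.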