Python scraping over Tor: the "To Russia With Love" script

Date: 2017-03-18 12:34:57

Tags: python web-scraping tor bs4

I am trying to use BS4 through Tor, following the To Russia With Love tutorial from the Stem project.

I rewrote the code a bit using, among other sources, this answer, and it now looks like this:

import io

import pycurl
import stem.process
from stem.util import term

SOCKS_PORT = 7000

def query(url):
    output = io.BytesIO()

    query = pycurl.Curl()
    query.setopt(pycurl.URL, url)
    query.setopt(pycurl.PROXY, 'localhost')
    query.setopt(pycurl.PROXYPORT, SOCKS_PORT)
    query.setopt(pycurl.PROXYTYPE, pycurl.PROXYTYPE_SOCKS5_HOSTNAME)
    query.setopt(pycurl.WRITEFUNCTION, output.write)

    try:
        query.perform()
        return output.getvalue()  # BytesIO.getvalue() returns bytes
    except pycurl.error as exc:
        return "Unable to reach %s (%s)" % (url, exc)  # this branch returns str

def print_bootstrap_lines(line):
    if "Bootstrapped " in line:
        print(term.format(line, term.Color.BLUE))

print(term.format("Starting Tor:\n", term.Attr.BOLD))

tor_process = stem.process.launch_tor_with_config(
    tor_cmd='/Applications/TorBrowser.app/Contents/MacOS/Tor/tor.real',
    config={
        'SocksPort': str(SOCKS_PORT),
        'ExitNodes': '{ru}',
        'GeoIPFile': r'/Applications/TorBrowser.app/Contents/Resources/TorBrowser/Tor/geoip',
        'GeoIPv6File': r'/Applications/TorBrowser.app/Contents/Resources/TorBrowser/Tor/geoip6',
    },
    init_msg_handler=print_bootstrap_lines,
)

print(term.format("\nChecking our endpoint:\n", term.Attr.BOLD))
print(term.format(query("https://www.atagar.com/echo.php"), term.Color.BLUE))

I am able to establish the Tor circuit, but at "Checking our endpoint" I get the following error:

Checking our endpoint:

Traceback (most recent call last):
  File "<ipython-input-804-68f8df2c050b>", line 40, in <module>
    print(term.format(query('https://www.atagar.com/echo.php'), term.Color.BLUE))
  File "/Applications/anaconda/lib/python3.6/site-packages/stem/util/term.py", line 139, in format
    if RESET in msg:
TypeError: a bytes-like object is required, not 'str'
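The traceback points at term.format checking whether a str constant occurs in the message it was given. query() returns output.getvalue(), which is a bytes object, and Python 3 refuses to search a bytes object for a str. The error can be reproduced without Tor or stem; in this sketch the marker string is only a stand-in for stem's RESET constant (an assumption, not necessarily its real value):

```python
# Minimal reproduction of the TypeError, with no Tor, pycurl or stem involved.
body = b"<html>...</html>"  # bytes, like the value query() returns on success
marker = "\x1b[0m"          # a str, standing in (assumption) for stem's RESET

error_message = None
try:
    marker in body  # Python 3: searching bytes for a str raises TypeError
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

This is the same operation as "if RESET in msg" in the traceback, just with the bytes coming from a literal instead of from pycurl.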

What should I change so that I can see the endpoint?

I have temporarily worked around the problem by replacing the last line of the code above with
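import requests
from bs4 import BeautifulSoup

# note: no proxies= argument, so this request does not use the SOCKS port configured above
test = requests.get('https://www.atagar.com/echo.php')
soup = BeautifulSoup(test.content, 'html.parser')
print(soup)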

but I would like to know how to make the "original" line work.

1 Answer

Answer 0 (score: 0)

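The code was written for Python 2, but you are using Python 3. In Python 2, str and bytes are the same type; in Python 3, str is what Python 2 called unicode. To make a byte string in Python 3, put a b directly in front of the string literal, for example: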

b"this is a byte string"
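For this particular line, the other direction is usually simpler: instead of turning string literals into bytes, decode the bytes that pycurl collected into a str before handing them to term.format. A minimal sketch, assuming the page is UTF-8 (a real page may declare a different charset); to_text is a hypothetical helper name, and it also passes through unchanged the str that query() returns when the request fails:

```python
def to_text(data, encoding="utf-8"):
    """Return data as str, decoding it first if it is bytes.

    Assumes UTF-8 by default (hypothetical choice); errors="replace" keeps
    undecodable bytes from raising by substituting U+FFFD for them.
    """
    if isinstance(data, bytes):
        return data.decode(encoding, errors="replace")
    return data  # already a str, e.g. the "Unable to reach ..." message


# In Python 3, b"..." and "..." are distinct types and never compare equal.
print(type(b"this is a byte string").__name__, type("this is a str").__name__)
print(b"abc" == "abc", to_text(b"abc") == "abc")
```

With it, the original last line becomes print(term.format(to_text(query("https://www.atagar.com/echo.php")), term.Color.BLUE)). Decoding once at the boundary keeps everything after query() working with str, which is what stem's term.format expects here.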