Question

I am trying to implement 2captcha using Selenium and Python.

I just copied the example from their documentation:
https://github.com/2captcha/2captcha-api-examples/blob/master/ReCaptcha%20v2%20API%20Examples/Python%20Example/2captcha_python_api_example.py

This is my code:

from selenium import webdriver
from time import sleep
from selenium.webdriver.support.select import Select
import requests

driver = webdriver.Chrome('chromedriver.exe')
driver.get('the_url')

current_url = driver.current_url



captcha = driver.find_element_by_id("captcha-box")
captcha2 = captcha.find_element_by_xpath("//div/div/iframe").get_attribute("src")
captcha3 = captcha2.split('=')
#print(captcha3[2])

# Add these values
API_KEY = 'my_api_key'  # Your 2captcha API KEY
site_key = captcha3[2]  # site-key, read the 2captcha docs on how to get this
url = current_url  # example url
proxy = 'Myproxy'  # example proxy

proxy = {'http': 'http://' + proxy, 'https': 'https://' + proxy}

s = requests.Session()

# here we post site key to 2captcha to get captcha ID (and we parse it here too)
captcha_id = s.post("http://2captcha.com/in.php?key={}&method=userrecaptcha&googlekey={}&pageurl={}".format(API_KEY, site_key, url), proxies=proxy).text.split('|')[1]
# then we parse gresponse from 2captcha response
recaptcha_answer = s.get("http://2captcha.com/res.php?key={}&action=get&id={}".format(API_KEY, captcha_id), proxies=proxy).text
print("solving ref captcha...")
while 'CAPCHA_NOT_READY' in recaptcha_answer:
    sleep(5)
    recaptcha_answer = s.get("http://2captcha.com/res.php?key={}&action=get&id={}".format(API_KEY, captcha_id), proxies=proxy).text
recaptcha_answer = recaptcha_answer.split('|')[1]

# we make the payload for the post data here, use something like mitmproxy or fiddler to see what is needed
payload = {
    'key': 'value',
    'gresponse': recaptcha_answer  # This is the response from 2captcha, which is needed for the post request to go through.
    }


# then send the post request to the url
response = s.post(url, payload, proxies=proxy)

# And that's all there is to it other than scraping data from the website, which is dynamic for every website.

This is the error I get:

solving ref captcha...
Traceback (most recent call last):
  File "main.py", line 38, in <module>
    recaptcha_answer = recaptcha_answer.split('|')[1]
IndexError: list index out of range

The captcha is being solved, because I can see it on the 2captcha dashboard. So where is the error, if this is the official example?

EDIT: Without changing anything, I now get the solved captcha back from 2captcha, but then I get this error:

solving ref captcha...
OK|this_is_the_2captch_answer
Traceback (most recent call last):
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connectionpool.py", line 594, in urlopen
    self._prepare_proxy(conn)
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connectionpool.py", line 805, in _prepare_proxy
    conn.connect()
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connection.py", line 308, in connect
    self._tunnel()
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\http\client.py", line 906, in _tunnel
    (version, code, message) = response._read_status()
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\http\client.py", line 278, in _read_status
    raise BadStatusLine(line)
http.client.BadStatusLine: <html>


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\adapters.py", line 449, in send
    timeout=timeout
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connectionpool.py", line 638, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\util\retry.py", line 368, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\packages\six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connectionpool.py", line 594, in urlopen
    self._prepare_proxy(conn)
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connectionpool.py", line 805, in _prepare_proxy
    conn.connect()
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connection.py", line 308, in connect
    self._tunnel()
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\http\client.py", line 906, in _tunnel
    (version, code, message) = response._read_status()
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\http\client.py", line 278, in _read_status
    raise BadStatusLine(line)
urllib3.exceptions.ProtocolError: ('Connection aborted.', BadStatusLine('<html>\r\n'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 49, in <module>
    response = s.post(url, payload, proxies=proxy)
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\sessions.py", line 581, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\adapters.py", line 498, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', BadStatusLine('<html>\r\n'))

Why am I getting this error?

I am setting site_key = current_url_where_captcha_is_located

Is this correct?

Answer 1

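It looks like you are not supplying valid proxy connection parameters, yet you still pass this proxy to requests when connecting to the API.

Just comment out these two lines: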

#proxy = 'Myproxy'  # example proxy
#proxy = {'http': 'http://' + proxy, 'https': 'https://' + proxy}

Then remove proxies=proxy from these four lines:

captcha_id = s.post("http://2captcha.com/in.php?key={}&method=userrecaptcha&googlekey={}&pageurl={}".format(API_KEY, site_key, url)).text.split('|')[1]
recaptcha_answer = s.get("http://2captcha.com/res.php?key={}&action=get&id={}".format(API_KEY, captcha_id)).text
recaptcha_answer = s.get("http://2captcha.com/res.php?key={}&action=get&id={}".format(API_KEY, captcha_id)).text
response = s.post(url, payload)
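If you want to keep proxy support for later instead of deleting it, one option is to build the proxies mapping only when a real proxy address is configured. The following is a minimal sketch, not part of the 2captcha example: the helper name build_proxies and the 'host:port' input format are assumptions made for illustration, and it assumes a plain HTTP proxy.

```python
def build_proxies(proxy=None):
    """Return a proxies mapping for requests, or None when no proxy is set.

    `proxy` is assumed to be a 'host:port' string (hypothetical format).
    """
    if not proxy:
        # With proxies=None, requests connects without an explicit proxy.
        return None
    # Assumption: a plain HTTP proxy, which is addressed with the http://
    # scheme for both the 'http' and the 'https' key.
    address = 'http://' + proxy
    return {'http': address, 'https': address}
```

You could then write s.post(url, payload, proxies=build_proxies()) while testing without a proxy, and pass an address such as build_proxies('127.0.0.1:8080') once you have a working one.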

Answer 2

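Use a debugger, or put print(recaptcha_answer) just before the failing line, to see the value of recaptcha_answer before you call .split('|') on it. The string does not contain a |, so taking the second element of the resulting list with [1] fails.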
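A likely cause is that the reply is neither 'CAPCHA_NOT_READY' nor 'OK|...', so the while loop is skipped and the split produces a one-element list. Here is a minimal sketch of a safer parse, assuming replies are either 'OK|value' or a bare status string (both forms appear in the question). The helper name parse_2captcha_response is hypothetical.

```python
def parse_2captcha_response(text):
    """Return the value from an 'OK|value' reply.

    Raises RuntimeError with the raw reply for anything else, so the real
    message from the service is visible instead of an IndexError.
    """
    status, separator, value = text.strip().partition('|')
    if status != 'OK' or not separator:
        raise RuntimeError('2captcha returned: {!r}'.format(text))
    return value
```

Replacing recaptcha_answer.split('|')[1] with parse_2captcha_response(recaptcha_answer) would then show the actual reply in the traceback.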
