Question

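I'm trying to access this site (https://www.nakedcph.com/) with Python requests, but unfortunately it won't load cleanly. I know the site is protected by Cloudflare, yet when I visit it in a browser I have no problems. When I send a GET request to the home page, it detects that I'm using requests / a bot and prompts me to solve a captcha. What can I do to avoid this? I know this isn't the usual "you must have JavaScript enabled" message/error, because I have accessed this site before and I'm using cloudscrape (cfscrape). I don't think it's my headers, since I copied them straight from my browser. I do realize my home IP has been flagged (which triggers the captcha prompt), but I have also run the script on a server and/or through proxies and still get the same error, so I think the problem is in my request rather than in my IP or headers. Does anyone have suggestions? Thank you!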

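What I have already tried: using clean proxies that load the site perfectly in a browser (no captcha prompt), updating my headers, and solving the captcha.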

Edit: I also tried taking the cookies from my browser and using them in the script, but I'm still prompted to solve a captcha.

import random
import time

import cfscrape

headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9',
    'user-agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'
}
scraper = cfscrape.create_scraper()  # returns a CloudflareScraper instance

# Load the proxy list from a file, one proxy per line
with open("proxies.txt", "r") as file:
    proxies = [p.strip() for p in file if p.strip()]

# Pick one proxy at random and split it into its fields
line = random.choice(proxies).split(":")

if len(line) == 2:  # HOST:PORT, i.e. an IP-authenticated proxy
    # print("Proxy is detected as IP Auth")
    address = 'http://' + line[0] + ":" + line[1] + "/"
else:  # anything else is treated as HOST:PORT:USER:PASS
    # print("Proxy is detected as USER:PASS")
    address = 'http://' + line[2] + ":" + line[3] + "@" + line[0] + ":" + line[1] + "/"

# The same proxy URL is used for both HTTP and HTTPS targets
proxy = {'http': address, 'https': address}

test = scraper.get("https://www.nakedcph.com/", headers=headers, proxies=proxy)
time.sleep(5)
print(test.text)

When I print test.text, I should see their index HTML, which is longer than 57 lines. If the output is only 57 lines long, I am being prompted to solve a captcha.
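The line-count check described above could be sketched as a small helper. Note that len() on a string counts characters, so a check based on lines would go through splitlines(). The function name looks_like_captcha_page and the default threshold of 57 are assumptions taken from the description in this question, not anything Cloudflare documents, and the number may change whenever the captcha page changes.

```python
def looks_like_captcha_page(html, captcha_line_count=57):
    """Return True if the body is no longer than the captcha page.

    Hypothetical helper: the threshold of 57 lines comes from the
    observation in this question and is not a stable property of the site.
    """
    # splitlines() counts lines; len(html) alone would count characters.
    return len(html.splitlines()) <= captcha_line_count
```

In the script above it would be called as looks_like_captcha_page(test.text) right after the request.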

Why does the site load in a browser but not via Python requests? (JavaScript is not the problem)

0 answers:
