我觉得这对我来说可能只是一个愚蠢的错误,但同时我也不知道。我正在使用python硒将代理添加到新打开的Chrome窗口中。一切运行顺利,直到是时候使用新代理打开窗口了。窗口打开后,其显示为link = 'https://www.zillow.com/homedetails/5958-SW-4th-St-Miami-FL-33144/43835884_zpid/'
content = requests.get(link,headers={"User-Agent":"Mozilla/5.0"})
soup = BeautifulSoup(content.text,"lxml")
item = soup.select_one("script#hdpApolloPreloadedData").text
print(json.loads(item)['zpid'])
我确保一旦找到代理,它的括号和引号就会被删除,因此它是纯文本。
我有很多打印声明,所以我可以告诉您什么事情在后台程序中成功发生。我迷路了,我无法弄清楚出了什么问题以及为什么代理无法连接,当我手动进行连接时,它连接得很好,但是通过此操作我得到了一个错误。这是我的代码。
ERR_PROXY_CONNECTION_FAILED
它要提取的代理是import selenium.webdriver as webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.proxy import Proxy, ProxyType
from fake_useragent import UserAgent
import time
import random
from bs4 import BeautifulSoup
chrome_path = r"chromedriver.exe"
times = ""
start_page = ""
options = Options()
proxy = Proxy()
new_proxy = []
pro_sheet = open("proxies.txt", "r")
for line in pro_sheet:
new_proxy.append(line)
def start_pages(target_page):
j = 0
for x in range(0, len(page_number)):
j = j + 1
time.sleep(3)
print("Attempting to add fake userAgent...")
ua = UserAgent()
user_agent = ua.random
print("Successfully added fake userAgent...")
time.sleep(1)
print("Attempting to add chrome arguments...")
time.sleep(1)
options.add_argument(f'user-agent={user_agent}')
options.add_experimental_option("detach", True)
options.add_argument("window-size=600,600")
options.add_argument('--disable-extensions')
options.add_argument('--profile-directory=Default')
options.add_argument("--disable-plugins-discovery")
options.add_argument("--proxy-server=%s" % random.choice(*new_proxy))
options.add_argument("ignore-certificate-errors")
print("Successfully added chrome arguments to browser window...")
time.sleep(1)
# print("Attempting to to initiate headless mode")
# chrome_options.add_argument("--headless")
# print('Initiated headless mode')
print("Attempting to detach chrome...")
options.add_experimental_option("detach", True)
print("Successfully Detached chrome...")
time.sleep(1)
print("Attempting to add proxy...")
time.sleep(1)
try:
proxy.proxy_type = ProxyType.MANUAL
proxy.autodetect = False
proxy.http_proxy = proxy.ssl_proxy = proxy.socks_proxy = random.choice(*new_proxy)
options.proxy = proxy
except Exception as e:
print(e)
time.sleep(1)
print("Successfully added proxy...")
print("Browser number " + str(j) + ", is using proxy: " + str(*new_proxy))
time.sleep(1)
print("Attempting to add capabilities and options...")
capabilities = webdriver.DesiredCapabilities.CHROME
proxy.add_to_capabilities(capabilities)
driver = webdriver.Chrome(chrome_path, options=options)
print("Successfully added capabilities and options...")
time.sleep(1)
driver.set_window_position(0, 0)
driver.get(target_page)
content = driver.get(target_page)
soup = BeautifulSoup(content, 'html.parser')
text = soup.get_text()
if "ERR" in text:
print("FATAL ERROR. Proxy not available.")
driver.quit()
else:
print("Browser number " + str(j) + " has opened successfully...")
while times == "":
times = input("How many pages do you want?\n")
# url = input("Yeezy Supply or Adidas?""\nEither 'YS' or 'Adidas'\n")
# url_choice = url.lower()
page_number = list()
for i in range(0, int(times)):
page_number.append(times)
# if url_choice == 'ys':
# start_page = 'https://yeezysupply.com/'
# start_pages(start_page)
# elif url_choice == 'adidas':
# start_page = 'https://www.adidas.com/yeezy'
# start_pages(start_page)
start_page: str = 'https://www.adidas.com/yeezy'
start_pages(start_page)
我认为它没有任何问题,因为在我对其进行测试时,它可以很好地工作。
注意::我知道漂亮的汤会产生错误,但我仍然没有仔细阅读整个文档。