I have been struggling for a while to pass custom HTTP headers.
I am writing a Python script that opens a URL with custom headers like these:

    {'Referer': 'https://google.com', 'X-Forwarded-For': '47.29.76.109',
     'User-Agent': 'Mozilla/5.0 (Linux; Android 7.1.1; CPH1723 Build/N6F26Q; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/67.0.3396.87 Mobile Safari/537.36',
     'existing_proxy_port_to_use': '8090'}

I have been using BrowserMob-Proxy, but when I inspect the Network tab in Google Chrome's developer tools, the headers appear to have no effect.
Code:

    import pandas as pd
    from browsermobproxy import Server
    from selenium import webdriver

    def automation():
        headers = pd.read_excel('Database/header.xlsx')
        for i in range(0, headers.shape[0]):
            # Build one header set per spreadsheet row.
            dict = {}
            header = headers.loc[i]
            dict['Referer'] = header['Referrer']
            dict[header['Option']] = header['IP']
            dict['User-Agent'] = header['USERAGENT']
            dict['existing_proxy_port_to_use'] = "8090"
            print(dict)
            URL = 'xyz'
            data = pd.read_csv('Database/data.csv')
            # The header dict is passed as the proxy server's options here.
            server = Server(path="./browsermob-proxy/bin/browsermob-proxy", options=dict)
            server.start()
            proxy = server.create_proxy()
            chrome_options = webdriver.ChromeOptions()
            # Route Chrome through the proxy.
            chrome_options.add_argument("--proxy-server={0}".format(proxy.proxy))
            driver = webdriver.Chrome(chrome_options=chrome_options, executable_path='/home/.../chromedriver')
            proxy.new_har("google")
            for j in range(0, data.shape[0]):
                datum = data.loc[j]
                print(datum)
                driver.get(URL)
            driver.quit()
            server.stop()
        return None

    automation()

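I read these parameters from the header file and use Selenium to fill in a Google Form.
Please help me understand how to pass the headers correctly and how to check whether they are actually being applied.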
Answer 0 (score: 2)
I solved the header problem by dropping Browsermob-proxy in favor of seleniumwire and calling its driver._client.set_header_overrides(headers=dict_headers)
to override the default HTTP headers.

    import pandas as pd
    from seleniumwire import webdriver  # Selenium Wire's driver, not selenium's

    def automation():
        headers = pd.read_excel('Database/header.xlsx')
        data = pd.read_csv('Database/data.csv')
        for i in range(0, headers.shape[0]):
            # Build one header set per spreadsheet row.
            dict_headers = {}
            header = headers.loc[i]
            dict_headers['Referer'] = header['Referrer']
            dict_headers[header['Option']] = header['IP']
            dict_headers['User-Agent'] = header['USERAGENT']
            URL = 'xyz'
            user_agent = "user-agent=" + header['USERAGENT']
            chrome_options = webdriver.ChromeOptions()
            chrome_options.add_argument(user_agent)
            driver = webdriver.Chrome(
                chrome_options=chrome_options,
                executable_path='/home/.../chromedriver')
            # _client is a private attribute of the Selenium Wire driver;
            # it may not be available in every selenium-wire version.
            driver._client.set_header_overrides(headers=dict_headers)
            datum = data.loc[i]
            driver.get(URL)
            driver.quit()
        return None

    automation()
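
On the second part of the question (how to tell whether the headers are really being sent), one option is to point the client at a small local server that echoes back whatever headers it receives. The following is a minimal sketch using only the Python standard library; the names `EchoHeadersHandler`, `start_echo_server` and `fetch_echoed_headers` are made up for this illustration and are not part of Selenium, Selenium Wire or BrowserMob-Proxy. It assumes the check is made against plain HTTP on 127.0.0.1.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHeadersHandler(BaseHTTPRequestHandler):
    """Reply to any GET with the request headers it received, as JSON."""

    def do_GET(self):
        # Lower-case the names so lookups do not depend on client casing.
        seen = {name.lower(): value for name, value in self.headers.items()}
        body = json.dumps(seen).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def start_echo_server(port=0):
    """Start the echo server on a background thread; port=0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), EchoHeadersHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_echoed_headers(url, headers):
    """Send a GET with the given headers and return what the server saw."""
    request = urllib.request.Request(url, headers=headers)
    # An empty ProxyHandler bypasses any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    server = start_echo_server()
    url = "http://127.0.0.1:%d/" % server.server_address[1]
    custom = {"Referer": "https://google.com", "X-Forwarded-For": "47.29.76.109"}
    print(fetch_echoed_headers(url, custom))
    server.shutdown()
    server.server_close()
```

To check the browser instead of `urllib`, the same idea should apply: start the echo server, call `driver.get()` on its URL after setting the header overrides, and look at the JSON the page shows. This only tells you what the browser sent to the local server; it says nothing about how a remote site treats headers such as X-Forwarded-For.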