Changing the user agent string for each get request

Time: 2018-07-11 03:15:17

Tags: python python-3.x selenium scrapy

I'm using the following code to change the user agent string, but I'd like to know whether it changes the user agent string for every browser.get request.

ua_strings = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_5) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/11.1.1 Safari/605.1.15',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36',
    ...
]

def parse(self, response):
    profile = webdriver.FirefoxProfile()
    profile.set_preference('general.useragent.override', random.choice(ua_strings))
    options = Options()
    options.add_argument('-headless')
    browser = webdriver.Firefox(profile, firefox_options=options)
    browser.get(self.start_urls[0])

    hrefs = WebDriverWait(browser, 60).until(
        EC.visibility_of_all_elements_located((By.XPATH, '//div[@class="discoverableCard"]/a'))
    )

    pages = []

    for href in hrefs:
        pages.append(href.get_attribute('href'))

    for page in pages:
        browser.get(page)

        """ scrape page """

    browser.close()

Or do I have to call browser.close() and then create a new browser instance in order to use a new user agent string for each request?

    for page in pages:
        browser = webdriver.Firefox(profile, firefox_options=options)
        browser.get(page)

        """ scrape page """

        browser.close()

1 Answer:

Answer 0 (score: 1)

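Because random.choice() is called only once, before the browser is created, the user agent string is the same for every browser.get() request. To get a freshly randomized user agent each time, you can create a set_preferences() function that builds a new browser and call it on every iteration of the loop.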

def set_preferences(self):
    # Pick a new user agent every time this method is called
    user_agent_string = random.choice(ua_strings)

    # Print the user agent chosen for this iteration
    print(user_agent_string)
    profile = webdriver.FirefoxProfile()
    profile.set_preference('general.useragent.override', user_agent_string)
    options = Options()
    options.add_argument('-headless')
    browser = webdriver.Firefox(profile, firefox_options=options)
    return browser

The loop could then look something like this:

for page in pages:
    # New browser instance, and therefore a new user agent, for each page
    browser = self.set_preferences()
    browser.get(page)

    """ scrape page """

    browser.close()
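
If you want to try the per-page loop without launching Firefox, here is a minimal sketch of the same idea with the browser creation passed in as a function. The names scrape_pages, make_browser, make_firefox and UA_STRINGS are made up for this illustration and are not part of Selenium or Scrapy. make_firefox mirrors the Selenium 3 style call used in the question, so it may need adjusting on newer Selenium versions. It also calls browser.quit() rather than browser.close(), so that the driver process is shut down even if scraping raises an exception.

```python
import random

# Two of the user agent strings from the question; extend as needed.
UA_STRINGS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0',
]


def scrape_pages(pages, make_browser, ua_strings=UA_STRINGS, rng=random):
    """Visit each page with a new browser and a newly chosen user agent.

    make_browser is a hypothetical factory: it takes a user agent string
    and returns an object with get(url) and quit() methods.
    Returns a list of (page, user_agent) pairs in visiting order.
    """
    used = []
    for page in pages:
        # The choice happens inside the loop, so it is repeated per page.
        user_agent = rng.choice(ua_strings)
        browser = make_browser(user_agent)
        try:
            browser.get(page)
            # scrape page here
        finally:
            # Always shut the browser down, even if scraping fails.
            browser.quit()
        used.append((page, user_agent))
    return used


def make_firefox(user_agent):
    """Hypothetical factory using the same Selenium 3 style call as the question."""
    # Imported here so scrape_pages can be tried without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    profile = webdriver.FirefoxProfile()
    profile.set_preference('general.useragent.override', user_agent)
    options = Options()
    options.add_argument('-headless')
    return webdriver.Firefox(profile, firefox_options=options)
```

Inside parse you would then call scrape_pages(pages, make_firefox). Keep in mind that starting a new Firefox instance for every page is noticeably slower than reusing one browser.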

Hope this helps!