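Looping through URLs with Selenium WebDriver

Time: 2017-12-30 06:48:21

Tags: python selenium-webdriver

The request below finds the contest IDs for the current day. I am trying to pass that str into the driver.get URL so that it visits each individual contest URL and downloads each contest's CSV. I imagine you have to write a loop, but I'm not sure what that would look like with the webdriver.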

import time
from selenium import webdriver
import requests
import datetime

req = requests.get('https://www.draftkings.com/lobby/getlivecontests?sport=NBA') 
data = req.json()

for ids in data:
    contest = ids['id']

driver = webdriver.Chrome()  # Optional argument, if not specified will search path.
driver.get('https://www.draftkings.com/account/sitelogin/false?returnurl=%2Flobby');
time.sleep(2) # Let DK Load!

search_box = driver.find_element_by_name('username')
search_box.send_keys('username')
search_box2 = driver.find_element_by_name('password')
search_box2.send_keys('password')
submit_button = driver.find_element_by_xpath('//*[@id="react-mobile-home"]/section/section[2]/div[3]/button/span')
submit_button.click()
time.sleep(2) # Let Page Load, If not it will go to Account!


driver.get('https://www.draftkings.com/contest/exportfullstandingscsv/' + str(contest) + '') 

5 answers:

Answer 0 (score: 2)

Try it in the following order:

import time
from selenium import webdriver
import requests
import datetime

req = requests.get('https://www.draftkings.com/lobby/getlivecontests?sport=NBA')
data = req.json()



driver = webdriver.Chrome()  # Optional argument, if not specified will search path.
driver.get('https://www.draftkings.com/account/sitelogin/false?returnurl=%2Flobby')
time.sleep(2) # Let DK Load!

search_box = driver.find_element_by_name('username')
search_box.send_keys('username')
search_box2 = driver.find_element_by_name('password')
search_box2.send_keys('password')
submit_button = driver.find_element_by_xpath('//*[@id="react-mobile-home"]/section/section[2]/div[3]/button/span')
submit_button.click()
time.sleep(2) # Let Page Load, If not it will go to Account!

for ids in data:
    contest = ids['id']
    driver.get('https://www.draftkings.com/contest/exportfullstandingscsv/' + str(contest) + '')
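
The order matters because, in the question, the for loop finishes before driver.get is ever called, so contest holds only the last id. A minimal sketch of the difference follows. The sample data list and the visited list standing in for driver.get are illustrative assumptions, not DraftKings data.

```python
# Made-up sample standing in for the JSON returned by the lobby endpoint.
data = [{'id': 101}, {'id': 102}, {'id': 103}]
BASE_URL = 'https://www.draftkings.com/contest/exportfullstandingscsv/'

# Question's version: the loop only reassigns `contest`, so a single
# request made after the loop sees just the last id.
for item in data:
    contest = item['id']
after_loop = [BASE_URL + str(contest)]

# Answer's version: the request happens inside the loop, once per id.
visited = []  # stand-in for driver.get(...)
for item in data:
    visited.append(BASE_URL + str(item['id']))

print(len(after_loop), len(visited))  # prints: 1 3
```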

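Answer 1 (score: 1)

You do not need to load Selenium repeatedly to download a number of files. requests and Selenium can share cookies. This means you can log in to the site with Selenium, retrieve the login details, and share them with requests or any other application. Please take a moment to look at httpie (https://httpie.org/doc#sessions); it appears to let you manage sessions manually, much as requests does.

For requests, see: http://docs.python-requests.org/en/master/user/advanced/?highlight=sessions For Selenium, see: http://selenium-python.readthedocs.io/navigating.html#cookies

Looking at the Webdriver block, you can add a proxy and load the browser either headless or live: just comment out the headless line and the browser should load live, which makes debugging simple and makes it easy to follow navigation and changes to the site's API / HTML.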
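
As a rough sketch of the cookie hand-off: Selenium's driver.get_cookies() returns a list of dicts that include name and value keys, and a requests session's cookie jar can be updated from a plain name-to-value dict. The helper name cookies_to_dict and the sample cookies below are hypothetical.

```python
def cookies_to_dict(selenium_cookies):
    """Flatten Selenium-style cookie dicts into a {name: value} mapping."""
    return {cookie['name']: cookie['value'] for cookie in selenium_cookies}


# Hypothetical sample shaped like the output of driver.get_cookies().
sample = [
    {'name': 'sessionid', 'value': 'abc123', 'domain': '.example.com'},
    {'name': 'csrftoken', 'value': 'xyz789', 'domain': '.example.com'},
]
jar = cookies_to_dict(sample)
print(jar)
```

With requests installed, requests.Session().cookies.update(jar) would then attach these cookies to every request made through that session.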

import shutil
import time

import requests
from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.common.by import By

LOGIN = 'https://www.draftkings.com/account/sitelogin/false?returnurl=%2Flobby'
BASE_URL = 'https://www.draftkings.com/contest/exportfullstandingscsv/'
USER = ''
PASS = ''

try:
    data = requests.get('https://www.draftkings.com/lobby/getlivecontests?sport=NBA').json()
except (requests.RequestException, ValueError) as e:
    # Network failure, or a response body that is not valid JSON.
    raise SystemExit(e)

ids = [str(item['id']) for item in data]

# Webdriver block
options = webdriver.ChromeOptions()
options.add_argument('--headless')  # comment this line out to watch the browser live
options.add_argument('--window-size=800,600')
# options.add_argument('--proxy-server=IP:PORT')
# options.add_argument('--user-agent=' + USER_AGENT)
driver = webdriver.Chrome(options=options)
driver.implicitly_wait(2)  # wait up to 2 s when locating elements

try:
    driver.get(LOGIN)
except WebDriverException as e:
    driver.quit()
    raise SystemExit(e)


def login(user, password):
    '''
    Login to draftkings.
    Retrieve authentication/authorization.

    http://selenium-python.readthedocs.io/waits.html#implicit-waits
    http://selenium-python.readthedocs.io/api.html#module-selenium.common.exceptions
    '''
    driver.find_element(By.NAME, 'username').send_keys(user)
    driver.find_element(By.NAME, 'password').send_keys(password)
    driver.find_element(By.XPATH, '//*[@id="react-mobile-home"]/section/section[2]/div[3]/button/span').click()

    time.sleep(2)  # let the login finish before reading the cookies
    return driver.get_cookies()


site_cookies = login(USER, PASS)
driver.quit()  # the browser is no longer needed once we have the cookies

# One requests session that carries the Selenium login cookies.
session = requests.Session()
for cookie in site_cookies:
    session.cookies.set(cookie['name'], cookie['value'])


def get_csv_file(contest_id):
    '''
    Download the standings CSV for one contest id.
    '''
    try:
        response = session.get(BASE_URL + contest_id, stream=True)
        response.raise_for_status()
        response.raw.decode_content = True  # undo gzip/deflate content encoding
        with open(contest_id + '.csv', 'wb') as f:
            shutil.copyfileobj(response.raw, f)
    except requests.RequestException as e:
        print(contest_id, e)


# map() is lazy in Python 3 and would never run the downloads, so loop explicitly.
for contest_id in ids:
    get_csv_file(contest_id)

Answer 2 (score: 0)

Does this help?

ABC01
ABC02
ABC03 - ABC04
ABC05
ABC06

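Answer 3 (score: 0)

It may be time to break this down a bit. Create a few isolated functions that:
 0. (Optional) Authorize against the target URL.
 1. Collect all the required ids (the first part of your code).
 2. Export the CSV for a specific id (the second part of your code).
 3. Loop over the list of ids and call function #2 for each one.

The chromedriver is shared as an input parameter so that the driver state and auth cookies are preserved. It works fine and keeps the code clear and readable.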
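
The decomposition described above might look like the following sketch. The function names and the FakeDriver stand-in are hypothetical; with Selenium, a logged-in webdriver.Chrome() instance would be passed in the same way so that its state and auth cookies are reused.

```python
BASE_URL = 'https://www.draftkings.com/contest/exportfullstandingscsv/'


def collect_ids(data):
    """Step 1: pull the contest ids out of the decoded JSON list."""
    return [str(item['id']) for item in data]


def export_csv(driver, contest_id):
    """Step 2: point the shared driver at the export URL for one id."""
    driver.get(BASE_URL + contest_id)


def export_all(driver, ids):
    """Step 3: call step 2 once per id, reusing the same driver."""
    for contest_id in ids:
        export_csv(driver, contest_id)


class FakeDriver:
    """Stand-in that records URLs instead of driving a browser."""

    def __init__(self):
        self.visited = []

    def get(self, url):
        self.visited.append(url)


driver = FakeDriver()
export_all(driver, collect_ids([{'id': 1}, {'id': 2}]))
print(driver.visited)
```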

Answer 4 (score: 0)

I think you can set the contest URL on an a (anchor) element in the landing page and then click it. Then repeat the step with the other IDs.

See the code below.
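
The original code for this answer is not included here. One way the idea could be sketched, as an assumption rather than the answerer's code, is to inject an anchor with JavaScript through Selenium's execute_script and click it. The helper names click_injected_link and click_all and the RecordingDriver stand-in are hypothetical; a real webdriver.Chrome() instance exposes the same execute_script(script, *args) method.

```python
# JavaScript run in the page: build an <a>, point it at the URL, click it.
INJECT_AND_CLICK = (
    "var a = document.createElement('a');"
    "a.href = arguments[0];"
    "document.body.appendChild(a);"
    "a.click();"
)


def click_injected_link(driver, url):
    """Add an anchor for `url` to the current page and click it."""
    driver.execute_script(INJECT_AND_CLICK, url)


def click_all(driver, base_url, ids):
    """Repeat the step for every contest id."""
    for contest_id in ids:
        click_injected_link(driver, base_url + str(contest_id))


class RecordingDriver:
    """Stand-in that records calls instead of driving a browser."""

    def __init__(self):
        self.calls = []

    def execute_script(self, script, *args):
        self.calls.append((script, args))


driver = RecordingDriver()
click_all(driver, 'https://www.draftkings.com/contest/exportfullstandingscsv/', [1, 2])
print([args[0] for _, args in driver.calls])
```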