Question

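So basically I need to get every combination of department, course, and section from the UCLA bookstore website: http://shop.uclastore.com/courselistbuilder.aspx. Then I need to click "Choose Books" and parse the resulting HTML page.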

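Rather than doing this by hand (which would take forever), I am looking for ways to do it programmatically. One option I found is Selenium WebDriver, which handles browser automation. From the examples I have seen on SO, Selenium WebDriver looks promising, but I am not sure it can do what I need.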

In more or less pseudocode form, this is how I imagine using Selenium WebDriver:

go to the site: http://shop.uclastore.com/courselistbuilder.aspx

for each_department in department:
    click on each_department
    for each_course in course:
        click on each_course
        for each_section in section:
            click on each_section
// After every department, course, and section has been chosen, we click choose books
click on choose books link

// Save the resulting html file
save next page as html file

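I would like to know whether I can do what I want with Selenium WebDriver. It would be helpful if someone could provide better pseudocode that fits Selenium WebDriver more closely, but my main concern is whether this is feasible at all. I should also mention that I plan to use the Python API for Selenium.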

Answer 1

Here is something to get you started (explanation below):

import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.select import Select

url = "http://shop.uclastore.com/courselistbuilder.aspx"

driver = webdriver.Firefox()
driver.get(url)
time.sleep(1)

departments = Select(driver.find_element(By.ID, 'clDeptSelectBox'))
for department in departments.options:
    # select department
    departments.select_by_value(department.get_attribute('value'))
    time.sleep(1)  # crude pause while the course list loads

    courses = Select(driver.find_element(By.ID, 'clCourseSelectBox'))
    for course in courses.options:
        # select course
        courses.select_by_value(course.get_attribute('value'))
        time.sleep(1)  # crude pause while the section list loads

        sections = Select(driver.find_element(By.ID, 'clSectionSelectBox'))
        for section in sections.options:
            print({'department': department.text,
                   'course': course.text,
                   'section': section.text})

driver.close()

This prints:

{'department': u'AFRCST - AFRICAN STUDIES', 'course': u'201A', 'section': u'1 - LYDON'}
{'department': u'AFRKLA - AFRICAN LANGUAGES', 'course': u'1A', 'section': u'1 - TA'}
{'department': u'AFRKLA - AFRICAN LANGUAGES', 'course': u'150A', 'section': u'1 - FANTA, A.A.'}
{'department': u'AFROAM - AFRO-AMERICAN STUDIES', 'course': u'6', 'section': u'1 - STREETER, C.A.'}
{'department': u'AFROAM - AFRO-AMERICAN STUDIES', 'course': u'M10A', 'section': u'1 - LYDON, G.E.'}
{'department': u'AFROAM - AFRO-AMERICAN STUDIES', 'course': u'M102', 'section': u'1 - LEWIS, L.I.'}
{'department': u'AFROAM - AFRO-AMERICAN STUDIES', 'course': u'M103A', 'section': u'1 - PRICE, Z.F.'}
{'department': u'AFROAM - AFRO-AMERICAN STUDIES', 'course': u'M104A', 'section': u'1 - YARBOROUGH'}
...

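The idea is to make extensive use of the Select class, which provides a nice API for select/option functionality. First we get all the departments, then iterate over the options, selecting the next department on each pass through the loop. Then, after a small delay, we get the lists of courses and sections in the same way.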
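As a small illustration of the Select idea, the sketch below snapshots each option's value and text before any selection is made. This may help if the page rebuilds a dropdown after a selection, since option elements fetched earlier could then go stale. Note that `snapshot_options` is a hypothetical helper, not part of Selenium; it only assumes an object with an `options` list whose items offer `get_attribute('value')` and `text`, as Selenium's Select and WebElement do. Skipping empty values assumes the site uses an empty-valued placeholder entry, which I have not verified.

```python
def snapshot_options(select):
    """Return (value, text) pairs for each option of a Select-like object.

    Hypothetical helper: `select` only needs an `options` list whose items
    provide `get_attribute(name)` and a `text` attribute.
    """
    pairs = []
    for option in select.options:
        value = option.get_attribute("value")
        # Assumption: placeholder entries such as "Select..." carry an empty value.
        if value:
            pairs.append((value, option.text))
    return pairs
```

Each stored value could then be passed to `select_by_value` on a freshly located Select, so the loop never touches an element that was fetched before the page changed.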

I have left it to you to learn about Waits (time.sleep() is really not reliable) and to click the Choose Books button (well, German Petrov has provided both for you).
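For a feel of what an explicit wait does compared with a fixed sleep, here is a minimal polling sketch in plain Python. The name `wait_until` and its `timeout` and `poll` parameters are made up for illustration; in real Selenium code you would more likely reach for the built-in WebDriverWait, which works along the same lines.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() until it returns a truthy value or `timeout` seconds pass.

    Hypothetical helper for illustration; returns the truthy result,
    or raises TimeoutError if the deadline is reached first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)  # short pause between checks instead of one long sleep
```

The point of the design is that the script continues as soon as the dropdown is populated, and fails loudly if it never is, rather than guessing how long a sleep should be.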

Hope that helps.

Answer 2

You can click through all the options with code roughly like this:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

# assumes `driver` is an already-open WebDriver; the 10-second timeout is an arbitrary choice
wait = WebDriverWait(driver, 10)

select_dept = driver.find_element(By.ID, 'clDeptSelectBox')
for option_dept in select_dept.find_elements(By.TAG_NAME, 'option'):
    option_dept.click()
    # wait until course has options
    wait.until(lambda driver: driver.find_element(By.XPATH, "//select[@id='clCourseSelectBox']//option"))
    select_course = driver.find_element(By.ID, 'clCourseSelectBox')
    for option_course in select_course.find_elements(By.TAG_NAME, 'option'):
        option_course.click()
        # wait until section has options
        ....

Do the same for the section select, then wait for the link to appear and click it:

# needs: from selenium.webdriver.common.by import By
wait.until(lambda driver: driver.find_element(By.XPATH, "//a[contains(., 'Choose Books')]"))
driver.find_element(By.XPATH, "//a[contains(., 'Choose Books')]").click()
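The last step in the question, saving the resulting page, is not covered above. Here is a minimal sketch, assuming the next page has finished loading after the click. Selenium exposes the current HTML as `driver.page_source`; the hypothetical helper `save_html` below simply writes such a string to disk, and the file name is an arbitrary choice.

```python
import os


def save_html(html, path):
    """Write an HTML string to `path` as UTF-8 and return the path.

    Hypothetical helper; creates the parent directory if it is missing.
    """
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(html)
    return path
```

With a live driver this would be called as `save_html(driver.page_source, "books.html")`, after which the file can be parsed offline with whatever HTML parser you prefer.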

Is Selenium WebDriver able to click through all combinations from drop-down menus?
