Problem using Select to scrape dependent dropdown menus with Python Selenium

Date: 2017-01-18 15:26:17

Tags: python selenium select web-crawler

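I'm using Select in Selenium (Python) to work through the parent-child dropdown menus on this page (http://www.bobaedream.co.kr/cyber/CyberCar.php?gubun=I). The first row has five dropdown menus. My code works up to the third dropdown, and I can read all the items in the fourth one. The problem starts when I select an item in the fourth dropdown and try to load the matching items in the fifth.

When I try to print the selected item, I get every item in the fourth dropdown, including its title, '등급' (grade), and for the fifth dropdown only its title, '세부등급' (detailed grade).

Here is my sample output: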

 등급1.8d xDrive2.0d xDrive2.3d xDrive2.5d xDrive2.8i xDrive
====level1

                          세부등급
====level2
X5 (99~13년) 

Here is the output I want:

BMW - 5시리즈 - 뉴 5시리즈 (10년~현재) - 520d - F10 (10년~현재) / 럭셔리 F10 (14년~현재) / 럭셔리 플러스 F10 (15년 현재) / M 스포츠 F10 (13년) / M 에어로다이나믹 F10 (16년현재)

The part that comes from the fourth and fifth dropdown menus is:

520d - F10 (10년~현재) / 럭셔리 F10 (14년~현재) / 럭셔리 플러스 F10 (15년 현재) / M 스포츠 F10 (13년) / M 에어로다이나믹 F10 (16년현재) 
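The wanted line is the texts chosen in the first four dropdowns joined with " - ", followed by every item of the fifth dropdown joined with " / ". A minimal sketch of just that formatting step, assuming the dropdown texts have already been scraped into plain strings; format_trim_row is a hypothetical helper name, not part of Selenium:

```python
def format_trim_row(path, level1, level2_options):
    """Build one output line from already-scraped dropdown texts (hypothetical helper).

    path: texts chosen in the 1st-3rd dropdowns (maker, model, generation).
    level1: text chosen in the 4th dropdown ('등급').
    level2_options: every item listed in the 5th dropdown ('세부등급').
    """
    # Dropdown levels are separated by " - ", sub-grades by " / ".
    return " - ".join([*path, level1, " / ".join(level2_options)])


# Example:
# format_trim_row(["BMW", "5시리즈", "뉴 5시리즈 (10년~현재)"], "520d",
#                 ["F10 (10년~현재)", "럭셔리 F10 (14년~현재)"])
```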

Here is the code for this part:

#!/usr/bin/env python
#-*- coding: utf-8 -*-

import re

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import StaleElementReferenceException
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select

from bs4 import BeautifulSoup
from time import sleep

link = 'http://www.bobaedream.co.kr/cyber/CyberCar.php?gubun=I'
driver = webdriver.PhantomJS()
driver.set_window_size(1920, 1080)
driver.get(link)
sleep(.75)

# ..... (omitted)

# level1 is the 4th dropdown, '등급' (grade)
# level2 is the 5th dropdown, '세부등급' (detailed grade)
level1_selects = driver.find_elements_by_css_selector("#level_no")
for level1_select in level1_selects:

    if level1_select.text == '등급':
        continue
    else:
        print(level1_select.text)
        select_obj = Select(level1_select)
        select_obj.select_by_value(level1_select.get_attribute('value'))
        print('====level1')
        sleep(0.75)

        level2_selects = driver.find_elements_by_css_selector("#level2_no")

        for level2_select in level2_selects:
            print(level2_select.text)
            select_obj = Select(level2_select)
            select_obj.select_by_value(level2_select.get_attribute('value'))
            print('====level2')
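A likely cause, judging from the code: "#level_no" matches the select element itself, so the loop body runs once and level1_select.text is the text of every option joined together (hence "등급1.8d xDrive2.0d xDrive..."). In addition, select_by_value(level1_select.get_attribute('value')) re-selects the option that is already selected, so the page probably never refills the fifth dropdown. Below is a minimal sketch that iterates over the individual options instead. It assumes the ids level_no and level2_no from the question, that the first three dropdowns are already set, and that choosing a fourth-dropdown option makes the page refill #level2_no; the helper names real_options and scrape_levels are hypothetical, and the sketch has not been tested against the live site. It uses find_element(By.ID, ...), which works in Selenium 3 and 4 (newer Selenium 4 releases removed the find_element(s)_by_* helpers).

```python
PLACEHOLDERS = {"등급", "세부등급"}  # title options of the 4th and 5th dropdowns


def real_options(texts):
    """Strip whitespace and drop blank entries and the placeholder titles."""
    cleaned = (t.strip() for t in texts)
    return [t for t in cleaned if t and t not in PLACEHOLDERS]


def scrape_levels(driver, timeout=10):
    """Yield (level1_text, level2_texts) for each option of the 4th dropdown.

    Hypothetical helper. Assumes the 1st-3rd dropdowns are already set and
    that choosing a #level_no option makes the page refill #level2_no.
    """
    # Selenium is imported here so real_options() works without it installed.
    from selenium.common.exceptions import (StaleElementReferenceException,
                                            TimeoutException)
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import Select, WebDriverWait

    def option_texts(select_id):
        # Re-locate the <select> every time instead of reusing an old element.
        return [o.text for o in Select(driver.find_element(By.ID, select_id)).options]

    # Copy (value, text) pairs into plain strings up front so later DOM
    # updates cannot turn them into stale element references.
    level1 = Select(driver.find_element(By.ID, "level_no"))
    choices = [(o.get_attribute("value"), o.text.strip()) for o in level1.options]
    wait = WebDriverWait(driver, timeout,
                         ignored_exceptions=[StaleElementReferenceException])

    for value, text in choices:
        if not value or text in PLACEHOLDERS:
            continue  # skip the '등급' title option
        before = option_texts("level2_no")
        Select(driver.find_element(By.ID, "level_no")).select_by_value(value)
        try:
            # Wait until the 5th dropdown lists real items that differ from before.
            wait.until(lambda d: real_options(option_texts("level2_no"))
                       and option_texts("level2_no") != before)
        except TimeoutException:
            pass  # no sub-grades, or the same list as the previous option
        yield text, real_options(option_texts("level2_no"))


# Example use, after driver.get(link) and setting the first three dropdowns:
# for level1_text, level2_texts in scrape_levels(driver):
#     print(level1_text, "-", " / ".join(level2_texts))
```

Collecting the option values as plain strings before the loop, and re-locating both select elements on every iteration, avoids holding WebElement references across DOM updates, which is a common source of StaleElementReferenceException. Waiting for the fifth dropdown's contents to change, rather than sleeping a fixed 0.75 s, is a judgment call; if two consecutive fourth-dropdown options share an identical sub-grade list, the wait simply times out and the (correct) list is returned anyway.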

0 answers:

No answers