Extracting the data offered by an autocomplete search with Selenium

Date: 2016-09-30 13:24:37

Tags: python selenium selenium-chromedriver

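I want to extract some of the results offered by a website's search-bar autocomplete, but I'm having trouble getting at them. I can type in the query I want, but I can't store the autosuggestions. Whenever I click a dropdown suggestion to "Inspect Element" and find out what to select, the dropdown disappears!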

Here is the code I'm working with:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import os
from scrapy.selector import HtmlXPathSelector

# launch chromedriver
driver = webdriver.Chrome()
driver.get("http://www.marinetraffic.com/en/ais/index/ports/all")

searchBox = WebDriverWait(driver, 10).until(
    EC.visibility_of_element_located(
        (By.XPATH, '//input[@id= "portname"]')
    )
)
searchBox.click()
searchBox.clear()
a = searchBox.send_keys('Belawan') #so far so good

selen_html = driver.find_element_by_class_name('input-group').get_attribute('innerHTML')
hxs = HtmlXPathSelector(text=selen_html)
suggests = hxs.select('//div[@class= "input-group"/Belawan/@title').extract()
driver.close()

The error, unsurprisingly, is ValueError: XPath error: Invalid predicate in //div[@....[etc]. How do I find the correct names to use in the XPath?

The autocomplete entries have the format BELAWAN - Port [ID], and the end goal is to extract the ID.
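Once the suggestion strings are in hand, the bracketed part can be pulled out with a regular expression. This is a minimal sketch, assuming every entry follows the "NAME - Type [CODE]" pattern quoted above; the helper name parse_suggestion is hypothetical, and whatever sits between the brackets is returned as-is (the question does not say whether "ID" is a literal code or a placeholder).

```python
import re

# Assumed entry format (taken from the question): "NAME - Type [CODE]".
# The site's markup uses &nbsp; around the dash, so '\xa0' is normalised first.
SUGGESTION_RE = re.compile(
    r"^(?P<name>.+?)\s+-\s+(?P<kind>.+?)\s+\[(?P<code>[^\]]+)\]$"
)


def parse_suggestion(text):
    """Split one autocomplete entry into (name, kind, code), or return None."""
    match = SUGGESTION_RE.match(text.replace("\xa0", " ").strip())
    if match is None:
        return None  # entry does not follow the assumed pattern
    return match.group("name"), match.group("kind"), match.group("code")
```

For example, parse_suggestion("BELAWAN ANCH - Ancorage [ID]") splits off "BELAWAN ANCH" as the name because the name group is non-greedy and only stops at the first " - ".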

Edit: screenshot

1 answer:

Answer 0 (score: 2)

This should work. Basically, you need XPath locators for those web elements.

In your case, the markup looks something like this:

<ul class="ui-autocomplete ui-front ui-menu ui-widget ui-widget-content ui-corner-all" id="ui-id-3" tabindex="0" style="display: none; top: 375px; left: 63px; width: 306px;">
   <li class="ui-menu-item" role="presentation"><a id="ui-id-7" class="ui-corner-all" tabindex="-1"><b>BELA</b>WAN&nbsp;-&nbsp;Port [ID]</a></li>
   <li class="ui-menu-item" role="presentation"><a id="ui-id-8" class="ui-corner-all" tabindex="-1"><b>BELA</b>WAN ANCH&nbsp;-&nbsp;Ancorage [ID]</a></li>
</ul>
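Since the dropdown disappears when you try to inspect it, it can help to test a locator against a saved copy of the markup instead of the live menu. Below is a stdlib-only sketch (html.parser, not Selenium) that collects the text of every a element inside li role="presentation" in a trimmed copy of the snippet above, i.e. the same nodes the XPath //li[@role='presentation']/a targets. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

# Trimmed copy of the autocomplete markup shown above.
SAMPLE_HTML = (
    '<ul class="ui-autocomplete ui-front ui-menu" id="ui-id-3">'
    '<li class="ui-menu-item" role="presentation"><a id="ui-id-7" class="ui-corner-all" tabindex="-1">'
    '<b>BELA</b>WAN&nbsp;-&nbsp;Port [ID]</a></li>'
    '<li class="ui-menu-item" role="presentation"><a id="ui-id-8" class="ui-corner-all" tabindex="-1">'
    '<b>BELA</b>WAN ANCH&nbsp;-&nbsp;Ancorage [ID]</a></li>'
    '</ul>'
)


class AutocompleteParser(HTMLParser):
    """Collect the text of each <a> nested in <li role="presentation">."""

    def __init__(self):
        # convert_charrefs defaults to True, so &nbsp; arrives as '\xa0'
        super().__init__()
        self.suggestions = []
        self._in_item = False  # inside <li role="presentation">
        self._in_link = False  # inside that item's <a>
        self._buf = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("role") == "presentation":
            self._in_item = True
        elif tag == "a" and self._in_item:
            self._in_link = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            # join "BELA" + "WAN\xa0-\xa0Port [ID]" and turn nbsp into spaces
            text = "".join(self._buf).replace("\xa0", " ").strip()
            self.suggestions.append(text)
            self._in_link = False
        elif tag == "li":
            self._in_item = False

    def handle_data(self, data):
        if self._in_link:
            self._buf.append(data)


def extract_suggestions(html):
    """Return the visible text of every matching suggestion link."""
    parser = AutocompleteParser()
    parser.feed(html)
    parser.close()
    return parser.suggestions
```

The bold highlight tag splits each entry into separate text chunks, which is why the parser buffers data until the closing a tag rather than taking a single chunk.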

So I use the id to get the ul, and then call find_elements on it with an XPath to get the list of matching child elements.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# launch chromedriver
driver = webdriver.Chrome()
driver.get("http://www.marinetraffic.com/en/ais/index/ports/all")

searchBox = WebDriverWait(driver, 10).until(
    EC.visibility_of_element_located(
        (By.XPATH, '//input[@id= "portname"]')
    )
)
searchBox.click()
searchBox.clear()
searchBox.send_keys('Belawan')  # so far so good

# Wait until the autocomplete menu is actually visible before reading it.
# "ui-id-3" is an id generated by jQuery UI, so it may differ between page loads.
menu = WebDriverWait(driver, 10).until(
    EC.visibility_of_element_located((By.ID, "ui-id-3"))
)
# The leading "." keeps the search inside the menu; "//li" would search the whole page.
web_elem_list = menu.find_elements(By.XPATH, ".//li[@role='presentation']/a")
suggests = [web_elem.text for web_elem in web_elem_list]
driver.quit()
print(suggests)

# Output:
['BELAWAN - Port [ID]', 'BELAWAN ANCH - Ancorage [ID]']