Question

I am using Python and Selenium to navigate a website.

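On one page, I am trying to work through a series of 5 drop-down boxes. The options in each drop-down are generated dynamically, based on what was selected in the previous one.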

I am stuck on the third drop-down, where the user has to select a state. Once it has loaded, the inspected HTML looks like this:

<select name="state" class="pulldown"  id="state" onchange="[javablob]">
<option value="">Select a State</option>
<option value='AK_N'>               AK</option>
<option value='AL_N'>               AL</option>
<option value='AR_Y'>               AR</option>

...and so on.

My code so far is:

waitforstate = WebDriverWait(driver,30).until(EC.presence_of_element_located((By.ID,"state")))
driver.implicitly_wait(10)  #added because the ID is found but the states aren't loaded yet
state = Select(driver.find_element_by_id('state'))

But selecting the state I want does not work:

state.select_by_visible_text("TN")

...gives

Message: Given xpath expression ".//option[normalize-space(.) = "TN"]" is invalid: 
WrongDocumentError: Node cannot be used in a document other than the 
one in which it was created

Doing this:

state.select_by_value("TN_Y")

...gives

Message: Given css selector expression "option[value ="TN_Y"]" is invalid: 
TypeError: can't access dead object

There is no index to select the state by.

When I try to display the loaded options:

all_options = state.options
for option in all_options:
    print("Value is: %s" % option.get_attribute("value"))

...nothing is printed, not even the default option. Yet it seems I can select and deselect the default option, using:

state.select_by_visible_text("Select a State")
print "Select a state selected"
state._unsetSelected
print "Now it's unselected"

...which runs without error.

I used Firefox's Selenium IDE to walk through the page and see how it handles this, and it was able to select the state using id=state, label=TN.

What am I missing?

Answer 1

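When scraping JavaScript-rendered pages, I have found PhantomJS or other WebKit libraries more useful. Being able to fully recreate the web browser interaction makes a scraper much easier to implement.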

Personally, I like to use Selenium together with PhantomJS for scraping.

PhantomJS: https://realpython.com/blog/python/headless-selenium-testing-with-python-and-phantomjs/

python-qt4: https://impythonist.wordpress.com/2015/01/06/ultimate-guide-for-scraping-javascript-rendered-web-pages/
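
Whichever browser ends up driving the page, the dependent drop-down probably still has to be populated before anything can be selected from it. Below is a minimal sketch of one way to wait for that, not a confirmed fix for the errors in the question. It assumes the "TN" option is added to the element with id "state" some time after the previous drop-down changes, and that looking the options up again on every poll avoids holding on to a node the page has since replaced. The names option_with_text and wait_for are hypothetical helpers, not part of Selenium; wait_for only imitates what WebDriverWait(driver, timeout).until(...) does so that the sketch has no dependencies.

```python
import time


def option_with_text(select_css, label):
    """Build a wait condition for a dynamically filled <select>.

    The returned callable yields the <option> whose trimmed text equals
    `label`, or False while that option has not been rendered yet.
    """
    def condition(driver):
        # Look the options up again on every poll instead of reusing an
        # element found earlier (assumption: the page script may rebuild them).
        # "css selector" is the string value of selenium's By.CSS_SELECTOR.
        options = driver.find_elements("css selector", select_css + " option")
        for option in options:
            if option.text.strip() == label:
                return option
        return False
    return condition


def wait_for(driver, condition, timeout=30.0, poll=0.5):
    """Dependency-free stand-in for WebDriverWait(driver, timeout).until(condition)."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition(driver)
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %s seconds" % timeout)
        time.sleep(poll)
```

With a real driver, the same condition should be usable as WebDriverWait(driver, 30).until(option_with_text("#state", "TN")). Once it returns, building a fresh Select(driver.find_element(By.ID, "state")) and calling select_by_visible_text("TN") on it keeps the lookup and the selection close together. Note that driver.implicitly_wait(10) does not pause the script; it only sets how long element lookups keep retrying, so an explicit wait of this kind is needed for the options themselves.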
