I want to use Selenium and Python to scrape this website: https://ntrl.ntis.gov/NTRL
However, when I try to change the year in the dropdown list, it does not work.
Here is the HTML:
<div id="advSearchForm:FromYear" class="ui-selectonemenu ui-widget ui-state-default ui-corner-all" style="min-width: 63px;">
<div class="ui-helper-hidden-accessible">
<input id="advSearchForm:FromYear_focus" name="advSearchForm:FromYear_focus" type="text" autocomplete="off" role="combobox" aria-haspopup="true" aria-expanded="false" readonly="readonly" aria-autocomplete="list" aria-owns="advSearchForm:FromYear_items" aria-activedescendant="advSearchForm:FromYear_0" aria-describedby="advSearchForm:FromYear_0" aria-disabled="false">
</div>
<div class="ui-helper-hidden-accessible">
<select id="advSearchForm:FromYear_input" name="advSearchForm:FromYear_input" tabindex="-1">
<option value="*" selected="selected"><1900</option>
<option value="1900">1900</option>
<option value="1901">1901</option>
<option value="1902">1902</option>
<option value="1903">1903</option>
</select>
</div>
<label id="advSearchForm:FromYear_label" class="ui-selectonemenu-label ui-inputfield ui-corner-all"><1900</label>
<div class="ui-selectonemenu-trigger ui-state-default ui-corner-right">
<span class="ui-icon ui-icon-triangle-1-s ui-c"/>
</div>
</div>
Here is my code:
select = Select(driver.find_element_by_xpath(".//div[@id='advSearchForm:FromYear']/div[2]/select"))
select.select_by_value("1902")
But it raises an exception:
Element is not currently visible and may not be manipulated
I also tried a JavaScript snippet:
driver.execute_script("document.getElementById('advSearchForm:FromYear_input').options[2].selected = 'true'")
but that does not work either. I tested that the same call works on other dropdown lists, so the trouble may lie with select.select_by_value(xxx). How should I deal with it?
Answer 0 (score: 0)
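I would suggest first using a click event to click the element (the Select element with ID "advSearchForm:FromYear_input"), then using an explicit wait until the element is visible; after that you should be able to change the year with the select_by_value method.
In addition, I would avoid XPath and use a CSS selector instead. A better approach still is to create a Page Object Model, which reduces the work needed to keep the tool running through future updates.
Sorry I can't be of more help; I am not familiar with Python.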
Edit
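It looks like the page uses the option items in the select only as a master list; the actual selection happens inside another element further down the page. That element is built dynamically with JavaScript, so my suggestion in the comments won't work.
I have hacked together a working application in C# to give you an idea of what you need to do: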
private static void Main(string[] args)
{
// ':' has a special meaning in CSS selectors so we need to escape it using \\
const string dropdownButtonSelector = "div#advSearchForm\\:datePublPanel div.ui-selectonemenu-trigger";
// {0} is a placeholder which is used to insert text during runtime
const string dynamicallyBuiltListItemSelectorTemplate = "ul#advSearchForm\\:FromYear_items li[data-label=\"{0}\"]";
// Rather than being a constant this value will be determined at runtime
const string valueToSelect = "1902";
// Setup driver and wait
ChromeDriver driver = new ChromeDriver();
WebDriverWait wait = new WebDriverWait(driver, TimeSpan.FromSeconds(5));
// Load page
driver.Navigate().GoToUrl("https://ntrl.ntis.gov/NTRL/");
// Wait until the first (index 0) dropdown list button inside the publication date div is deemed "clickable"
wait.Until(ExpectedConditions.ElementToBeClickable(driver.FindElementsByCssSelector(dropdownButtonSelector)[0]));
Console.WriteLine("Element is visible");
// Open the dropdown list
driver.FindElementsByCssSelector(dropdownButtonSelector)[0].Click();
Console.WriteLine("Dropdown should be open");
// Select the element from the dynamic Javascript built list
string desiredValueListItemSelector = string.Format(dynamicallyBuiltListItemSelectorTemplate, valueToSelect);
driver.FindElementByCssSelector(desiredValueListItemSelector).Click();
Console.WriteLine($"Selected value {valueToSelect} using selector: {desiredValueListItemSelector}");
Console.ReadLine();
driver.Close();
}
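The escaping mentioned in the comments above is a property of CSS selectors, not of C#, so it applies from Python as well. The following is a minimal sketch; the helper names escape_css_id, id_selector and id_attribute_selector are made up for illustration, and the escaping only handles ':' rather than every character CSS treats specially. It also shows an attribute selector as an alternative that needs no escaping.

```python
def escape_css_id(element_id: str) -> str:
    """Escape ':' so the id can follow '#' in a CSS selector (only ':' is handled)."""
    return element_id.replace(":", "\\:")


def id_selector(element_id: str) -> str:
    """Build '#<id>' with any ':' escaped."""
    return "#" + escape_css_id(element_id)


def id_attribute_selector(element_id: str) -> str:
    """Build an attribute selector; a quoted value needs no ':' escaping."""
    return '[id="{0}"]'.format(element_id)


# Both selectors target the same element on the page discussed here.
print(id_selector("advSearchForm:FromYear_items"))
print(id_attribute_selector("advSearchForm:FromYear_items"))
```

Either string can be passed to a CSS-selector lookup; the attribute form is a little longer but avoids backslashes inside string literals.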
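==========================================================================
Edit 2
Including a Python answer. I have never written Python before, but this seems to work. I strongly recommend looking at the links posted above on using the Page Object Model and explicit waits, and on avoiding XPath selectors.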
from selenium import webdriver
from time import sleep
# Set the year to select
fromYearToSelect = "1902"
# Create the driver and load the page
# Raw string so the backslashes in the Windows path are not treated as escapes
driver = webdriver.Chrome(r"C:\chromedriver_win32\chromedriver.exe")
driver.get("https://ntrl.ntis.gov/NTRL/")
# Find and click the "From" dropdown; elems[1] is the "To" dropdown
elems = driver.find_elements_by_css_selector("div#advSearchForm\\:datePublPanel div.ui-selectonemenu-trigger")
elems[0].click()
# Select the year
driver.find_element_by_css_selector("#advSearchForm\\:FromYear_items li[data-label='{0}']".format(fromYearToSelect)).click()
# Wait to see the results (we should be using an Explicit Wait here)
sleep(2)
# Close the driver
driver.close()
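As the comment above notes, the fixed sleep would be better replaced by an explicit wait. The following is a rough sketch of that idea, not a tested replacement: it assumes a Selenium 4 style API (By locators with WebDriverWait and expected_conditions), reuses the selectors from the code above, and the function names year_item_selector and select_from_year are invented for this example.

```python
def year_item_selector(year: str) -> str:
    """CSS selector for the dynamically built list item of the given year."""
    return "#advSearchForm\\:FromYear_items li[data-label='{0}']".format(year)


def select_from_year(driver, year: str, timeout: float = 10.0) -> None:
    """Open the "From" year dropdown and click the requested year."""
    # Imported here so the selector helper stays usable without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    wait = WebDriverWait(driver, timeout)

    # The first trigger inside the publication date panel is the "From" dropdown.
    trigger = (By.CSS_SELECTOR,
               "div#advSearchForm\\:datePublPanel div.ui-selectonemenu-trigger")
    wait.until(EC.element_to_be_clickable(trigger)).click()

    # Wait for the list item to become clickable instead of sleeping.
    item = (By.CSS_SELECTOR, year_item_selector(year))
    wait.until(EC.element_to_be_clickable(item)).click()


# Usage (assumption: a Chrome driver is available to Selenium):
# from selenium import webdriver
# driver = webdriver.Chrome()
# driver.get("https://ntrl.ntis.gov/NTRL/")
# select_from_year(driver, "1902")
# driver.quit()
```

Waiting on a condition rather than a fixed delay means the script proceeds as soon as the list is ready and fails with a timeout if it never appears.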