Question

Please help me solve this problem!

Currently, I am scraping a website with the Selenium Firefox driver in C#. However, the data on this site is filled in dynamically for the tables that cover future dates.

Although the table structure is exactly the same for future and past dates, the table that is being updated during my Selenium calls throws a "NoSuchElementException" for IWebElements that clearly exist.

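These are the relevant XPaths, copied from the table: one for a past date, which works perfectly fine, and one for a future date, which throws the exception. As you can see, they are exactly the same.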

XPath 18052015

/html/body/div[1]/div/div[2]/div[5]/div[1]/div/div[1]/div[2]/div[1]/div[7]/div[1]/table/tbody/tr[1]/td[1]/div/a[2]

XPath 05022016

/html/body/div[1]/div/div[2]/div[5]/div[1]/div/div[1]/div[2]/div[1]/div[7]/div[1]/table/tbody/tr[1]/td[1]/div/a[2]

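Using the FindElements(By.XPath(...)) function, I iterate with two foreach loops over the highlighted tr and td in the XPath to get some text from the a[2] title. The DOM shown in Firefox's Firebug appears to be identical in both cases. The only difference I have observed between the two tables is that the one for the future date updates its values every few seconds (which also resets the table while I am viewing it through Firebug). Here is the relevant piece of code, with an important comment.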

            foreach (var tr in table.FindElements(By.XPath("div/table/tbody/tr")))
            {
                foreach (var td in tr.FindElements(By.XPath("td")))
                {
                    if (td.GetAttribute("innerHTML").Contains("some stuff"))
                    {
                        // This part is always reached, so the condition is satisfied. x is the relevant value: it is assigned the proper value when the error is thrown, but the exception is still thrown.
                        x = td.FindElement(By.XPath("div/a[2]")).GetAttribute("href").Split('/')[4];
                        bmID = getBookmakerID(bmName);
                    }
                    if (td.GetAttribute("class").Contains("some other stuff"))
                    {

                    }
                }
            }

Have any of you had similar problems before, and were you able to solve them?

Answer 1

Could you add a wait to every step where you call FindElement? See the example below:

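using System;
using System.Collections.ObjectModel;
using OpenQA.Selenium;
using OpenQA.Selenium.Support.UI;

IWait<IWebElement> wait = new DefaultWait<IWebElement>(table);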
wait.Timeout = TimeSpan.FromSeconds(5);
wait.PollingInterval = TimeSpan.FromMilliseconds(300);
By locator = By.XPath("div/table/tbody/tr");
ReadOnlyCollection<IWebElement> rows;

wait.Until(e => e.FindElements(locator).Count > 0);
rows = table.FindElements(locator);


foreach (var tr in rows)
{

    wait = new DefaultWait<IWebElement>(tr);
    wait.Timeout = TimeSpan.FromSeconds(5);
    wait.PollingInterval = TimeSpan.FromMilliseconds(300);
    locator = By.XPath("td");
    ReadOnlyCollection<IWebElement> cells;

    wait.Until(e => e.FindElements(locator).Count > 0);
    cells = tr.FindElements(locator);

    foreach (var td in cells)
    {
        if (td.GetAttribute("innerHTML").Contains("some stuff"))
        {
            // This part is always reached, so condition is satisfied. > x is the relevant value, it is assigned the proper value when the error is thrown, but it still throws an exception.
            wait = new DefaultWait<IWebElement>(td);
            wait.Timeout = TimeSpan.FromSeconds(5);
            wait.PollingInterval = TimeSpan.FromMilliseconds(300);
            locator = By.XPath("div/a[2]");
            IWebElement link2;

            wait.Until(e => e.FindElements(locator).Count > 0);
            try
            {
                link2 = td.FindElement(locator);
            }
            catch (NoSuchElementException ex)
            {
                // Pass the original exception as the inner exception so its details are not lost.
                throw new NoSuchElementException("Unable to find element, locator: \"" + locator.ToString() + "\".", ex);
            }
            }
            x = link2.GetAttribute("href").Split('/')[4];
            bmID = getBookmakerID(bmName);
        }
        if (td.GetAttribute("class").Contains("some other stuff"))
        {

        }
    }
}

If the error persists, you can easily debug the test in Visual Studio.

Answer 2

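Thank you very much for your help. @Buaban - I have added the waits, but I'm afraid that didn't change much. It did get the algorithm further, but in the end it still crashed.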

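In the end, we solved the problem by combining Selenium WebDriver and HtmlAgilityPack. The code is too specific to post (and I don't have it at hand at the moment), so I'll share the main philosophy with you instead. It's quite short: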

Use Selenium WebDriver to open and navigate the browser, i.e. to perform actions such as:

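Going to the correct URL
Opening drop-down menus/tables
Logging in/clicking around the site
Defining the tables/fields to scrape data from as web elements (WEs)
See the tutorials here: http://toolsqa.com/selenium-webdriver-tutorials-in-c-selenium-tutorial-in-c/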
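The navigation steps above could look roughly like the following C# sketch, which stops at the point where the table's current HTML is copied out of the browser as a string. This is a minimal sketch under stated assumptions, not the poster's code: the class TableSnapshot, the method Capture, and the url/opener/table arguments are hypothetical placeholders, and logging in is left out because the site is not known from the post.

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Support.UI;

// Hypothetical helper illustrating the "Selenium for navigation" half of the approach.
public static class TableSnapshot
{
    // Navigates to `url`, optionally clicks `opener` (e.g. a drop-down that reveals
    // the table), and returns the table's innerHTML as a plain string.
    // `url`, `opener` and `table` are placeholders for the real site's values.
    public static string Capture(IWebDriver driver, string url, By opener, By table)
    {
        driver.Navigate().GoToUrl(url);

        // WebDriverWait ignores NotFoundException by default, so Until keeps polling
        // until FindElement succeeds or the 10-second timeout elapses.
        var wait = new WebDriverWait(driver, TimeSpan.FromSeconds(10));

        if (opener != null)
        {
            wait.Until(d => d.FindElement(opener)).Click();
        }

        IWebElement tableElement = wait.Until(d => d.FindElement(table));

        // A single call copies the table's current HTML; later parsing works on this
        // string and no longer touches the live, self-refreshing DOM.
        return tableElement.GetAttribute("innerHTML");
    }
}
```

It would be called with a driver such as `new FirefoxDriver()` (from `OpenQA.Selenium.Firefox`), for example `TableSnapshot.Capture(driver, url, By.Id("date-menu"), By.XPath("//div[7]/div[1]"))`, where both locators are invented for illustration.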

Use HtmlAgilityPack to navigate and scrape the defined web elements (WEs)

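Load the InnerHTML attribute of the WE into HtmlAgilityPack for processing
You can navigate through the different divs/trs/tds
It is a lot (and I really mean a lot) faster than using Selenium WebDriver, because it parses the HTML as a string
See a great tutorial here: http://www.mikesdotnetting.com/article/273/using-the-htmlagilitypack-to-parse-html-in-asp-net, and in particular look at the "Where<>" function!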
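The HtmlAgilityPack half could be sketched as follows, assuming the table's HTML has already been read from Selenium with `GetAttribute("innerHTML")`. The class TableParser, the method ExtractIds, and the `marker` parameter (standing in for the question's "some stuff") are hypothetical. The `div/a[2]` path and the `Split('/')[4]` come from the question's code. The filtering uses LINQ's `Where`, which is presumably the "Where<>" the tutorial points to.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical parser for an HTML snapshot of the table.
public static class TableParser
{
    // For every td whose inner HTML contains `marker`, take the href of the
    // second link in its div and return path segment 4, as in the question.
    public static List<string> ExtractIds(string tableHtml, string marker)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(tableHtml);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection cells = doc.DocumentNode.SelectNodes("//tr/td");
        if (cells == null)
        {
            return new List<string>();
        }

        return cells
            .Where(td => td.InnerHtml.Contains(marker))
            .Select(td => td.SelectSingleNode("div/a[2]"))   // relative to the td
            .Where(a => a != null)
            .Select(a => a.GetAttributeValue("href", ""))
            .Select(href => href.Split('/'))
            .Where(parts => parts.Length > 4)                  // avoid IndexOutOfRange
            .Select(parts => parts[4])
            .ToList();
    }
}
```

The parser only ever sees a string copy of the table, so the page refreshing its values cannot invalidate the nodes halfway through the loops. That is one plausible reason this combination stayed stable where the element-by-element Selenium loop did not.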

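All in all, this approach to handling self-refreshing pages has proven very stable (so far it has not failed once), very fast (thanks to parsing the HTML as a string) and flexible (because it uses separate packages for navigating the browser and for scraping the data).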

Happy coding!

Selenium.NoSuchElementException with a dynamic table
