Question

I'm using Selenium RC, and I would like, for example, to get all the link elements whose href attribute matches:

http://[^/]*\d+com

I would like to use:

sel.get_attribute( '//a[regx:match(@href, "http://[^/]*\d+.com")]/@name' )

which would return a list of the name attributes of all links that match the regex (or something similar).

Thanks

Answer 1

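The answer above is probably the right way to find all links that match a regex, but I thought it would also help to answer the other part of the question: how to use a regex in an XPath locator. You need to use the regex matches() function, like this: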

xpath=//div[matches(@id,'che.*boxes')]

(Used with a click command, this would of course click the div with 'id=checkboxes' or 'id=cheANYTHINGHEREboxes'.)

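Be aware that matches() is an XPath 2.0 function, so not all native browser implementations of XPath support it (most conspicuously, using it in FF3 throws an error: invalid xpath [2]).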

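If you run into trouble with your particular browser (as I did with FF3), try using Selenium's allowNativeXpath("false") to switch over to the JavaScript XPath interpreter. It will be slower, but it does seem to work with more XPath functions, including 'matches' and 'ends-with'. :)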
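One detail worth knowing about matches(): per the XPath 2.0 functions spec, a pattern without ^ or $ anchors succeeds if any substring of the value matches, so 'che.*boxes' also selects ids such as 'my-checkboxes-1'. The sketch below is an offline illustration only; it emulates that behaviour with Python's re.search (which is likewise unanchored) over a hypothetical list of id values. XPath and Python regex dialects differ in details, so treat this as an approximation for simple patterns.

```python
import re

def xpath_matches(value: str, pattern: str) -> bool:
    """Approximate XPath 2.0 fn:matches(value, pattern) for simple patterns.

    Like fn:matches, re.search succeeds if ANY substring matches,
    unless the pattern is anchored with ^ and/or $.
    """
    return re.search(pattern, value) is not None

# Hypothetical id values standing in for the divs on a page
ids = ["checkboxes", "cheANYTHINGHEREboxes", "radiobuttons", "my-checkboxes-1"]
selected = [i for i in ids if xpath_matches(i, "che.*boxes")]
print(selected)  # ['checkboxes', 'cheANYTHINGHEREboxes', 'my-checkboxes-1']
```

If you only want ids that consist entirely of the pattern, anchor it: matches(@id, '^che.*boxes$').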

Answer 2

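You can use the Selenium command getAllLinks to get an array of the IDs of the links on the page, which you can then loop through, checking each href with getAttribute (which takes a locator followed by @ and the attribute name). For example, in Java this might be: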

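import java.util.ArrayList;
import java.util.List;

//"selenium" is your Selenium RC session (e.g. a DefaultSelenium instance)
String[] allLinks = selenium.getAllLinks();
List<String> matchingLinks = new ArrayList<String>();

for (String linkId : allLinks) {
    //getAllLinks() returns "" for links that have no ID, so skip those
    if (linkId.isEmpty()) continue;
    String linkHref = selenium.getAttribute("id=" + linkId + "@href");
    //String.matches() must match the whole href; the dot before "com" is escaped
    if (linkHref.matches("http://[^/]*\\d+\\.com")) {
        matchingLinks.add(linkId);
    }
}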
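Since the question uses the Python client, here is a rough Python equivalent of the Java loop above. It assumes the Selenium RC Python client (selenium.selenium), whose get_all_links() and get_attribute("locator@attr") correspond to getAllLinks and getAttribute. The FakeSelenium class and its hrefs are hypothetical, included only so the logic can be exercised without a browser.

```python
import re

# The question's pattern, with the dot before "com" escaped
LINK_PATTERN = re.compile(r"http://[^/]*\d+\.com")

def matching_link_ids(sel, pattern=LINK_PATTERN):
    """Return the ids of links whose href matches `pattern`.

    `sel` is assumed to be a Selenium RC Python client session.
    """
    matches = []
    for link_id in sel.get_all_links():
        if not link_id:  # links without an id come back as ""
            continue
        href = sel.get_attribute("id=%s@href" % link_id)
        # fullmatch mirrors Java's String.matches(): the whole href must match
        if pattern.fullmatch(href):
            matches.append(link_id)
    return matches

class FakeSelenium:
    """Hypothetical stand-in for a Selenium RC session, for offline testing."""
    def __init__(self, hrefs):
        self.hrefs = hrefs  # maps link id -> href
    def get_all_links(self):
        return list(self.hrefs) + [""]  # plus one link without an id
    def get_attribute(self, locator):
        link_id = locator[len("id="):locator.rindex("@")]
        return self.hrefs[link_id]

fake = FakeSelenium({"a1": "http://shop42.com", "a2": "http://example.org", "a3": "http://www7.com"})
print(matching_link_ids(fake))  # ['a1', 'a3']
```

Swap fullmatch for search if hrefs may carry a path after ".com". Note that this costs one round trip to the browser per link, which can be slow on large pages.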

Answer 3

One possible solution is to use sel.get_eval() and write a JS script that returns a list of links, similar to the following answer: "selenium: Is it possible to use the regexp in selenium locators".
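A minimal sketch of that idea, assuming the Selenium RC Python client's get_eval(), which evaluates a JavaScript snippet with `this` bound to the Selenium object and returns the last expression's value as a string. The script uses this.browserbot.getCurrentWindow() to reach the application window, tests each link's href against the question's regex, and joins the matching name attributes with newlines. The function name and the FakeSelenium stub are hypothetical; the stub only exists to exercise the Python side offline.

```python
# JavaScript evaluated in the browser by Selenium RC's getEval command.
# `this` is the Selenium object; browserbot.getCurrentWindow() is the app window.
FIND_LINK_NAMES_JS = r"""
(function (bot) {
    var re = /http:\/\/[^\/]*\d+\.com/;
    var links = bot.getCurrentWindow().document.getElementsByTagName('a');
    var names = [];
    for (var i = 0; i < links.length; i++) {
        if (re.test(links[i].getAttribute('href') || '')) {
            names.push(links[i].getAttribute('name') || '');
        }
    }
    return names.join('\n');
})(this.browserbot);
"""

def link_names_via_eval(sel):
    """Run the script through get_eval and split the newline-joined result."""
    result = sel.get_eval(FIND_LINK_NAMES_JS)
    return result.split("\n") if result else []

class FakeSelenium:
    """Hypothetical stub returning a canned get_eval result, for offline testing."""
    def __init__(self, result):
        self.result = result
        self.scripts = []
    def get_eval(self, script):
        self.scripts.append(script)
        return self.result

print(link_names_via_eval(FakeSelenium("shop\nwww")))  # ['shop', 'www']
```

Doing the filtering in the browser means one round trip for the whole page instead of one per link.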

Answer 4

Here are some alternative approaches for Selenium RC. They aren't pure Selenium solutions; they combine your programming language's data structures with Selenium.

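You can also get the HTML page source, then run a regex over the source to return the set of matching links. Use regex groups to separate out the URL, link text/ID, etc., which you can then pass back to Selenium to click on or navigate to.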

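Another approach is to get the HTML page source, or the innerHTML of a parent/root element (via a DOM locator), and convert that HTML to XML as a DOM object in your programming language. You can then traverse the DOM with whatever XPath you need (including regex), and get a node set containing only the links you're interested in. From those nodes, parse out the link text/ID or URL, which you can pass back to Selenium to click on or navigate to.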

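As requested, I'm providing examples below. They use a mix of languages, since the post doesn't seem to be language-specific; I just hacked the examples together with what I had available. They aren't fully tested (or tested at all), but I've used some of this code before in other projects, so treat them as proof-of-concept examples of how you might implement the solutions I just described.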

//Example of element attribute processing by page source and regex (in PHP)
//Selenium RC's command for the page source is getHtmlSource
$pgSrc = $sel->getHtmlSource();
//simple hyperlink extraction via regex; [^>]* and [^"]* keep each match inside one tag/attribute
//replace with a better regex pattern as desired
preg_match_all('/<a\s[^>]*href="([^"]*)"/i', $pgSrc, $matches, PREG_PATTERN_ORDER);
//$matches is a 2D array: $matches[0] is the array of whole matched strings, $matches[1] is the array of what's in the parentheses
//you get either an array of all matched link URLs from the capture group or an empty array
$links = isset($matches[1]) ? $matches[1] : array();
//now do as you wish, e.g. keep only the URLs matching the question's pattern
$wanted = preg_grep('#http://[^/]*\d+\.com#', $links);
//NOTE: these are URLs only, not actual hyperlink elements
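The PHP regex above captures only the href. As a sketch of using several regex groups (URL plus link text), here is a Python version run over a small hypothetical page_source string standing in for what the Python client's get_html_source() would return. Regex over HTML is fragile (attribute order, single quotes, nested markup), so treat it as a quick hack rather than a parser.

```python
import re

# Hypothetical snippet standing in for sel.get_html_source()
page_source = '''
<p><a href="http://shop42.com" name="shop">Shop</a>
<a class="x" href="http://example.org" name="ex">Example</a></p>
<a href="http://www7.com/about">About us</a>
'''

# One named group per piece of information we want back from each link
ANCHOR_RE = re.compile(
    r'<a\s[^>]*?href="(?P<href>[^"]*)"[^>]*>(?P<text>.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)
# The question's pattern, dot escaped; search() allows a path after ".com"
WANTED_HREF = re.compile(r"http://[^/]*\d+\.com")

links = [
    (m.group("href"), m.group("text"))
    for m in ANCHOR_RE.finditer(page_source)
    if WANTED_HREF.search(m.group("href"))
]
print(links)  # [('http://shop42.com', 'Shop'), ('http://www7.com/about', 'About us')]
```

The link text can then be handed back to Selenium, for example as a "link=About us" locator for a click.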

//Example of XML DOM parsing with Selenium RC (in Java)
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

String locator = "id=someElement";
//getEval runs JavaScript in the browser; browserbot.findElement resolves a Selenium locator to a DOM element
String htmlSrcSubset = sel.getEval("this.browserbot.findElement(\"" + locator + "\").innerHTML");
//using the jsoup HTML parser library for Java, see jsoup.org
Document doc = Jsoup.parse(htmlSrcSubset);
/* Once you have this Document object, you can manipulate and traverse
it as an HTML/XML node tree. I'm not going to go into detail on this,
as you'd need to know XML DOM traversal and XPath (not just for finding locators).
But this tutorial URL will give you some ideas:

http://jsoup.org/cookbook/extracting-data/dom-navigation

The example there first gets the element/node with the id "content"
within the document, then gets all hyperlink elements/nodes under it and
traverses them as a list/array, doing whatever you want with each element
in an object-oriented way. Each element is a node with properties. If you
study it, you'll find this approach gives you much of the power/access that
WebDriver/Selenium 2 now gives you with WebElements; the example here is how
you can get a similar WebElement-like capability in Selenium RC.
*/
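For comparison, the DOM-parsing idea can be sketched in Python with only the standard library. Python's stdlib has no XPath engine with regex support (lxml offers one through the EXSLT re:test function), so this sketch parses the fragment with html.parser, collects each <a> element's attributes, and applies the regex in Python. The fragment is a hypothetical stand-in for the innerHTML that the getEval call above would return; the class and function names are illustrative.

```python
import re
from html.parser import HTMLParser

class AnchorCollector(HTMLParser):
    """Collect the attributes of every <a> element in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.anchors = []  # one dict of attributes per <a> tag

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us
        if tag == "a":
            # attrs is a list of (name, value); value is None for bare attributes
            self.anchors.append({name: value or "" for name, value in attrs})

def names_of_matching_links(html_fragment, href_pattern):
    """Return the name attribute of each <a> whose href matches href_pattern."""
    collector = AnchorCollector()
    collector.feed(html_fragment)
    collector.close()
    regex = re.compile(href_pattern)
    return [a.get("name", "") for a in collector.anchors
            if regex.search(a.get("href", ""))]

# Hypothetical innerHTML of the element located by "id=someElement"
fragment = ('<ul><li><a href="http://shop42.com" name="shop">Shop</a></li>'
            '<li><a href="/relative" name="rel">Rel</a></li>'
            '<li><A HREF="http://www7.com" NAME="www">W</A></li></ul>')
print(names_of_matching_links(fragment, r"http://[^/]*\d+\.com"))  # ['shop', 'www']
```

A real parser copes with attribute order, quoting style and letter case, which is exactly where the regex-over-source approach tends to break.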

Answer 5

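Selenium's By.Id and By.CssSelector locators don't support regex, and By.XPath supports it (through matches()) only where XPath 2.0 is enabled. If you want to use a regex, you can do something like this: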

using System.Collections.Generic;
using System.Text.RegularExpressions;
using OpenQA.Selenium;

void MyCallingMethod(IWebDriver driver)
{
    //Search by ID:
    string attrName = "id";
    //Regex = 'a number that is 1-10 digits long'
    string attrRegex = "[0-9]{1,10}";
    IEnumerable<IWebElement> found = SearchByAttribute(driver, attrName, attrRegex);
}

IEnumerable<IWebElement> SearchByAttribute(IWebDriver driver, string attrName, string attrRegex)
{
    List<IWebElement> elements = new List<IWebElement>();

    //Allows spaces around the equal sign. Ex: id = "55"
    string searchString = attrName + "\\s*=\\s*\"" + attrRegex + "\"";
    //Search the page source
    MatchCollection matches = Regex.Matches(driver.PageSource, searchString, RegexOptions.IgnoreCase);
    //Iterate over the matches
    foreach (Match match in matches)
    {
        //Get the exact attribute value (the part of the match that satisfies attrRegex)
        Match innerMatch = Regex.Match(match.Value, attrRegex, RegexOptions.IgnoreCase);
        //Quote the value: [id=55] is not valid CSS, since identifiers can't start with a digit
        string cssSelector = "[" + attrName + "='" + innerMatch.Value + "']";
        //Find the element by its exact attribute value
        elements.Add(driver.FindElement(By.CssSelector(cssSelector)));
    }

    return elements;
}

Note: this code is untested. Also, you could optimize this method by finding a way to eliminate the second search.
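One way to drop the second search is to wrap the attribute regex in a capture group, so the first pass already yields the exact value. The Python sketch below shows that idea; it only builds the CSS selectors from a hypothetical page source, and each selector could then be passed to the WebDriver Python bindings as driver.find_element(By.CSS_SELECTOR, selector). The function name is illustrative, and the lookbehind that stops "id" from matching inside "data-id" is an added assumption about what you want.

```python
import re

def css_selectors_for_attribute(page_source, attr_name, attr_regex):
    """Build one exact-match CSS attribute selector per regex hit in the source.

    The capture group around attr_regex yields the exact value in a single
    pass, so no second regex search is needed.
    """
    pattern = re.compile(
        # (?<![\w-]) keeps "id" from matching inside e.g. "data-id" or "valid"
        r"(?<![\w-])" + re.escape(attr_name) + r'\s*=\s*"(' + attr_regex + r')"',
        re.IGNORECASE,
    )
    selectors = []
    for m in pattern.finditer(page_source):
        value = m.group(1).replace('"', '\\"')
        # Quote the value: [id=55] is invalid CSS, identifiers can't start with a digit
        selector = '[%s="%s"]' % (attr_name, value)
        if selector not in selectors:  # the same value would find the same element
            selectors.append(selector)
    return selectors

source = '<div id="55">a</div><span data-id="77"></span><p ID = "1234">b</p><div id="x1"></div>'
print(css_selectors_for_attribute(source, "id", r"[0-9]{1,10}"))  # ['[id="55"]', '[id="1234"]']
```

Like the C# version, this still depends on driver.PageSource reflecting the current DOM closely enough, which is not guaranteed after client-side scripts modify the page.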

How to use regex in Selenium locators
