Special character in XPATH query

Time: 2009-08-27 15:31:12

Tags: c# java xml xpath

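I use the following XPath query to list the objects under a site: ListObject[@Title='SomeValue']. SomeValue is dynamic. The query works as long as SomeValue does not contain an apostrophe ('). I tried using an escape sequence, but that didn't work.

What am I doing wrong?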

10 answers:

Answer 0 (score: 57)

This is surprisingly difficult to do.

Take a look at the XPath Recommendation, and you'll see that it defines a literal as:

Literal ::=   '"' [^"]* '"' 
            | "'" [^']* "'"

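That is, a string literal in an XPath expression can contain apostrophes or double quotes, but not both.

You can't use escaping to get around this. A literal like this:

'Some&apos;Value'

will match this XML text:

Some&amp;apos;Value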

This does mean that there can be a piece of XML text for which you cannot write a matching XPath literal, e.g.:

<elm att="&quot;&apos;"/>

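But that doesn't mean it's impossible to match that text with XPath; it's just tricky. Whenever the value you're trying to match contains both single and double quotes, you can construct an expression that uses concat to produce the text it's going to match: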

elm[@att=concat('"', "'")]
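
To see the concat() expression at work, here is a minimal Java sketch using the standard javax.xml.xpath API. The class name ConcatDemo, the method countMatches, and the sample document are made up for this illustration and are not part of the original answer.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class ConcatDemo {
    // Counts the elm elements whose att attribute is a double quote
    // followed by an apostrophe.
    static int countMatches() {
        // Hypothetical sample document: only the first elm should match.
        String xml = "<root><elm att=\"&quot;&apos;\"/><elm att=\"other\"/></root>";
        // Each quote character sits in a literal delimited by the other kind.
        String expr = "count(/root/elm[@att=concat('\"', \"'\")])";
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            Double n = (Double) xpath.evaluate(
                    expr, new InputSource(new StringReader(xml)), XPathConstants.NUMBER);
            return n.intValue();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countMatches()); // prints 1
    }
}
```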

So that leads us to this, which is a lot more complicated than I'd like it to be:

/// <summary>
/// Produce an XPath literal equal to the value if possible; if not, produce
/// an XPath expression that will match the value.
/// 
/// Note that this function will produce very long XPath expressions if a value
/// contains a long run of double quotes.
/// </summary>
/// <param name="value">The value to match.</param>
/// <returns>If the value contains only single or double quotes, an XPath
/// literal equal to the value.  If it contains both, an XPath expression,
/// using concat(), that evaluates to the value.</returns>
static string XPathLiteral(string value)
{
    // if the value contains only single or double quotes, construct
    // an XPath literal
    if (!value.Contains("\""))
    {
        return "\"" + value + "\"";
    }
    if (!value.Contains("'"))
    {
        return "'" + value + "'";
    }

    // if the value contains both single and double quotes, construct an
    // expression that concatenates all non-double-quote substrings with
    // the quotes, e.g.:
    //
    //    concat("foo", '"', "bar")
    StringBuilder sb = new StringBuilder();
    sb.Append("concat(");
    string[] substrings = value.Split('\"');
    for (int i = 0; i < substrings.Length; i++ )
    {
        bool needComma = (i>0);
        if (substrings[i] != "")
        {
            if (i > 0)
            {
                sb.Append(", ");
            }
            sb.Append("\"");
            sb.Append(substrings[i]);
            sb.Append("\"");
            needComma = true;
        }
        if (i < substrings.Length - 1)
        {
            if (needComma)
            {
                sb.Append(", ");                    
            }
            sb.Append("'\"'");
        }

    }
    sb.Append(")");
    return sb.ToString();
}

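And yes, I tested it with all the edge cases. That's why the logic is so stupidly complex:

static void Main(string[] args)
{
    // Assumed setup (the original snippet starts mid-method): an empty
    // document whose <root/> element receives the test nodes.
    XmlDocument d = new XmlDocument();
    d.LoadXml("<root/>");

    foreach (string s in new[]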
    {
        "foo",              // no quotes
        "\"foo",            // double quotes only
        "'foo",             // single quotes only
        "'foo\"bar",        // both; double quotes in mid-string
        "'foo\"bar\"baz",   // multiple double quotes in mid-string
        "'foo\"",           // string ends with double quotes
        "'foo\"\"",         // string ends with run of double quotes
        "\"'foo",           // string begins with double quotes
        "\"\"'foo",         // string begins with run of double quotes
        "'foo\"\"bar"       // run of double quotes in mid-string
    })
    {
        Console.Write(s);
        Console.Write(" = ");
        Console.WriteLine(XPathLiteral(s));
        XmlElement elm = d.CreateElement("test");
        d.DocumentElement.AppendChild(elm);
        elm.SetAttribute("value", s);

        string xpath = "/root/test[@value = " + XPathLiteral(s) + "]";
        if (d.SelectSingleNode(xpath) == elm)
        {
            Console.WriteLine("OK");
        }
        else
        {
            Console.WriteLine("Should have found a match for {0}, and didn't.", s);
        }
    }
    Console.ReadKey();
}

Answer 1 (score: 6)

EDIT: After a heavy unit-testing session and checking the XPath standard, I have revised my function as follows:

public static string ToXPath(string value) {

    const string apostrophe = "'";
    const string quote = "\"";

    if(value.Contains(quote)) {
        if(value.Contains(apostrophe)) {
            throw new XPathException("Illegal XPath string literal.");
        } else {
            return apostrophe + value + apostrophe;
        }
    } else {
        return quote + value + quote;
    }
}

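It appears that XPath doesn't have a character escaping system at all; it's quite primitive really. Evidently my original code only worked by coincidence. My apologies for misleading anyone!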
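
The lack of escaping can be checked directly. The following Java sketch (the class name NoEscapeDemo, the helper textOfMatch, and the sample document are assumptions for illustration) shows that an XPath literal containing &apos; is compared character for character and is not decoded into an apostrophe.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class NoEscapeDemo {
    // Hypothetical sample: the first title holds a real apostrophe, the
    // second holds the six characters & a p o s ; as plain text.
    static final String XML =
            "<root>"
            + "<item title=\"Some'Value\">real apostrophe</item>"
            + "<item title=\"Some&amp;apos;Value\">literal entity text</item>"
            + "</root>";

    // Returns the text content of the first node selected by expr.
    static String textOfMatch(String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expr, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The XPath engine does not decode &apos; inside a literal.
        System.out.println(textOfMatch("/root/item[@title='Some&apos;Value']"));
        // prints: literal entity text

        // Switching the delimiter is what actually matches the apostrophe.
        System.out.println(textOfMatch("/root/item[@title=\"Some'Value\"]"));
        // prints: real apostrophe
    }
}
```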

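Original answer below for reference only - please ignore

To be safe, make sure that every occurrence of the 5 predefined XML entities in your XPath string is escaped, e.g.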

public static string ToXPath(string value) {
    return "'" + XmlEncode(value) + "'";
}

public static string XmlEncode(string value) {
    StringBuilder text = new StringBuilder(value);
    text.Replace("&", "&amp;");
    text.Replace("'", "&apos;");
    text.Replace(@"""", "&quot;");
    text.Replace("<", "&lt;");
    text.Replace(">", "&gt;");
    return text.ToString();
}

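I have done this before and it works fine. If it doesn't work for you, maybe there is some additional context to the problem that you need to make us aware of.

Answer 2 (score: 5)

I ported Robert's answer to Java (tested in 1.6):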

/**
 * Produce an XPath literal equal to the value if possible; if not, produce
 * an XPath expression that will match the value.
 *
 * Note that this function will produce very long XPath expressions if a value
 * contains a long run of double quotes.
 *
 * @param value The value to match.
 * @return If the value contains only single or double quotes, an XPath
 *         literal equal to the value.  If it contains both, an XPath expression,
 *         using concat(), that evaluates to the value.
 */
public static String XPathLiteral(String value) {
    // if the value contains only single or double quotes (or neither),
    // construct an XPath literal
    if (!value.contains("\"")) {
        return "\"" + value + "\"";
    }
    if (!value.contains("'")) {
        return "'" + value + "'";
    }

    // if the value contains both single and double quotes, construct an
    // expression that concatenates all non-double-quote substrings with
    // the quotes, e.g.:
    //
    //    concat("foo", '"', "bar")
    StringBuilder sb = new StringBuilder();
    sb.append("concat(");
    // A limit of -1 makes split() keep trailing empty strings, so a value
    // that ends in one or more double quotes is split the same way as in C#.
    String[] substrings = value.split("\"", -1);
    for (int i = 0; i < substrings.length; i++) {
        boolean needComma = (i > 0);
        if (!substrings[i].equals("")) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append("\"");
            sb.append(substrings[i]);
            sb.append("\"");
            needComma = true;
        }
        if (i < substrings.length - 1) {
            if (needComma) {
                sb.append(", ");
            }
            sb.append("'\"'");
        }
    }
    sb.append(")");
    return sb.toString();
}

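Hope this helps somebody!

Answer 3 (score: 5)

By far the best approach to this problem is to use the facilities provided by your XPath library to declare an XPath-level variable that you can reference in the expression. The variable value can then be any string in the host programming language, and it isn't subject to the restrictions of XPath string literals. For example, in Java with javax.xml.xpath: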

XPathFactory xpf = XPathFactory.newInstance();
final Map<String, Object> variables = new HashMap<>();
xpf.setXPathVariableResolver(new XPathVariableResolver() {
  public Object resolveVariable(QName name) {
    return variables.get(name.getLocalPart());
  }
});

XPath xpath = xpf.newXPath();
XPathExpression expr = xpath.compile("ListObject[@Title=$val]");
variables.put("val", someValue);
NodeList nodes = (NodeList)expr.evaluate(someNode, XPathConstants.NODESET);

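For C# XPathNavigator you would define a custom XsltContext as described in this MSDN article (you'd only need the variable-related parts of that example, not the extension functions).

Answer 4 (score: 3)

Most of the answers here focus on how to use string manipulation to cobble together an XPath expression that uses string delimiters in a valid way.

I would say the best practice is not to rely on such complicated and potentially fragile methods.

The following applies to .NET, since this question is tagged with C#. Ian Roberts has provided what I think is the best solution for when you're using XPath in Java.

Nowadays, you can use Linq-to-Xml to query XML documents in a way that lets you use your variables directly in the query. This is not XPath, but the purpose is the same.

For the example given in the OP, you could query the nodes you want like this: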

var value = "Some value with 'apostrophes' and \"quotes\"";

// doc is an instance of XElement or XDocument
IEnumerable<XElement> nodes = 
                      doc.Descendants("ListObject")
                         .Where(lo => (string)lo.Attribute("Title") == value);

or using the query comprehension syntax:

IEnumerable<XElement> nodes = from lo in doc.Descendants("ListObject")
                              where (string)lo.Attribute("Title") == value
                              select lo;

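.NET also provides a way to use XPath variables in your XPath queries. Sadly, it's not easy to do this out of the box, but with a simple helper class that I provide in this other SO answer, it's quite easy.

You can use it like this: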

var value = "Some value with 'apostrophes' and \"quotes\"";

var variableContext = new VariableContext { { "matchValue", value } };
// ixn is an instance of IXPathNavigable
XPathNodeIterator nodes = ixn.CreateNavigator()
                             .SelectNodes("ListObject[@Title = $matchValue]", 
                                          variableContext);

Answer 5 (score: 2)

Here is an alternative to Robert Rossney's StringBuilder approach, perhaps more intuitive:

    /// <summary>
    /// Produce an XPath literal equal to the value if possible; if not, produce
    /// an XPath expression that will match the value.
    /// 
    /// Note that this function will produce very long XPath expressions if a value
    /// contains a long run of double quotes.
    /// 
    /// From: http://stackoverflow.com/questions/1341847/special-character-in-xpath-query
    /// </summary>
    /// <param name="value">The value to match.</param>
    /// <returns>If the value contains only single or double quotes, an XPath
    /// literal equal to the value.  If it contains both, an XPath expression,
    /// using concat(), that evaluates to the value.</returns>
    public static string XPathLiteral(string value)
    {
        // If the value contains only single or double quotes, construct
        // an XPath literal
        if (!value.Contains("\""))
            return "\"" + value + "\"";

        if (!value.Contains("'"))
            return "'" + value + "'";

        // If the value contains both single and double quotes, construct an
        // expression that concatenates all non-double-quote substrings with
        // the quotes, e.g.:
        //
        //    concat("foo",'"',"bar")

        List<string> parts = new List<string>();

        // First, put a '"' after each component in the string.
        foreach (var str in value.Split('"'))
        {
            if (!string.IsNullOrEmpty(str))
                parts.Add('"' + str + '"'); // (edited -- thanks Daniel :-)

            parts.Add("'\"'");
        }

        // Then remove the extra '"' after the last component.
        parts.RemoveAt(parts.Count - 1);

        // Finally, put it together into a concat() function call.
        return "concat(" + string.Join(",", parts) + ")";
    }

Answer 6 (score: 2)

You can quote an XPath string by using search and replace.

In F#:

let quoteString (s : string) =
    if      not (s.Contains "'" ) then sprintf "'%s'"   s
    else if not (s.Contains "\"") then sprintf "\"%s\"" s
    else "concat('" + s.Replace ("'", "', \"'\", '") + "')"

I haven't tested it extensively, but it seems to work.
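
For readers who don't use F#, the same search-and-replace idea can be sketched in Java together with a small round-trip check. The class name QuoteByReplace and the roundTrip helper are assumptions added for this illustration; roundTrip simply evaluates the generated expression with javax.xml.xpath and returns the resulting string.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class QuoteByReplace {
    // Java rendering of the F# quoteString function above.
    static String quoteString(String s) {
        if (!s.contains("'")) {
            return "'" + s + "'";
        }
        if (!s.contains("\"")) {
            return "\"" + s + "\"";
        }
        // Both kinds of quote are present: close the single-quoted piece at
        // every apostrophe, emit the apostrophe as "'", then reopen.
        return "concat('" + s.replace("'", "', \"'\", '") + "')";
    }

    // Evaluates the generated expression and returns the string it yields.
    static String roundTrip(String s) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(quoteString(s), new InputSource(new StringReader("<r/>")));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] samples = {"foo", "it's", "say \"hi\"", "a'b\"c", "'\"", "\"'"};
        for (String s : samples) {
            String status = roundTrip(s).equals(s) ? "OK" : "MISMATCH";
            System.out.println(quoteString(s) + " -> " + status);
        }
    }
}
```

When the value starts or ends with an apostrophe, the generated concat() call contains empty '' pieces. They are harmless, just slightly noisy.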

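Answer 7 (score: 0)

If you're not going to have any double quotes in SomeValue, you can use escaped double quotes to delimit the value you're searching for in your XPath search string.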

ListObject[@Title=\"SomeValue\"]

Answer 8 (score: 0)

You can fix this issue by using double quotes instead of single quotes in the XPath expression.

For example:

element.XPathSelectElements(String.Format("//group[@title=\"{0}\"]", "Man's"));

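Answer 9 (score: -1)

I had this problem a while back, and seemingly the simplest (though not the fastest) solution is to add a new node to your XML document that has an attribute with the value 'SomeValue', then look for that attribute value using a simple XPath search. When you're finished with the operation, you can delete the "temporary node" from the XML document.

This way, the whole comparison happens "inside" the document, so you don't have to construct a weird XPath query.

I seem to remember that, to speed things up, you should add the temp value to the root node.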
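
One possible reading of this suggestion, sketched in Java: store the value in a temporary attribute on the root element, compare against it from inside the XPath expression, and remove it afterwards. The class name TempAttributeLookup, the attribute name tmpMatchValue, and the parse helper are all hypothetical; the answer itself does not give any code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TempAttributeLookup {
    // Hypothetical name; it must not clash with a real attribute on the root.
    static final String TEMP_ATTR = "tmpMatchValue";

    // Finds ListObject elements whose Title equals the given value without
    // ever placing the value inside the XPath expression text.
    static List<Element> findByTitle(Document doc, String title) {
        Element root = doc.getDocumentElement();
        root.setAttribute(TEMP_ATTR, title);
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    "//ListObject[@Title = /*/@" + TEMP_ATTR + "]",
                    doc, XPathConstants.NODESET);
            // Copy the results before the temporary attribute is removed.
            List<Element> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add((Element) nodes.item(i));
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        } finally {
            root.removeAttribute(TEMP_ATTR);
        }
    }

    // Small helper for the demo: parses an XML string into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<Site>"
                + "<ListObject Title=\"it's &quot;quoted&quot;\"/>"
                + "<ListObject Title=\"plain\"/>"
                + "</Site>");
        System.out.println(findByTitle(doc, "it's \"quoted\"").size()); // prints 1
    }
}
```

Because the document is modified for the duration of the query, this sketch is not safe to run concurrently on a shared document.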

Good luck...