C# - Extracting substrings from a string based on varying content

Time: 2019-03-04 10:14:53

Tags: c#

(WHERE)
  (CONDITION OPERATOR="AND")  
   (EXPRESSION NAME="abc" ATTRIBUTE="minor")
   (VALUE)m1(/VALUE)
   (/EXPRESSION)

  (EXPRESSION NAME="abc" ATTRIBUTE="ID")
  (VALUE)ID(/VALUE)
  (/EXPRESSION)

  (EXPRESSION NAME="abc" ATTRIBUTE="major")
  (VALUE)m2(/VALUE)
  (/EXPRESSION)

(/CONDITION)     
(/WHERE)

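How can I get three substrings out of this string? For example, first the substring whose ATTRIBUTE is "minor", then the substring whose ATTRIBUTE is "ID", and so on. The expression name may change, and I cannot search the whole string for (VALUE)ID(/VALUE) to get the ID value. I hope my question is clear.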

1 answer:

Answer 0: (score: 0)

Your input has a regular structure, so it can be converted to XML:

<WHERE>
  <CONDITION OPERATOR="AND">
    <EXPRESSION NAME="abc" ATTRIBUTE="minor">
      <VALUE>m1</VALUE>
    </EXPRESSION>
    <EXPRESSION NAME="abc" ATTRIBUTE="ID">
      <VALUE>ID</VALUE>
    </EXPRESSION>
    <EXPRESSION NAME="abc" ATTRIBUTE="major">
      <VALUE>m2</VALUE>
    </EXPRESSION>
  </CONDITION>
</WHERE>

You can then query it with an XPath expression such as //EXPRESSION[@ATTRIBUTE='major']/*[1]

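A simple string.Replace may work, but I think it is better to replace only the parentheses that are not inside attribute values. You can find the quoted strings with a regular expression: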

"([^"\\]|\\.)*"

and extract the boundaries of each string:

var stringsBounds = Regex.Matches(input, "\"([^\"\\\\]|\\\\.)*\"")
    .Cast<Match>()
    .Select(m => new
    {
        begin = m.Index,
        end = m.Index + m.Length - 1
    })
    .ToArray();

With these boundaries, you can do a smart replacement that skips anything inside a quoted string:

Func<Match, bool> isInsideString = m => stringsBounds.Any(b => m.Index > b.begin && m.Index < b.end);
var xmlAsText = Regex.Replace(Regex.Replace(input, "\\(", m => isInsideString(m) ? "(" : "<"),
    "\\)", m => isInsideString(m) ? ")" : ">");

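To illustrate why the boundary check matters, here is a minimal self-contained sketch. The class name `BracketConverter`, the method `ToXmlText`, and the sample input with `NAME="f(x)"` are hypothetical and not part of the original question. The sketch only assumes that attribute values are double-quoted, as in the input above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class BracketConverter
{
    // Converts "(TAG ...)" markup to "<TAG ...>", leaving parentheses that
    // appear inside double-quoted attribute values untouched.
    public static string ToXmlText(string input)
    {
        // Start and end index of every double-quoted string (backslash escapes allowed).
        var bounds = Regex.Matches(input, "\"([^\"\\\\]|\\\\.)*\"")
            .Cast<Match>()
            .Select(m => (Begin: m.Index, End: m.Index + m.Length - 1))
            .ToArray();

        // True when the match lies strictly between the quotes of some string.
        bool InsideString(Match m) => bounds.Any(b => m.Index > b.Begin && m.Index < b.End);

        // Handle both kinds of parenthesis in one pass.
        return Regex.Replace(input, "[()]", m =>
            InsideString(m) ? m.Value : (m.Value == "(" ? "<" : ">"));
    }

    public static void Main()
    {
        // Hypothetical input: the NAME value itself contains parentheses.
        var input = "(EXPRESSION NAME=\"f(x)\" ATTRIBUTE=\"minor\")(VALUE)m1(/VALUE)(/EXPRESSION)";

        // Naive replacement also rewrites the parentheses inside "f(x)".
        Console.WriteLine(input.Replace("(", "<").Replace(")", ">"));

        // Bounds-aware replacement keeps the attribute value intact.
        Console.WriteLine(ToXmlText(input));
    }
}
```

With the naive replacement the attribute value becomes "f<x>", which XDocument.Parse would reject because of the raw "<" inside an attribute. The bounds-aware version leaves "f(x)" as it is.
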
Now you can query the XML:

// Requires: using System.Xml.Linq; using System.Xml.XPath;
var xml = XDocument.Parse(xmlAsText);

var expressionSelector = "//EXPRESSION[@ATTRIBUTE='{0}']/*[1]";

foreach (var attribute in new [] {"minor", "major", "ID"})
{
    var xpath = string.Format(expressionSelector, attribute);
    var node = xml.XPathSelectElement(xpath);

    Console.WriteLine($"Attribute: {attribute}, element: {node}");
}
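
If the attribute names are not known in advance, one alternative is to use LINQ to XML instead of XPath and collect every ATTRIBUTE/VALUE pair in one pass. This is only a sketch of that alternative, not the approach the answer uses. The names `ExpressionReader` and `ReadValues` are hypothetical, and the code assumes each ATTRIBUTE value occurs at most once (ToDictionary throws on duplicate keys).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ExpressionReader
{
    // Maps each EXPRESSION's ATTRIBUTE value to the text of its VALUE child.
    // Expressions missing either the attribute or the child are skipped.
    public static Dictionary<string, string> ReadValues(string xmlAsText)
    {
        return XDocument.Parse(xmlAsText)
            .Descendants("EXPRESSION")
            .Where(e => e.Attribute("ATTRIBUTE") != null && e.Element("VALUE") != null)
            .ToDictionary(
                e => (string)e.Attribute("ATTRIBUTE"),
                e => (string)e.Element("VALUE"));
    }

    public static void Main()
    {
        // The converted XML from the answer, written on one logical line.
        var xmlAsText =
            "<WHERE><CONDITION OPERATOR=\"AND\">" +
            "<EXPRESSION NAME=\"abc\" ATTRIBUTE=\"minor\"><VALUE>m1</VALUE></EXPRESSION>" +
            "<EXPRESSION NAME=\"abc\" ATTRIBUTE=\"ID\"><VALUE>ID</VALUE></EXPRESSION>" +
            "<EXPRESSION NAME=\"abc\" ATTRIBUTE=\"major\"><VALUE>m2</VALUE></EXPRESSION>" +
            "</CONDITION></WHERE>";

        foreach (var pair in ReadValues(xmlAsText))
        {
            Console.WriteLine($"{pair.Key} = {pair.Value}");
        }
    }
}
```

Because the lookup is keyed on ATTRIBUTE rather than NAME, it keeps working when the expression name changes, which is the constraint mentioned in the question.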

You can try it online.