Question

I have a sample string:

<num>1.</num> <Ref>véase anomalía de Ebstein</Ref> <num>2.</num> <Ref>-> vascularización</Ref>

I want to build a comma-separated string from the values inside the Ref tags.

I tried the following:

            Regex r = new Regex("<ref>(?<match>.*?)</ref>");
            Match m = r.Match(csv[4].ToLower());
            if (m.Groups.Count > 0)
            {
                if (m.Groups["match"].Captures.Count > 0)
                {
                    foreach (Capture c in m.Groups["match"].Captures)
                    {
                        child.InnerText += c.Value + ", ";       
                    }
                    child.InnerText = child.InnerText.Substring(0, child.InnerText.Length - 2).Replace("-> ", "");
                }
            }

But this only seems to find the value inside the first Ref tag.

Where am I going wrong?

Answer 1

You want to use Matches rather than Match to get all of the matches, e.g.:

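// IgnoreCase lets the pattern match <Ref> without lowercasing the input.
Regex r = new Regex("<ref>(?<match>.*?)</ref>", RegexOptions.IgnoreCase);
foreach (Match m in r.Matches(csv[4]))
{
    // Each Match is one <Ref>...</Ref> element; the named group holds its text.
    child.InnerText += m.Groups["match"].Value + ", ";
}
// Trim the trailing ", " once, after the loop, so the separators between values survive.
if (child.InnerText.EndsWith(", "))
{
    child.InnerText = child.InnerText.Substring(0, child.InnerText.Length - 2).Replace("-> ", "");
}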
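
As a rough sketch of the same idea, the values from all matches can also be collected and passed to String.Join, which avoids trimming a trailing separator by hand. The helper name JoinRefValues, the sample variable, and the use of RegexOptions.IgnoreCase (instead of lowercasing the input) are my own assumptions for illustration, not part of the original code:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the text of every <Ref> element, comma-separated.
static string JoinRefValues(string input)
{
    // Matches (plural) returns every occurrence, not just the first one.
    MatchCollection matches = Regex.Matches(
        input, "<ref>(?<match>.*?)</ref>", RegexOptions.IgnoreCase);

    // Pull the named group out of each match and drop the "-> " prefix.
    var values = matches.Cast<Match>()
                        .Select(m => m.Groups["match"].Value.Replace("-> ", ""));

    return string.Join(", ", values);
}

string sample = "<num>1.</num> <Ref>véase anomalía de Ebstein</Ref> " +
                "<num>2.</num> <Ref>-> vascularización</Ref>";

// Prints: véase anomalía de Ebstein, vascularización
Console.WriteLine(JoinRefValues(sample));
```

In the original code the result would then be assigned once, e.g. child.InnerText = JoinRefValues(csv[4]).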

Answer 2

I would strongly recommend using XPath rather than regular expressions to search an XML document.

using System;
using System.Linq;
using System.Xml;

string xml = @"<test>
    <num>1.</num> <Ref>véase anomalía de Ebstein</Ref> <num>2.</num> <Ref>-> vascularización</Ref>
</test>";

XmlDocument d = new XmlDocument();
d.LoadXml(xml);

var list = from XmlNode n in d.SelectNodes("//Ref") select n.InnerText;
Console.WriteLine(String.Join(", ", list.ToArray()));

Answer 3

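Regex quantifiers are greedy by default, so a plain .* would match from the first opening tag to the last closing tag. (The .*? in your pattern is already lazy, so this guards against the problem rather than explaining it.) If the XML is well-formed, you can change the regex to: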

Regex r = new Regex("<ref>(?<match>[^<]*?)</ref>");

which matches anything other than <.
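
To make the difference concrete, here is a small sketch comparing a greedy quantifier, a lazy quantifier, and the negated character class on a made-up two-tag input (the input string and variable names are assumptions for illustration only):

```csharp
using System;
using System.Text.RegularExpressions;

string input = "<ref>a</ref> <ref>b</ref>";

// Greedy: .* runs on to the last </ref>, so both elements end up in one match.
string greedy = Regex.Match(input, "<ref>(.*)</ref>").Groups[1].Value;

// Lazy: .*? stops at the first </ref> it can reach.
string lazy = Regex.Match(input, "<ref>(.*?)</ref>").Groups[1].Value;

// Negated class: [^<]* cannot cross a '<', so it also stops before the first closing tag.
string negated = Regex.Match(input, "<ref>([^<]*)</ref>").Groups[1].Value;

Console.WriteLine(greedy);  // a</ref> <ref>b
Console.WriteLine(lazy);    // a
Console.WriteLine(negated); // a
```

Note that the negated class only works when the element contains no nested tags, which is why it depends on the content being simple text.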

Using a regular expression to find values inside certain tags
