Question

I love regular expressions!

I have a string that is a malformed piece of XML, like this:

<Category>DIR</Category><Location>DL123A</Location><Reason>Because</Reason><Qty>42</Qty><Description>Some Desc</Description><IPAddress>127.0.0.1</IPAddress>

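Everything will be on one line, but the "headers" will often differ.

So what I need to do is extract all the information from the string above and put it into a Dictionary / Hashtable.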

Here is my attempt so far:

string myString = @"<Category>DIR</Category><Location>DL123A</Location><Reason>Because</Reason><Qty>42</Qty><Description>Some Desc</Description><IPAddress>127.0.0.1</IPAddress>";

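// Collections filled by the two loops below
var headers = new List<string>();
var values = new List<string>();

// This extracts the name of the label in each header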
Regex r = new Regex(@"(?<header><[A-Za-z]+>?)");

//Create a collection of matches
MatchCollection mc = r.Matches(myString);

foreach (Match m in mc)
{
    headers.Add(m.Groups["header"].Value);
}


//this will try and get the values.
r = new Regex(@"(?'val'>[A-Za-z0-9\s]*</?)");

mc = r.Matches(myString);

foreach (Match m in mc)
{
    string match = m.Groups["val"].Value;
    if (string.IsNullOrEmpty(match) || match == "><" || match == "> <")
        continue;
    else
        values.Add(match);
}

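The code above is the closest I have managed, hacked together from my previous work with regular expressions. But it doesn't really work the way I want.

The 'header' pattern also pulls out the angle brackets.

The 'val' pattern pulls in a lot of blanks (hence the dodgy if statement in the loop). It also doesn't work for strings containing periods, commas, spaces, etc.

It would also be much better if I could combine the two statements, so I don't have to run through the regex twice.

Can anyone give me some pointers on how I can improve this?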

Answer 1

If it looks like XML, why not use .NET's XML parser functionality? All you need to do is wrap a root element around it:

string myString = @"<Category>DIR</Category><Location>DL123A</Location><Reason>Because</Reason><Qty>42</Qty><Description>Some Desc</Description><IPAddress>127.0.0.1</IPAddress>";

var values = new Dictionary<string, string>();
var xml = XDocument.Parse("<root>" + myString + "</root>");
foreach(var e in xml.Root.Elements()) {
    values.Add(e.Name.ToString(), e.Value);
}

Answer 2

This should strip the angle brackets:

 Regex r = new Regex(@"<(?<header>[A-Za-z]+)>");

This should get rid of the surrounding whitespace:

r = new Regex(@">\s*(?'val'[A-Za-z0-9\s]*)\s*</");
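The question also asks to avoid running two regexes, and the two patterns above can probably be folded into one. A minimal sketch, assuming flat (non-nested) elements whose closing tag repeats the opening name. `TagPairs` and `Extract` are made-up names for illustration, and `[^<]*?` is swapped in for the `[A-Za-z0-9\s]*` class so that periods, commas and similar characters are accepted:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagPairs
{
    // "header" captures the tag name, "val" captures the text up to the next '<'.
    // \k<header> requires the closing tag to repeat the opening tag's name.
    // The \s* on either side keeps leading/trailing whitespace out of "val".
    private static readonly Regex Pair = new Regex(
        @"<(?<header>[A-Za-z]+)>\s*(?<val>[^<]*?)\s*</\k<header>>");

    public static Dictionary<string, string> Extract(string input)
    {
        var result = new Dictionary<string, string>();
        foreach (Match m in Pair.Matches(input))
        {
            // The indexer overwrites a repeated header instead of throwing like Add does.
            result[m.Groups["header"].Value] = m.Groups["val"].Value;
        }
        return result;
    }

    public static void Main()
    {
        string myString = @"<Category>DIR</Category><Qty>42</Qty><Description>Some Desc</Description>";
        foreach (KeyValuePair<string, string> pair in Extract(myString))
        {
            Console.WriteLine(pair.Key + " = " + pair.Value);
        }
    }
}
```

Each match yields a header and its value together, so there is only one pass over the string and no need to filter out empty matches afterwards.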

Answer 3

This will match the headers without the <>:

(?<=<)(?<header>[A-Za-z]+)(?=>)

This will get all the values (I'm not sure what is acceptable as a value):

(?<=>)(?'val'[^<]*)(?=</)
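To show how the two lookaround patterns above might be used together, here is a rough sketch that runs each one and pairs the results by position. It assumes flat input where every opening tag has exactly one value. `LookaroundPairs` and `Extract` are hypothetical names, not anything from the original code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookaroundPairs
{
    public static List<KeyValuePair<string, string>> Extract(string input)
    {
        // The lookbehind (?<=<) excludes closing tags, because in "</Name>"
        // the character before the name is '/', not '<'.
        IEnumerable<string> headers = Regex.Matches(input, @"(?<=<)(?<header>[A-Za-z]+)(?=>)")
            .Cast<Match>()
            .Select(m => m.Groups["header"].Value);

        // The lookahead (?=</) only accepts text that is followed by a closing tag,
        // so the gaps between "</A>" and "<B>" produce no match.
        IEnumerable<string> values = Regex.Matches(input, @"(?<=>)(?'val'[^<]*)(?=</)")
            .Cast<Match>()
            .Select(m => m.Groups["val"].Value);

        // Zip pairs the n-th header with the n-th value and stops at the shorter sequence.
        return headers.Zip(values, (h, v) => new KeyValuePair<string, string>(h, v)).ToList();
    }

    public static void Main()
    {
        string myString = @"<Category>DIR</Category><IPAddress>127.0.0.1</IPAddress>";
        foreach (KeyValuePair<string, string> pair in Extract(myString))
        {
            Console.WriteLine(pair.Key + " = " + pair.Value);
        }
    }
}
```

This still makes two passes over the string, and the pairing relies purely on order, so nested or unbalanced tags would misalign the results.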

But this is all XML, so you could just do:

XDocument doc = XDocument.Parse(string.Format("<root>{0}</root>",myString));
var pairs = doc.Root.Descendants().Select(node => new KeyValuePair<string, string>(node.Name.LocalName, node.Value));
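Since the question wants a Dictionary in the end, the pairs could be collected along these lines. This is only a sketch under a few assumptions: `XmlPairs` and `ToDictionary` are made-up names, `Elements()` is used instead of `Descendants()` so only direct children of the wrapper are read, and a repeated tag name keeps its last value rather than throwing:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XmlPairs
{
    public static Dictionary<string, string> ToDictionary(string fragment)
    {
        // Wrap the fragment in a single root so it parses as a document.
        // XDocument.Parse throws an XmlException if the fragment is not well-formed
        // (for example, an unescaped '&' inside a value).
        XDocument doc = XDocument.Parse("<root>" + fragment + "</root>");

        // Group by tag name so duplicate tags do not cause a duplicate-key exception;
        // the last occurrence wins (an arbitrary choice for this sketch).
        return doc.Root.Elements()
            .GroupBy(e => e.Name.LocalName)
            .ToDictionary(g => g.Key, g => g.Last().Value);
    }

    public static void Main()
    {
        string myString = @"<Category>DIR</Category><Qty>42</Qty><Description>Some Desc</Description>";
        Dictionary<string, string> values = ToDictionary(myString);
        Console.WriteLine(values["Qty"]);
    }
}
```

If duplicate headers should be treated as an error instead, dropping the `GroupBy` and calling `ToDictionary` directly on the elements would make that case throw.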

Regex question: extracting data into groups
