Question

OK, so I have a regular expression, and I need it to find matches in a multi-line string. This is the string I'm working with:

Device Identifier:        disk0
Device Node:              /dev/disk0
Part of Whole:            disk0
Device / Media Name:      OCZ-VERTEX2 Media 

Volume Name:              Not applicable (no file system)

Mounted:                  Not applicable (no file system)

File System:              None

Content (IOContent):      GUID_partition_scheme
OS Can Be Installed:      No
Media Type:               Generic
Protocol:                 SATA
SMART Status:             Verified

Total Size:               240.1 GB (240057409536 Bytes) (exactly 468862128 512-Byte-Blocks)
Volume Free Space:        Not applicable (no file system)
Device Block Size:        512 Bytes

Read-Only Media:          No
Read-Only Volume:         Not applicable (no file system)
Ejectable:                No

Whole:                    Yes
Internal:                 Yes
Solid State:              Yes
OS 9 Drivers:             No
Low Level Format:         Not supported

Basically, I need to split each line into two groups, using the colon as the separator. The regex I'm using is:

@"([A-Za-z0-9\(\) \-\/]+):([A-Za-z0-9\(\) \-\/]+).*"

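It does work, but it only picks up the first line and splits it into two groups the way I want, and then it stops. I've tried the Multiline option, but it makes no difference.

I have to admit I'm new to the world of regular expressions.

Any help is appreciated.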

Answer 1

The following example seems to work, and it also uses named groups to make the regex easier to understand.

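    // Requires: using System; using System.Linq; using System.Text.RegularExpressions;
    // Key = everything before the first colon on a line; Value = the rest of that line.
    var rgx = new System.Text.RegularExpressions.Regex(@"(?<Key>[^:\r\n]+):([\s]*)(?<Value>[^\r\n]*)");
    // Matches() returns every match in str, not just the first one.
    foreach (var match in rgx.Matches(str).Cast<Match>())
    {
        Console.WriteLine("{0}: {1}", match.Groups["Key"].Value, match.Groups["Value"].Value);
    }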

Just for fun, this turns the whole thing into an easy-to-use dictionary:

    var dictionary = rgx.Matches(str).Cast<Match>().ToDictionary(match => match.Groups["Key"].Value, match => match.Groups["Value"].Value);

Answer 2

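The problem with your regex is the trailing .* at the end. If it is allowed to match across \r\n, it consumes the entire rest of the string, so you get a single match. By default, . in .NET matches \r but not \n; it only crosses line breaks when RegexOptions.Singleline is set.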
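
To see the effect of the trailing .* in isolation, here is a minimal sketch that counts the matches of the question's pattern with and without RegexOptions.Singleline. The class name, the CountMatches helper, and the shortened two-line sample are made up for this illustration; only the pattern comes from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TrailingDotStarDemo
{
    // The pattern from the question, including the trailing ".*".
    public const string Pattern = @"([A-Za-z0-9\(\) \-\/]+):([A-Za-z0-9\(\) \-\/]+).*";

    // Hypothetical two-line sample shaped like the output in the question.
    public const string Sample =
        "Device Identifier:        disk0\r\n" +
        "Device Node:              /dev/disk0\r\n";

    // Counts how many times the pattern matches the sample under the given options.
    public static int CountMatches(RegexOptions options)
    {
        return Regex.Matches(Sample, Pattern, options).Count;
    }

    public static void Main()
    {
        // Without Singleline, "." stops at '\n', so each line matches separately.
        Console.WriteLine(CountMatches(RegexOptions.None));       // prints 2

        // With Singleline, ".*" runs to the end of the string: one match only.
        Console.WriteLine(CountMatches(RegexOptions.Singleline)); // prints 1
    }
}
```

If you see only one result without Singleline, it is worth checking whether the code calls Regex.Match (first match only) rather than Regex.Matches.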

Answer 3

I'd suggest using String.Split instead. Assuming all of your keys are unique:

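    // Requires: using System; using System.Collections.Generic; using System.Linq;
    // Split on CR and LF, dropping the empty entries left by blank lines.
    string[] lines = str.Split(new char[] { '\r', '\n' },
        StringSplitOptions.RemoveEmptyEntries);

    // Key: text before the first colon. Value: everything after it, trimmed.
    // ToDictionary throws an ArgumentException if a key occurs twice.
    Dictionary<string, string> dict = lines.ToDictionary(
        line => line.Split(':').First(),
        line => line.Split(new char[] { ':' }, 2).Last().Trim());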

Answer 4

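If you are using the regex option Singleline (RegexOptions.Singleline), then .* matches the entire rest of the string, so there is only one match.

Singleline tells the regex engine to accept the newline character (\n) as well when matching the any-character token (i.e. the dot).

Do you even need the .* at all?

Alternatively, you could use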

^([A-Za-z0-9\(\) \-\/]+):([A-Za-z0-9\(\) \-\/]+)$

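as long as you use the regex option Multiline (RegexOptions.Multiline), which makes ^ and $ match the start and end of each line rather than of the whole string. Note that in .NET $ matches only before \n, so for input with \r\n line endings the pattern needs \r?$ in place of $.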
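
A rough sketch of the anchored approach, assuming \r\n line endings (hence \r?$ instead of a bare $). The class name, the Parse helper, and the three-line sample are invented for illustration. The value character class is the one from the pattern above, so a value such as 240.1 GB (which contains a dot) is not matched and that line is simply skipped; you would have to widen the class to pick up lines like that.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MultilineAnchorsDemo
{
    // The anchored pattern, with \r? added so that $ can match before "\n"
    // when lines end in "\r\n".
    public const string Pattern =
        @"^([A-Za-z0-9\(\) \-\/]+):([A-Za-z0-9\(\) \-\/]+)\r?$";

    // Returns one key/value pair per line that matches the pattern.
    public static List<KeyValuePair<string, string>> Parse(string text)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        foreach (Match m in Regex.Matches(text, Pattern, RegexOptions.Multiline))
        {
            // Group 2 still carries the padding after the colon, so trim it.
            pairs.Add(new KeyValuePair<string, string>(
                m.Groups[1].Value, m.Groups[2].Value.Trim()));
        }
        return pairs;
    }

    public static void Main()
    {
        // Hypothetical sample; the third value contains '.', which is not
        // in the value character class, so that line produces no match.
        string sample =
            "Device Identifier:        disk0\r\n" +
            "Device Node:              /dev/disk0\r\n" +
            "Total Size:               240.1 GB\r\n";

        foreach (var pair in Parse(sample))
        {
            Console.WriteLine("{0} => {1}", pair.Key, pair.Value);
        }
    }
}
```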
