Splitting words with a regular expression in C#

Posted: 2015-10-31 23:48:44

Tags: c# regex

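This is a stripped-down version of the code I'm working on. Its purpose is to take a string of information, break it apart, and parse it into key-value pairs.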

Using the information from the example below, a string might look like this:

"DIVIDE = KE48 CLACOS = 4556D DIV = 3466 INT = 4567"

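One more point about the example above: at least three of the features we have to parse occasionally contain additional values. Here is an updated mock-up string: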

"DIVIDE = KE48, KE49, KE50 CLACOS = 4566D DIV = 3466 INT = 4567 & 4568"

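The problem is that the code refuses to split the DIVIDE and DIV information separately. Instead, it keeps splitting at DIV and then assigns the rest of the information as the value.

Is there a way to tell my code that DIVIDE and DIV need to be parsed as two separate values, rather than turning DIVIDE into DIV?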

public List<string> FeatureFilterStrings
{
    // All possible feature types from the EWSD switch.
    get
    {
        return new List<string>() { "DIVIDE", "DIV", "CLACOS", "INT" };
    }
}

public void Parse(string input)
{
    // Note: updatedInput, workLine and valueExpression are not defined in this stripped-down excerpt.
    Func<string, bool> queryFilter = delegate(string line) { return FeatureFilterStrings.Any(s => line.Contains(s)); };

    Regex regex = new Regex(@"(?=\\bDIVIDE|DIV|CLACOS|INT)");
    string[] ms = regex.Split(updatedInput);
    List<string> queryLines = new List<string>();
    // Takes the parsed-out data and adds it to the queryLines List<string>.
    foreach (string m in ms)
    {
        queryLines.Add(m);
    }

    var features = queryLines.Where(queryFilter);
    foreach (string feature in features)
    {
        foreach (Match m in Regex.Matches(workLine, valueExpression))
        {
            string key = m.Groups["key"].Value.Trim();
            // Remove every whitespace character from the value.
            string value = Regex.Replace(m.Groups["value"].Value.Trim(), @"\s", String.Empty);
            AddKeyValue(key, value);
        }
    }
}

private void AddKeyValue(string key, string value)
{
    try
    {
        // Check if the key already exists. If it does, remove the key and add the new key with the updated value.
        // Value information appends to what is already there, so no data is lost.
        if (this.ContainsKey(key))
        {
            this.Remove(key);
            this.Add(key, value.Split('&'));
        }
        else
        {
            this.Add(key, value.Split('&'));
        }
    }
    catch (ArgumentException)
    {
        // Already added to the dictionary.
    }
}
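In the split pattern above, the alternation inside the lookahead is `\\bDIVIDE`, `DIV`, `CLACOS`, `INT`. The word boundary therefore applies only to the first alternative, and in a verbatim string `\\b` actually matches a literal backslash followed by `b`. As a result, `DIV` is matched with no boundary at all, including at the start of "DIVIDE". The following is a minimal sketch of the same lookahead split with the whole alternation wrapped in `\b(?: ... )\b`, so that DIV can only match as a complete word. `FeatureSplitter` and `SplitFeatures` are hypothetical names, and the key list is hard-coded from FeatureFilterStrings.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class FeatureSplitter
{
    // Zero-width split point in front of each known key.
    // Because the whole alternation sits inside \b(?: ... )\b, "DIV" matches only
    // as a whole word and never as the first three letters of "DIVIDE".
    private static readonly Regex KeyStart =
        new Regex(@"(?=\b(?:DIVIDE|DIV|CLACOS|INT)\b)");

    public static string[] SplitFeatures(string input)
    {
        return KeyStart.Split(input)
            .Select(s => s.Trim())
            .Where(s => s.Length > 0)   // drop the empty piece produced before the first key
            .ToArray();
    }
}
```

Called on the updated sample string, this returns four chunks: "DIVIDE = KE48, KE49, KE50", "CLACOS = 4566D", "DIV = 3466" and "INT = 4567 & 4568". Each chunk can then be parsed on its own.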

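Further information: the string has no set number of spaces between each key and value, a given string may not include every value, and the features are not always in the same order. Welcome to parsing old telephone switch information.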

2 answers:

Answer 0 (score: 2)

I would build a dictionary from your input string:

using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "DIVIDE = KE48 CLACOS = 4556D DIV = 3466 INT = 4567";

var dict = Regex.Matches(input, @"(\w+?) = (.+?)( |$)").Cast<Match>()
           .ToDictionary(m => m.Groups[1].Value, m => m.Groups[2].Value);

Test code:

foreach(var kv in dict)
{
    Console.WriteLine(kv.Key + "=" + kv.Value);
}
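The pattern above handles the simple sample but not the updated one. `(.+?)( |$)` stops at the first space, so for "DIVIDE = KE48, KE49, KE50" the value is only "KE48," and for "INT = 4567 & 4568" it is only "4567". In addition, ` = ` requires exactly one space on each side. Below is a sketch that extends the same idea, assuming the key names are known in advance (taken from the question's FeatureFilterStrings). The value runs lazily up to the next "KEY =" or the end of the string, `\s*` tolerates any spacing around `=`, and the value is then split on `,` and `&`. `FeatureParser` and `Parse` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class FeatureParser
{
    // A known key, "=", then a lazily matched value that stops just before the
    // next "KEY =" (lookahead) or at the end of the input. The required "="
    // keeps "DIV" from matching the start of "DIVIDE".
    private static readonly Regex Pair = new Regex(
        @"\b(?<key>DIVIDE|DIV|CLACOS|INT)\s*=\s*(?<value>.*?)\s*(?=\b(?:DIVIDE|DIV|CLACOS|INT)\s*=|$)");

    public static Dictionary<string, string[]> Parse(string input)
    {
        var result = new Dictionary<string, string[]>();
        foreach (Match m in Pair.Matches(input))
        {
            // "KE48, KE49, KE50" or "4567 & 4568" become separate values.
            string[] values = m.Groups["value"].Value
                .Split(new[] { ',', '&' }, StringSplitOptions.RemoveEmptyEntries)
                .Select(v => v.Trim())
                .Where(v => v.Length > 0)
                .ToArray();
            result[m.Groups["key"].Value] = values;  // a repeated key overwrites the earlier entry
        }
        return result;
    }
}
```

Because the keys are anchored by name rather than by position, missing features and a different feature order do not affect the result.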

Answer 1 (score: 1)

This might be a simple alternative.

Try this code:

using System;
using System.Linq;

var input = "DIVIDE = KE48 CLACOS = 4556D DIV = 3466 INT = 4567";

var parts = input.Split(new [] { '=', ' ' }, StringSplitOptions.RemoveEmptyEntries);

var dictionary =
    parts.Select((x, n) => new { x, n })
         .GroupBy(xn => xn.n / 2, xn => xn.x)
         .Select(xs => xs.ToArray())
         .ToDictionary(xs => xs[0], xs => xs[1]);
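The `Select((x, n) => ...)` / `GroupBy(xn => xn.n / 2, ...)` chain pairs token 0 with token 1, token 2 with token 3, and so on. The sketch below writes the same pairing as a plain loop, which produces the same dictionary for well-formed input and makes the underlying assumption explicit: the tokens must strictly alternate key, value. A value that contains a space, such as "KE48, KE49" in the updated sample, shifts every later pair, which is why the updated version further down first removes the spaces around `,` and `&`. `TokenPairing` and `PairUp` are hypothetical names. This version also rejects an odd token count with a `FormatException`, whereas the original chain would fail with an `IndexOutOfRangeException` at `xs[1]`.

```csharp
using System;
using System.Collections.Generic;

public static class TokenPairing
{
    // Loop equivalent of Select((x, n) => ...).GroupBy(n / 2).ToDictionary(...):
    // tokens 0,1 form the first pair, tokens 2,3 the second, and so on.
    public static Dictionary<string, string> PairUp(string input)
    {
        string[] parts = input.Split(new[] { '=', ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length % 2 != 0)
            throw new FormatException("Expected an even number of tokens (key, value, key, value, ...).");

        var dictionary = new Dictionary<string, string>();
        for (int i = 0; i < parts.Length; i += 2)
            dictionary.Add(parts[i], parts[i + 1]);  // Add throws on a duplicate key, as ToDictionary does
        return dictionary;
    }
}
```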

Then I get the following dictionary:

[Screenshot: the resulting dictionary]

With your updated input things get more complicated, but this works:

var input = "DIVIDE = KE48, KE49, KE50 CLACOS = 4566D DIV = 3466 INT = 4567 & 4568";

Func<string, char, string> tighten =
    (i, c) => String.Join(c.ToString(), i.Split(c).Select(x => x.Trim()));

var parts =
    tighten(tighten(input, '&'), ',')
    .Split(new[] { '=', ' ' }, StringSplitOptions.RemoveEmptyEntries);

var dictionary =
    parts
        .Select((x, n) => new { x, n })
        .GroupBy(xn => xn.n / 2, xn => xn.x)
        .Select(xs => xs.ToArray())
        .ToDictionary(
            xs => xs[0],
            xs => xs
                .Skip(1)
                .SelectMany(x => x.Split(','))
                .SelectMany(x => x.Split('&'))
                .ToArray());
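`ToDictionary` throws an `ArgumentException` if the same key appears twice in one input string. The comment in the question's AddKeyValue says values for an existing key should be appended so that no data is lost. Assuming that behavior is wanted, here is a minimal sketch that reuses the tokenizing above and groups the pairs by key before building the dictionary. `MergingParser` and `Parse` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MergingParser
{
    // Same helper as above: removes the spaces around every occurrence of c.
    private static string Tighten(string input, char c) =>
        string.Join(c.ToString(), input.Split(c).Select(x => x.Trim()));

    public static Dictionary<string, string[]> Parse(string input)
    {
        string[] parts = Tighten(Tighten(input, '&'), ',')
            .Split(new[] { '=', ' ' }, StringSplitOptions.RemoveEmptyEntries);

        return parts
            .Select((x, n) => new { x, n })
            .GroupBy(xn => xn.n / 2, xn => xn.x)
            .Select(xs => xs.ToArray())
            // Group the (key, value) pairs by key first, so a key that appears twice
            // has its values appended instead of making ToDictionary throw.
            .GroupBy(pair => pair[0], pair => pair[1])
            .ToDictionary(
                g => g.Key,
                g => g.SelectMany(v => v.Split(',', '&')).ToArray());
    }
}
```

For example, "DIVIDE = KE48, KE49 DIV = 3466 DIVIDE = KE50" yields DIVIDE mapped to KE48, KE49 and KE50, in input order.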

I get this dictionary:

[Screenshot: the resulting dictionary]