How do I get the comma-separated values that appear after the term I searched for?

Asked: 2014-05-20 15:24:25

Tags: c# regex parsing csv

This is my code so far:

public void DeserialStream(string filePath)
{
    using (StreamReader sr = new StreamReader(filePath))
    {
        string currentline;
        while ((currentline = sr.ReadLine()) != null)
        {
            if (currentline.IndexOf("Count", StringComparison.CurrentCultureIgnoreCase) >= 0)
            {
                Console.WriteLine(currentline);
            }
        }
    }
}

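I would like to know how to get the comma-separated values that appear after the word I searched for.

For example, suppose I have a CSV file containing this information: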

"Date","dd/mm/yyyy"
"ExpirationDate","dd/mm/yyyy"

"DataType","Count"
"Location","Unknown","Variable1","Variable2","Variable3"
"A(Loc3, Loc4)","Unknown","5656","787","42"
"A(Loc5, Loc6)","Unknown","25","878","921"

"DataType","Net"
"Location","Unknown","Variable1","Variable2","Variable3"
"A(Loc3, Loc4)","Unknown","5656","787","42"
"A(Loc5, Loc6)","Unknown","25","878","921"

How do I get the table of values that comes after Count but before Net?

That is, only the data inside the square brackets is what I want to parse:

    "Date","dd/mm/yyyy"
    "ExpirationDate","dd/mm/yyyy"

    "DataType","Count"
   [ "Location","Unknown","Variable1","Variable2","Variable3"
    "A(Loc3, Loc4)","Unknown","5656","787","42"
    "A(Loc5, Loc6)","Unknown","25","878","921"]

    "DataType","Net"
    "Location","Unknown","Variable1","Variable2","Variable3"
    "A(Loc3, Loc4)","Unknown","5656","787","42"
    "A(Loc5, Loc6)","Unknown","25","878","921"

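I am wondering whether I should use a regular expression, or whether it is simpler to build on the approach above?

2 Answers:

Answer 0: (score: 2)

You can use a regular expression like this: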

\"DataType\"\,\"(?:Count|Net)\"((?!\"DataType\").)*

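This matches from a DataType line up to the next DataType line, provided the pattern is run over the whole file text with RegexOptions.Singleline (otherwise . stops at the end of each line).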
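A minimal sketch of applying that idea with .NET's Regex class, assuming the whole file fits in memory as one string. ExtractSection and RegexSectionDemo are hypothetical names, and the pattern is a variation of the one above: it adds a capturing group around the section body so the rows can be read back from the match. The match here is case-sensitive.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexSectionDemo
{
    // Returns the text between the "DataType","<name>" header and the next
    // "DataType" header (or the end of the text), or null if no such section exists.
    public static string ExtractSection(string csvText, string name)
    {
        string pattern =
            "\"DataType\",\"" + Regex.Escape(name) + "\"((?:(?!\"DataType\").)*)";

        // Singleline lets '.' match '\n', so one match can span several lines.
        Match m = Regex.Match(csvText, pattern, RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value.Trim() : null;
    }

    public static void Main()
    {
        // Shortened version of the sample data from the question.
        string csvText =
            "\"DataType\",\"Count\"\n" +
            "\"Location\",\"Unknown\",\"Variable1\"\n" +
            "\"A(Loc3, Loc4)\",\"Unknown\",\"5656\"\n" +
            "\n" +
            "\"DataType\",\"Net\"\n" +
            "\"Location\",\"Unknown\",\"Variable1\"\n";

        Console.WriteLine(ExtractSection(csvText, "Count"));
    }
}
```

The result is still raw text; each line in it has to be split into fields separately, and a plain split on commas would break "A(Loc3, Loc4)" in two.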

Answer 1: (score: 1)

You can use LINQ:

List<string> lines = File.ReadLines(path)
   .SkipWhile(l => l.IndexOf("\"Count\"", StringComparison.InvariantCultureIgnoreCase) == -1)
   .Skip(1) // skip the "Count"-line
   .TakeWhile(l => l.IndexOf("\"Net\"",   StringComparison.InvariantCultureIgnoreCase) == -1)
   .ToList();

Use String.Split on each line to get a string[]. In general, though, I would use an available CSV parser, which handles edge cases and bad data, rather than reinventing the wheel.
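As one possible illustration of the parser route, here is a sketch using TextFieldParser from the Microsoft.VisualBasic.FileIO namespace. This is my choice of parser, not necessarily the one the answer links to, and it assumes the project can reference the Microsoft.VisualBasic assembly. ReadSection and ParserSectionDemo are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class ParserSectionDemo
{
    // Collects the rows that follow the "DataType","<sectionName>" row,
    // stopping at the next "DataType" row.
    public static List<string[]> ReadSection(TextReader reader, string sectionName)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true; // keeps "A(Loc3, Loc4)" as one field

            bool inSection = false;
            while (!parser.EndOfData)
            {
                string[] fields = parser.ReadFields(); // blank lines are skipped
                if (fields == null) break;

                bool isHeader = fields.Length >= 2 && fields[0] == "DataType";
                if (isHeader)
                {
                    if (inSection) break; // reached the next section
                    inSection = string.Equals(
                        fields[1], sectionName, StringComparison.OrdinalIgnoreCase);
                }
                else if (inSection)
                {
                    rows.Add(fields);
                }
            }
        }
        return rows;
    }
}
```

It could be called as ParserSectionDemo.ReadSection(new StreamReader(path), "Count"); the parser disposes the reader when it is done.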

Edit: If you want to split the fields into a List<string>, you should use a CSV parser as mentioned above: your data already uses a quote character, so commas enclosed in " should not be split on.

However, here is another simple but efficient approach that uses a StringBuilder:

public static IEnumerable<string> SplitCSV(string csvString)
{
    var sb = new StringBuilder();
    bool quoted = false;

    foreach (char c in csvString)
    {
        if (quoted)
        {
            // Inside quotes: a quote closes the field, anything else is data.
            if (c == '"')
                quoted = false;
            else
                sb.Append(c);
        }
        else
        {
            if (c == '"')
            {
                quoted = true;
            }
            else if (c == ',')
            {
                // An unquoted comma ends the current token.
                yield return sb.ToString();
                sb.Length = 0;
            }
            else
            {
                sb.Append(c);
            }
        }
    }

    if (quoted)
        throw new ArgumentException("Unterminated quotation mark.", "csvString");

    yield return sb.ToString();
}

(thanks to https://stackoverflow.com/a/4150727/284240)

Now you can use SelectMany in the query above to flatten all tokens:

List<string> allTokens = File.ReadLines(path)
    .SkipWhile(l => l.IndexOf("\"Count\"", StringComparison.InvariantCultureIgnoreCase) == -1)
    .Skip(1) // skip the "Count"-line
    .TakeWhile(l => l.IndexOf("\"Net\"", StringComparison.InvariantCultureIgnoreCase) == -1)
    .SelectMany(l => SplitCSV(l.Trim()))
    .ToList();
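The flattened list loses the row boundaries, and with the sample data the blank line before the Net header contributes one empty token. If the goal is the table itself, a variation might keep one string[] per row and drop blank lines. The sketch below assumes that is what is wanted; ReadSectionRows and SectionRowsDemo are hypothetical names, and SplitCSV is repeated so the example is self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class SectionRowsDemo
{
    // Same quote-aware splitter as above.
    public static IEnumerable<string> SplitCSV(string csvString)
    {
        var sb = new StringBuilder();
        bool quoted = false;
        foreach (char c in csvString)
        {
            if (quoted)
            {
                if (c == '"') quoted = false; else sb.Append(c);
            }
            else if (c == '"')
            {
                quoted = true;
            }
            else if (c == ',')
            {
                yield return sb.ToString();
                sb.Length = 0;
            }
            else
            {
                sb.Append(c);
            }
        }
        if (quoted)
            throw new ArgumentException("Unterminated quotation mark.", "csvString");
        yield return sb.ToString();
    }

    // Returns one string[] per non-blank line between the start and end markers.
    public static List<string[]> ReadSectionRows(
        IEnumerable<string> lines, string start, string end)
    {
        return lines
            .SkipWhile(l => l.IndexOf("\"" + start + "\"", StringComparison.OrdinalIgnoreCase) == -1)
            .Skip(1) // skip the marker line itself
            .TakeWhile(l => l.IndexOf("\"" + end + "\"", StringComparison.OrdinalIgnoreCase) == -1)
            .Where(l => !string.IsNullOrWhiteSpace(l)) // drop the blank separator line
            .Select(l => SplitCSV(l.Trim()).ToArray())
            .ToList();
    }

    public static void Main()
    {
        var lines = new[]
        {
            "\"DataType\",\"Count\"",
            "\"Location\",\"Unknown\",\"Variable1\",\"Variable2\",\"Variable3\"",
            "\"A(Loc3, Loc4)\",\"Unknown\",\"5656\",\"787\",\"42\"",
            "\"A(Loc5, Loc6)\",\"Unknown\",\"25\",\"878\",\"921\"",
            "",
            "\"DataType\",\"Net\"",
            "\"Location\",\"Unknown\",\"Variable1\",\"Variable2\",\"Variable3\""
        };

        foreach (string[] row in ReadSectionRows(lines, "Count", "Net"))
            Console.WriteLine(string.Join(" | ", row));
    }
}
```

With a real file, File.ReadLines(path) can be passed as the lines argument. Like the original query, this matches the markers anywhere in a line, so a data value of "Net" or "Count" would also act as a boundary.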