正则表达式拆分文件.txt以填充数据表

时间:2018-04-20 11:37:00

标签: c# regex file datatable

我有一个关于拆分字符串并将其放入DataTable的问题。我有一个如下文本文件:

  

blabla | blabla2 | element1 | AB:CD | blabla3
  blabla | blabla2 | element2 | ABC | blabla3
  blabla | blabla2 | element3 | 123 | blabla3

所需的结果如下:

enter image description here

这是我的代码:

using (StreamReader file = new StreamReader(files))
{
    string line;
    while (((line = file.ReadLine()) != null))
    {
        char[] delimiter = {'|'};
        string pattern = @"\w*|'?[0-9a-zA-Z-:._]+'?";
        Regex r = new Regex(pattern, RegexOptions.Multiline);
        MatchCollection m = r.Matches(line);

        foreach (Match match in m)
        {
            DataRow row = dt.NewRow();
            string[] split = match.Groups[0].ToString().Split(delimiter);

            row[split[0]] = split[1]; // gives me an exception of index 
            limit array
            row[split[0]] = match.NextMatch().value;// gives me empty value
            dt.Rows.Add(row);
        }
    }
}

dgVResult.DataSource = dt;

但是我的代码的结果并不像预期的那样,结果如下:

enter image description here

2 个答案:

答案 0 :(得分:0)

string[] myenum = Enum.GetNames(typeof(allenum));
DataTable dt = new DataTable();
foreach (string enum in myenum)
{
    string[] lines = File.ReadAllLines(files);
    foreach (var line in lines)
    {
        string[] items = line.Split(new char[] {'|'}, StringSplitOptions.RemoveEmptyEntries);
        DataRow row = dt.NewRow();
        if (!line.Contains(enum)) continue; // I have an enum with "element1" "element2" and "element3"
            row[items[2]] = items[3];
        dt.Rows.Add(row);
    }
}

最后,

dgVResult.DataSource = dt;

我放弃了正则表达式方法!但现在的结果是:

result

它有效,但我只想要一行。

答案 1 :(得分:0)

由于您有3个元素,因此使用自定义扩展方法将行分组为三个:

public static IEnumerable<IEnumerable<T>> GroupBy<T>(this IEnumerable<T> src, int chunkSize) {
    IEnumerable<T> NextChunk(IEnumerator<T> e, int chunkLeft) {
        do {
            yield return e.Current;
            --chunkLeft;
        } while (chunkLeft > 0 && e.MoveNext());
    }

    using (var srce = src.GetEnumerator()) {
        while (srce.MoveNext())
            yield return NextChunk(srce, chunkSize);
    }
}

为了避免在使用几个值时出现Split的开销,我编写了一个从字符串中提取的扩展方法:

public static string[] SplitExtract(this string src, string delim, int pos, int count = 1) {
    var ans = new List<string>();

    var startCharPos = 0;
    for (; pos > 0; --pos) {
        var tmpPos = src.IndexOf(delim, startCharPos + 1);
        if (tmpPos >= 0)
            startCharPos = tmpPos + delim.Length;
        else {
            startCharPos = src.Length;
            break;
        }
    }

    for (; count > 0 && startCharPos < src.Length; --count) {
        var nextDelimPos = src.IndexOf(delim, startCharPos);
        if (nextDelimPos < 0)
            nextDelimPos = src.Length;

        ans.Add(src.Substring(startCharPos, nextDelimPos - startCharPos));
        startCharPos = nextDelimPos + delim.Length;
    }
    return ans.ToArray();
}

现在您可以将它与LINQ一起使用来处理您的文件:

var Pairs = File.ReadLines(files)
                .Select(l => l.SplitExtract("|", 2, 2))
                .GroupBy(3)
                .Select(g => g.ToList());
var ColumnNames = Pairs.First().Select(p => p[0].Trim());
var dt = new DataTable();
foreach (var cn in ColumnNames)
    dt.Columns.Add(cn, typeof(string));

foreach (var r in Pairs)
    dt.Rows.Add(r.Select(p => p[1].Trim()).ToArray());