Grouping multiple columns in a CSV file

Asked: 2016-11-29 21:28:40

Tags: c# .net linq csv parsing

I have a CSV file that looks something like this:

Header1a; Header1b; Header2a;  Header2b;  Header3a...
Value1a;  Value1b;  Value2a;   Value2b;   Value3a...
Value1a;  Value2b;  Value2a;   Value2b;   Value3a...
Value1a;  Value2b;  Value2a;   Value2b;   Value3a...
Value1a;  Value2b;  Value2a;   Value2b;   Value3a...

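The first line of the file contains the headers, where each pair of 2 columns belongs to one data set (Header1, Header2, Header3). The same goes for the actual values: Value1a and Value1b form the tuple of values belonging to Header1, and so on.

So: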

Set 1 (Header 1)  | Set 2 (Header 2)  | Set 3 (Header 3)  |
-----------------------------------------------------------
Value1a, Value1b  | Value2a, Value2b  | Value3a, Value3b  | <-- tuples
Value1a, Value1b  | Value2a, Value2b  | Value3a, Value3b  |
Value1a, Value1b  | Value2a, Value2b  | Value3a, Value3b  |
Value1a, Value1b  | Value2a, Value2b  | Value3a, Value3b  |

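What I want to achieve is to create one object per data set, of a type that holds the set's header as its name and a list of tuples representing the set's values.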

class DataSet {
   public string Name;
   public List<Tuple<string, string>> Values;
}

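My approach so far is to take the first line of the CSV file, split it on the separator (;), and take the text of every second item in the resulting array. That gives me the names of the data sets as well as the number of data sets in the file.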

var headers = firstLine.Split(new[] { separator })
              .Where((header, index) => index % 2 == 0)
              // -> cleanup (Header1a => Header1) etc.
              .ToList();
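
The clean-up step is only hinted at above. A minimal sketch of one way it could look, assuming every header ends in a single-character suffix (`a` or `b`) that can simply be dropped. `HeaderParser` and `GetSetNames` are hypothetical names used for illustration only:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class HeaderParser
{
    // Returns one name per data set, taken from every second header column.
    // Assumption: each header ends in a one-character suffix such as "a" or "b".
    public static List<string> GetSetNames(string firstLine, char separator)
    {
        return firstLine
            .Split(separator)
            .Select(header => header.Trim())
            // keep columns 0, 2, 4, ... (the "a" column of each pair)
            .Where((header, index) => index % 2 == 0)
            // drop the suffix; headers too short to have one are left unchanged
            .Select(header => header.Length > 1
                ? header.Substring(0, header.Length - 1)
                : header)
            .ToList();
    }
}
```

Calling `HeaderParser.GetSetNames("Header1a; Header1b; Header2a; Header2b", ';')` would yield the two names `Header1` and `Header2`. If the real headers follow a different naming scheme, the suffix handling needs to change accordingly.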

Then I process the remaining rows using grouping:

// total amount of columns per row
var columnCount = headers.Count * 2;
var values  = rows
  // split the rows using the separator (;)
  .Select(row => row.Split(new[] { separator }))
  // take only those rows which fit the column count (=> headers)
  .Where(columns => columns.Length == columnCount)
  // select the columns by index
  .Select((columns, index) => new { columns, index })

  // now here I want to group the columns of each row into groups of 2 columns
  // but that doesn't actually work, it groups the total amount of rows
  // by groups of 2 rows each
  .GroupBy(group => group.index / 2, group => group.columns)
  .Select(group => group.ToArray());

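How can I achieve this? I need some way to tell LINQ that it should group the columns within EACH row rather than across all rows, but I can't use SelectMany() because then I would lose the individual rows (I'll get a single enumeration of tuples instead of an enumeration of enumerations of tuples).

1 Answer:

Answer 0 (score: 1)

I put together a code sample that might help.

First, create some sample data that we can use as the source: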

List<String> data;
{
    var rows = Enumerable.Range(1, 10);
    var sets = Enumerable.Range(1, 6);
    var itemsPerSet = Enumerable.Range(1, 2);

    // Each row is one comma-separated line: 6 sets x 2 items = 12 values.
    data = rows.Select(rowIndex =>
        String.Join(",", sets.Select(setIndex =>
            String.Join(",", itemsPerSet.Select(itemIndex =>
                $"Value{rowIndex}-{setIndex}-{itemIndex}"))))).ToList();

    foreach (var row in data)
    {
        Console.WriteLine(row);
    }

    Console.WriteLine(new String('-', 20));
}

Then extract the desired data from it:

var selectedColumns = new[] { 0, 1, 4, 5 };

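// Split each row and keep only the selected columns.
var foo = data.Select(row => row.Split(new[] { "," }, StringSplitOptions.None)
                                .Where((value, columnIndex) => selectedColumns.Contains(columnIndex)))
              // Group the columns inside EACH row into pairs. ColumnIndex counts the
              // filtered columns (0..3), so the group keys are 0 and 1.
              .Select(row => row.Select((Value, ColumnIndex) => new { Value, ColumnIndex })
                                .GroupBy(pair => pair.ColumnIndex / 2)
                                .Select(group => $"Group{group.Key}({String.Join(";", group.Select(pair => pair.Value))})"));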

foreach (var row in foo)
{
    foreach (var item in row)
    {
        Console.WriteLine(item);
    }
}
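
To get from per-row pairs to the `DataSet` shape asked for in the question, one option is to pair the columns inside each row first and then pivot by set index. The following is a sketch under stated assumptions rather than a definitive implementation: `CsvDataSets`, `Parse`, `StripSuffix`, and the `Values` property are hypothetical names, and dropping the last character of each header is only a guess at the clean-up rule.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class DataSet
{
    public string Name { get; set; }
    public List<Tuple<string, string>> Values { get; set; }
}

static class CsvDataSets
{
    // Assumption: the set name is the header minus a one-character suffix.
    static string StripSuffix(string header)
    {
        return header.Length > 1 ? header.Substring(0, header.Length - 1) : header;
    }

    public static List<DataSet> Parse(IEnumerable<string> lines, char separator)
    {
        // Split every line into trimmed cells.
        var allRows = lines
            .Select(line => line.Split(separator).Select(cell => cell.Trim()).ToArray())
            .ToList();
        if (allRows.Count == 0) return new List<DataSet>();

        var headers = allRows[0];
        var setCount = headers.Length / 2;

        // Pair the columns of EACH row: one array of (a, b) tuples per row.
        // Rows whose column count does not match the header are skipped.
        var tupleRows = allRows
            .Skip(1)
            .Where(columns => columns.Length == headers.Length)
            .Select(columns => Enumerable.Range(0, setCount)
                .Select(set => Tuple.Create(columns[2 * set], columns[2 * set + 1]))
                .ToArray())
            .ToList();

        // Pivot: data set i collects the i-th tuple of every row.
        return Enumerable.Range(0, setCount)
            .Select(set => new DataSet
            {
                Name = StripSuffix(headers[2 * set]),
                Values = tupleRows.Select(row => row[set]).ToList()
            })
            .ToList();
    }
}
```

The inner `Select` runs once per row, so the rows stay separate and no `SelectMany()` is needed. The same pairing could also be written with the `GroupBy(pair => pair.ColumnIndex / 2)` shown above; plain index arithmetic is used here only because every pair has a fixed size of two.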