I have a CSV file that looks something like this:
Header1a; Header1b; Header2a; Header2b; Header3a...
Value1a; Value1b; Value2a; Value2b; Value3a...
Value1a; Value1b; Value2a; Value2b; Value3a...
Value1a; Value1b; Value2a; Value2b; Value3a...
Value1a; Value1b; Value2a; Value2b; Value3a...
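The first line of the file contains the headers, where each pair of 2 columns belongs to one data set (Header1, Header2, Header3). The same goes for the actual values: Value1a and Value1b form a tuple of values belonging to Header1, and so on...
So: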
Set 1 (Header 1) | Set 2 (Header 2) | Set 3 (Header 3) |
-----------------------------------------------------------
Value1a, Value1b | Value2a, Value2b | Value3a, Value3b | <-- tuples
Value1a, Value1b | Value2a, Value2b | Value3a, Value3b |
Value1a, Value1b | Value2a, Value2b | Value3a, Value3b |
Value1a, Value1b | Value2a, Value2b | Value3a, Value3b |
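What I want to achieve is to create, for each data set, an instance of a type that holds the header and a list of tuples representing the values of the set.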
class DataSet {
    public string Name;
    public List<Tuple<string, string>> Values;
}
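My approach so far is to take the first line of the CSV file, split it using the separator (;), and take the text of every second item in the array to get the names of the data sets as well as the number of data sets in the file.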
var headers = firstLine.Split(new[] { separator })
    .Where((header, index) => index % 2 == 0)
    // -> cleanup (Header1a => Header1) etc..
    .ToList();
Then I process the remaining rows using grouping:
// total amount of columns per row
var columnCount = headers.Count * 2;
var values = rows
// split the rows using the separator (;)
.Select(row => row.Split(new[] { separator }))
// take only those rows which fit the column count (=> headers)
.Where(columns => columns.Length == columnCount)
// select the columns by index
.Select((columns, index) => new { columns, index })
// now here I want to group the columns of each row into groups of 2 columns
// but that doesn't actually work, it groups the total amount of rows
// by groups of 2 rows each
.GroupBy(group => group.index / 2, group => group.columns)
.Select(group => group.ToArray());
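How can I achieve this? I need some way to tell LINQ that it should group the columns within EACH row rather than across all rows, but I can't use SelectMany() because I would then lose the individual rows (I'll get a single enumeration of tuples instead of an enumeration of enumerations of tuples).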
Answer 0 (score: 1)
I put together a code sample that may help.
First, create some sample data that we can use as the source:
List<String> data;
{
    // 10 rows, each with 6 sets of 2 items (12 columns per row)
    var rows = Enumerable.Range(1, 10);
    var sets = Enumerable.Range(1, 6);
    var itemsPerSet = Enumerable.Range(1, 2);
    // Each cell is named Value{row}-{set}-{item}; the cells of a row are joined with ","
    data = rows.Select(rowIndex =>
        String.Join(Environment.NewLine,
            String.Join(",", sets.Select(setIndex =>
                String.Join(",", itemsPerSet.Select(itemIndex =>
                    $"Value{rowIndex}-{setIndex}-{itemIndex}")))))).ToList();
    // Print the generated rows, followed by a divider line
    foreach (var row in data)
    {
        Console.WriteLine(row);
    }
    Console.WriteLine(new String('-', 20));
}
Then extract the desired data from it:
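// Zero-based indices of the columns to keep (here: the first and the third set)
var selectedColumns = new[] { 0, 1, 4, 5 };
var foo = data
    // Split each row and keep only the selected columns
    .Select(row => row.Split(new[] { "," }, StringSplitOptions.None)
        .Where((value, columnIndex) => selectedColumns.Contains(columnIndex)))
    // Group the columns WITHIN each row: the inner Select/GroupBy runs once per row,
    // so the rows stay separate and no SelectMany() is needed
    .Select(row => row.Select((Value, ColumnIndex) => new { Value, ColumnIndex })
        .GroupBy(pair => pair.ColumnIndex / 2)
        .Select(group => $"Group{group.Key}({String.Join(";", group.Select(pair => pair.Value))})"));
// Print every group of every row
foreach (var row in foo)
{
    foreach (var item in row)
    {
        Console.WriteLine(item);
    }
}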
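Building on the same per-row grouping, here is a minimal sketch of how the grouped pairs could be turned into the DataSet objects the question asks for. It is only one possible shape, not a definitive implementation: the CsvPairs class and its Parse method are hypothetical names introduced for illustration, the DataSet type is modeled on the class from the question, cells are trimmed because the sample file has a space after each separator, and the header cleanup (Header1a => Header1) is deliberately left out, so each set is simply named after the first header of its pair.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Modeled on the DataSet class from the question.
public class DataSet
{
    public string Name { get; set; }
    public List<Tuple<string, string>> Values { get; set; }
}

// Hypothetical helper, introduced only for this sketch.
public static class CsvPairs
{
    public static List<DataSet> Parse(IEnumerable<string> lines, char separator = ';')
    {
        // Split every line into trimmed cells.
        var rows = lines
            .Select(line => line.Split(separator).Select(cell => cell.Trim()).ToArray())
            .ToList();
        if (rows.Count == 0) return new List<DataSet>();

        var header = rows[0];
        var setCount = header.Length / 2;

        // Group the columns WITHIN each row into pairs. The inner
        // Select/GroupBy runs once per row, so the rows stay separate.
        var pairedRows = rows
            .Skip(1)
            // Keep only rows that fit the column count given by the headers.
            .Where(columns => columns.Length == setCount * 2)
            .Select(columns => columns
                .Select((value, index) => new { value, index })
                .GroupBy(cell => cell.index / 2, cell => cell.value)
                .Select(pair =>
                {
                    var items = pair.ToArray();
                    return Tuple.Create(items[0], items[1]);
                })
                .ToList())
            .ToList();

        // Pivot: data set i collects the i-th pair of every row.
        return Enumerable.Range(0, setCount)
            .Select(setIndex => new DataSet
            {
                // Assumption: no header cleanup; the first header of the pair is used as-is.
                Name = header[2 * setIndex],
                Values = pairedRows.Select(row => row[setIndex]).ToList()
            })
            .ToList();
    }
}
```

Calling CsvPairs.Parse(File.ReadLines(path)) would then yield one DataSet per pair of columns, with rows that do not match the header's column count skipped, as in the question's own filter.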