I have the following dataset:
Year Category Score
2011 A 83
2012 A 86
2013 A 62
2011 B 89
2012 B 86
2013 B 67
2011 C 85
2012 C 73
2013 C 79
2011 D 95
2012 D 78
2013 D 67
I would like to transform it into the following structure:
categories: [2011, 2012, 2013],
series: [
{ data: [83, 86, 62], name: 'A' },
{ data: [85, 73, 79], name: 'B' },
{ data: [83, 86, 62], name: 'C' },
{ data: [95, 78, 67], name: 'D' }]
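I want the code to tolerate 'missing' data in the source dataset. It is safe to assume that every year is represented by at least one category in the source data.
Example data with gaps: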
Year Category Score
2011 A 83
// 2012 A is missing
2013 A 62
// 2011 B is missing
2012 B 86
2013 B 67
2011 C 85
// 2012 C is missing
2013 C 79
2011 D 95
2012 D 78
2013 D 67
should produce this:
categories: [2011, 2012, 2013],
series: [
{ data: [83, 0, 62], name: 'A' },
{ data: [ 0, 73, 79], name: 'B' },
{ data: [83, 0, 62], name: 'C' },
{ data: [95, 78, 67], name: 'D' }]
Answer 0 (score: 0)
I created the following LINQPad code from the pastebin code; see the notes after the implementation:
void Main()
{
    var scores = new [] {
        new CScore { Year = 2011, Category = 'A', Score = 83 },
        // 2012 A is missing
        new CScore { Year = 2013, Category = 'A', Score = 62 },
        // 2011 B is missing
        new CScore { Year = 2012, Category = 'B', Score = 86 },
        new CScore { Year = 2013, Category = 'B', Score = 67 },
        new CScore { Year = 2011, Category = 'C', Score = 85 },
        // 2012 C is missing
        new CScore { Year = 2013, Category = 'C', Score = 79 },
        new CScore { Year = 2011, Category = 'D', Score = 95 },
        new CScore { Year = 2012, Category = 'D', Score = 78 },
        new CScore { Year = 2013, Category = 'D', Score = 67 },
    };
    int[] years = scores.Select(i => i.Year).Distinct()
        .OrderBy(i => i).ToArray();
    char[] categories = scores.Select(i => i.Category).Distinct()
        .OrderBy(i => i).ToArray();
    var series =
        from year in years
        from cat in categories
        join score in scores
            on new { Year = year, Category = cat }
            equals new { score.Year, score.Category } into scoreGroup
        select scoreGroup.SingleOrDefault() ??
            new CScore { Year = year, Category = cat } into scoreWithDefault
        group scoreWithDefault.Score by scoreWithDefault.Category into g
        select new Series { Name = g.Key.ToString(), Data = g.ToArray() };
    years.Dump(); // categories
    series.Dump(); // series
}
class CScore
{
    public char Category { get; set; }
    public int Year { get; set; }
    public int Score { get; set; }
}
class Series
{
    public string Name { get; set; }
    public int[] Data { get; set; }
}
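Note that Dump() is a LINQPad extension method, so the code above will not compile unchanged in a regular console project. Below is a minimal sketch of a replacement that renders the result in the layout the question asks for. The ChartFormatter class and its Format method are hypothetical names introduced here for illustration; they are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (an assumption, not part of the original answer):
// renders pivoted data in the "categories / series" layout from the question.
public static class ChartFormatter
{
    public static string Format(IEnumerable<int> years,
                                IEnumerable<(string Name, int[] Data)> series)
    {
        // One "{ data: [...], name: '...' }" entry per series.
        var entries = series.Select(s =>
            "{ data: [" + string.Join(", ", s.Data.Select(v => v.ToString()))
            + "], name: '" + s.Name + "' }");

        // Entries are separated by ",\n"; the closing bracket follows the last entry.
        return "categories: [" + string.Join(", ", years.Select(y => y.ToString())) + "],\n"
             + "series: [\n"
             + string.Join(",\n", entries) + "]";
    }
}
```

With the variables from the code above, this could be called as Console.WriteLine(ChartFormatter.Format(years, series.Select(s => (s.Name, s.Data))));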
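- CScore: renamed to avoid a naming error I ran into.
- join..into: allows a default CScore to be supplied for missing years.
- SingleOrDefault: used so that IF the input data has more than one matching CScore item on the join, the query throws an InvalidOperationException, indicating that more needs to be done to handle the redundancy. I find this preferable to FirstOrDefault, which would not fail in this bad/weird data case.
- Score = 0: omitted from the CScore initializer block because 0 is the default.
- select..into: the query continuation lets the query feed into the group..by that groups the scores by category/name. I really appreciate the null coalesce operator.
- group..by..into g: the Series type is analogous to the IGrouping<char,int> type I would have used if I had stopped at the grouping. Instead, the final select projects the IGrouping type into the desired Series type.

I verified the answer in the LINQPad output and found some flaws in the 'should produce' sample output data. Also, this code runs in about a millisecond on my machine, so unless we have much more data to process, I would not look to improve it.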
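The difference between SingleOrDefault and FirstOrDefault mentioned in the notes can be seen in isolation. The following is only a sketch: the DuplicateDemo class and the two made-up scores (standing in for duplicate rows that match one year/category pair) are assumptions for illustration. It also works on plain int values, whereas the query above works on CScore objects, where the "default" is null rather than 0.

```csharp
using System;
using System.Linq;

// Hypothetical demo (names and data are illustrative, not from the original answer).
public static class DuplicateDemo
{
    // Two scores standing in for duplicate rows that match one (year, category) pair.
    private static readonly int[] Matches = { 86, 90 };

    // FirstOrDefault silently picks the first match and hides the duplicate.
    public static int PickFirst() => Matches.FirstOrDefault();

    // SingleOrDefault throws InvalidOperationException when more than one element exists.
    public static bool SingleThrows()
    {
        try
        {
            Matches.SingleOrDefault();
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    // With no match at all, SingleOrDefault returns the default value (0 for int)
    // instead of throwing; this is the case the ?? operator covers in the query.
    public static int PickFromEmpty() => new int[0].SingleOrDefault();
}
```

In other words, a missing row falls through to the default, while a duplicated row fails loudly instead of being silently dropped.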
Although there is more we could talk about, I'll leave it there. Hope I haven't lost anyone.