I have a list of DataPoint, for example:
List<DataPoint> newpoints=new List<DataPoint>();
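where DataPoint is a class with nine double features, A through I, and newpoints.Count = 100000 (i.e. each of the 100,000 points consists of nine double features, A to I).
I need to normalize the list newpoints using min-max normalization with a scale range of 0 to 1.
So far I have implemented the following steps:
Each DataPoint feature is copied into a one-dimensional array. For example, the code for feature A: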
for (int i = 0; i < newpoints.Count; i++)
{ array_A[i] = newpoints[i].A; }
and so on for all nine double features.
I then applied min-max normalization. For example, the code for feature A:
normilized_featureA= (((array_A[i] - array_A.Min()) * (1 - 0)) /
(array_A.Max() - array_A.Min()))+0;
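This approach works, but it is slow (3 minutes 45 seconds).
How can I apply min-max normalization using LINQ in C# to cut the time down to a few seconds? I found the Stack Overflow question "How to normalize a list of int values", but my problem is: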
double valueMax = list.Max(); // I need Max point for feature A for all 100000
double valueMin = list.Min(); //I need Min point for feature A for all 100000
and so on for all nine features. Any help would be greatly appreciated.
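Answer 0: (score: 1)
As an alternative to modeling the nine features as double properties of a class "DataPoint", you could also model each data point as an array of nine doubles. The benefit is that all nine calculations can then be done in one go, again using LINQ: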
var newpoints = new List<double[]>
{
new []{1.23, 2.34, 3.45, 4.56, 5.67, 6.78, 7.89, 8.90, 9.12},
new []{2.34, 3.45, 4.56, 5.67, 6.78, 7.89, 8.90, 9.12, 12.23},
new []{3.45, 4.56, 5.67, 6.78, 7.89, 8.90, 9.12, 12.23, 13.34},
new []{4.56, 5.67, 6.78, 7.89, 8.90, 9.12, 12.23, 13.34, 15.32}
};
var featureStats = newpoints
// We make the assumption that all 9 data points are present on each row.
.First()
// 2 Anon Projections - first to determine min / max as a function of column
.Select((np, idx) => new
{
Idx = idx,
Max = newpoints.Max(x => x[idx]),
Min = newpoints.Min(x => x[idx])
})
// Second to add in the dynamic Range
.Select(x => new {
x.Idx,
x.Max,
x.Min,
Range = x.Max - x.Min
})
// Back to array for O(1) lookups.
.ToArray();
// Do the normalization for the columns, for each row.
var normalizedFeatures = newpoints
.Select(np => np.Select(
(i, idx) => (i - featureStats[idx].Min) / featureStats[idx].Range));
foreach(var datapoint in normalizedFeatures)
{
Console.WriteLine(string.Join(",", datapoint.Select(x => x.ToString("0.00"))));
}
Result:
0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
0.33,0.33,0.33,0.33,0.34,0.47,0.23,0.05,0.50
0.67,0.67,0.67,0.67,0.69,0.91,0.28,0.75,0.68
1.00,1.00,1.00,1.00,1.00,1.00,1.00,1.00,1.00
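If the data already lives in a DataPoint class like the one in the question, it can be projected to rows of doubles before applying the approach above. The following is only a minimal sketch: the question does not show the DataPoint definition, so the class below is a hypothetical stand-in, and DataPointConversion, ToRow and FromRow are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's DataPoint class (its definition is not shown).
public class DataPoint
{
    public double A, B, C, D, E, F, G, H, I;
}

// Illustrative helper; the names are assumptions, not part of the original answer.
public static class DataPointConversion
{
    // Flatten one point into a 9-element row, in feature order A..I.
    public static double[] ToRow(DataPoint p) =>
        new[] { p.A, p.B, p.C, p.D, p.E, p.F, p.G, p.H, p.I };

    // Rebuild a point from a 9-element row produced by ToRow.
    public static DataPoint FromRow(double[] r) =>
        new DataPoint
        {
            A = r[0], B = r[1], C = r[2],
            D = r[3], E = r[4], F = r[5],
            G = r[6], H = r[7], I = r[8]
        };

    // Convert a whole list of points to rows.
    public static List<double[]> ToRows(IEnumerable<DataPoint> points) =>
        points.Select(ToRow).ToList();
}
```

With this in place, `DataPointConversion.ToRows(points)` yields a `List<double[]>` of the same shape as `newpoints` in the answer above.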
Answer 1: (score: 0)
Don't recompute the max/min over and over; it doesn't change.
double maxInFeatureA = array_A.Max();
double minInFeatureA = array_A.Min();
// somewhere in the loop:
normilized_featureA= (((array_A[i] - minInFeatureA ) * (1 - 0)) /
(maxInFeatureA - minInFeatureA ))+0;
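Calling Max/Min on an array inside a foreach/for loop over many elements is very expensive, because each call scans the whole array.
I suggest using this code: Array data normalization
and using it like this: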
// Target range 0 to 1, as in the question (assumes NormalizeData takes the target min and max).
var normalizedPoints = newpoints.Select(x => x.A)
    .NormalizeData(0, 1)
    .ToList();
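The linked code is not reproduced in this thread. As a rough sketch of what such an extension method might look like, assuming it takes the target range as its two arguments: the name NormalizeData matches the call above, but the body and the class name NormalizationExtensions are assumptions, not the linked answer's code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative sketch only; not the code from the linked answer.
public static class NormalizationExtensions
{
    // Rescales the values linearly so the smallest maps to min and the largest to max.
    public static IEnumerable<double> NormalizeData(
        this IEnumerable<double> data, double min, double max)
    {
        double[] values = data.ToArray();
        if (values.Length == 0) return values;

        // Compute the data bounds once, outside the per-element projection.
        double dataMin = values.Min();
        double dataMax = values.Max();
        double range = dataMax - dataMin;

        // A constant column has no spread; map it to the lower bound (a design choice).
        if (range == 0) return values.Select(_ => min).ToArray();

        return values
            .Select(v => min + (v - dataMin) / range * (max - min))
            .ToArray();
    }
}
```

The bounds are computed once per call, so normalizing one feature costs a few linear passes over the data instead of one scan per element.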
Answer 2: (score: 0)
double min = newpoints.Min(p => p.A);
double max = newpoints.Max(p => p.A);
double normalizer = 1 / (max - min); // locals cannot be declared readonly in C#
var normalizedFeatureA = newpoints.Select(p => (p.A - min) * normalizer);
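The same idea can be extended to all nine features. This is a minimal sketch under some assumptions: the DataPoint class is a hypothetical stand-in (the question does not show it), MinMax and Normalize are illustrative names, and a feature whose values are all equal is mapped to 0 to avoid dividing by zero.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's DataPoint class (its definition is not shown).
public class DataPoint
{
    public double A, B, C, D, E, F, G, H, I;
}

// Illustrative helper; the names are assumptions.
public static class MinMax
{
    // One accessor per feature, in order A..I.
    static readonly Func<DataPoint, double>[] Features =
    {
        p => p.A, p => p.B, p => p.C,
        p => p.D, p => p.E, p => p.F,
        p => p.G, p => p.H, p => p.I
    };

    // Returns one 9-element row per point, each feature scaled to the range 0 to 1.
    // Throws InvalidOperationException if points is empty (Min/Max on an empty sequence).
    public static List<double[]> Normalize(IReadOnlyList<DataPoint> points)
    {
        // Min and scale are computed once per feature, not once per element.
        var stats = Features.Select(f =>
        {
            double min = points.Min(f);
            double max = points.Max(f);
            // Constant feature: use scale 0 so every value maps to 0.
            double scale = max > min ? 1 / (max - min) : 0;
            return (Min: min, Scale: scale);
        }).ToArray();

        return points
            .Select(p => Features
                .Select((f, i) => (f(p) - stats[i].Min) * stats[i].Scale)
                .ToArray())
            .ToList();
    }
}
```

This makes 18 passes over the list for the statistics plus one for the projection, which is linear in the number of points rather than quadratic as in the question's loop.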