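ML.NET prediction score always returns NaN
The idea is to teach a regression algorithm my family's daily routine. I have tried several versions of the ML.NET NuGet package and several variations of the sample code, all with the same result: Score == NaN. Below is some code and part of the dataset, which was recorded from my home automation system.
This is a variation of the movie recommendation regression example from MSDN: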
public class AutomationData
{
    // 0 - 6
    [LoadColumn(0)]
    public int Day;

    // example: 0947 == 9:47am
    [LoadColumn(1)]
    public int TimeOfDay;

    // Device Id
    [LoadColumn(2)]
    public int Device;

    // This is the State of the device (0 OFF - 1 ON)
    // Seems it has to be float? (Vector R4)
    [LoadColumn(3)]
    public float Label;
}
public class AutomationPrediction
{
    public float Label;
    public float Score;
}
public static void Regression()
{
    MLContext mlContext = new MLContext();
    // Load the data once and unpack both views from the returned tuple.
    var (trainingDataView, testDataView) = LoadData(mlContext);
    ITransformer model = BuildAndTrainModel(mlContext, trainingDataView);
    EvaluateModel(mlContext, testDataView, model);
    UseModelForSinglePrediction(mlContext, model);
}
public static (IDataView training, IDataView test) LoadData(MLContext mlContext)
{
    // Note: training and test both point at the same file.
    var trainingDataPath = Path.Combine(Environment.CurrentDirectory, "MachineLearning/Data", "data.csv");
    var testDataPath = Path.Combine(Environment.CurrentDirectory, "MachineLearning/Data", "data.csv");
    IDataView trainingDataView = mlContext.Data.LoadFromTextFile<AutomationData>(
        trainingDataPath, hasHeader: true, separatorChar: ',');
    IDataView testDataView = mlContext.Data.LoadFromTextFile<AutomationData>(
        testDataPath, hasHeader: true, separatorChar: ',');
    return (trainingDataView, testDataView);
}
public static ITransformer BuildAndTrainModel(MLContext mlContext, IDataView trainingDataView)
{
    // Map each raw column to a key type so it can be used as a matrix index.
    IEstimator<ITransformer> estimator = mlContext.Transforms.Conversion
        .MapValueToKey(outputColumnName: "deviceEncoded", inputColumnName: "Device")
        .Append(mlContext.Transforms.Conversion.MapValueToKey(outputColumnName: "timeOfDayEncoded", inputColumnName: "TimeOfDay"))
        .Append(mlContext.Transforms.Conversion.MapValueToKey(outputColumnName: "dayEncoded", inputColumnName: "Day"));

    // Only deviceEncoded and timeOfDayEncoded are used by the trainer.
    var options = new MatrixFactorizationTrainer.Options
    {
        MatrixColumnIndexColumnName = "deviceEncoded",
        MatrixRowIndexColumnName = "timeOfDayEncoded",
        LabelColumnName = "Label",
        NumberOfIterations = 20,
        ApproximationRank = 100
    };

    var trainerEstimator = estimator.Append(mlContext.Recommendation().Trainers.MatrixFactorization(options));
    ITransformer model = trainerEstimator.Fit(trainingDataView);
    return model;
}
public static void EvaluateModel(MLContext mlContext, IDataView testDataView, ITransformer model)
{
    var prediction = model.Transform(testDataView);
    var metrics = mlContext.Regression.Evaluate(prediction, label: DefaultColumnNames.Label, score: DefaultColumnNames.Score);
    Console.WriteLine("Rms: " + metrics.Rms.ToString());
    Console.WriteLine("RSquared: " + metrics.RSquared.ToString());
}
public static void UseModelForSinglePrediction(MLContext mlContext, ITransformer model)
{
    var predictionEngine = model.CreatePredictionEngine<AutomationData, AutomationPrediction>(mlContext);
    var testInput = new AutomationData { Device = 117, TimeOfDay = 0945 };
    var automationPrediction = predictionEngine.Predict(testInput);
    Console.WriteLine("Prediction Score: " + Math.Round(automationPrediction.Score, 1)); // Is always 'NaN' (null)
    if (Math.Round(automationPrediction.Score, 1) > 3.5)
    {
        Console.WriteLine("State: " + testInput.Label);
    }
    else
    {
        Console.WriteLine("State " + testInput.Label);
    }
}
}
Here is a snippet of the data.csv that the regression algorithm is trying to use.
Day,TimeOfDay,Device,State
6,0827,999,1
6,0827,117,1
6,0827,117,0
6,0838,18,1
6,0838,79,1
6,0838,6,1
6,0901,117,1
6,0908,999,0
6,0910,73,0
6,0913,72,1
6,0914,72,0
6,0915,79,0
6,0915,6,0
6,0915,5,0
6,0915,4,0
6,0915,18,0
6,1015,18,1
6,1015,79,1
6,1015,6,1
6,1015,5,1
6,1015,4,1
6,1726,18,1
6,1726,79,1
6,1726,51,0
6,1726,128,0
6,1726,69,0
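I expect the prediction to return a state of 0 or 1 (off or on), along with a score (a float) that shows how close to correct the regression thinks it is.
Answer 0 (score: 1)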
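It returns NaN because there is not enough data to make the prediction. Matrix factorization predicts a value by approximating it from similar values it has already seen.
In your example, only the TimeOfDay and Device columns are used in the matrix factorization. So for the single prediction you are trying (new AutomationData { Device = 117, TimeOfDay = 0945 }), the model returns NaN as the score because it cannot really predict a value from what it has learned. At least in the snippet you posted, 0945 never appears as a TimeOfDay.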
To test this, predict a known value like this one:
new AutomationData { Device = 73, TimeOfDay = 0910 };
and you will get an actual score.
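The difference between the two inputs can be sketched in plain C#, without ML.NET. This is only an illustration under an assumption: that the model can score a request only when both its TimeOfDay and its Device occurred in the training data. The `SeenValues` class and its method names are hypothetical, and the rows are copied from the data.csv snippet in the question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of ML.NET): remembers which TimeOfDay and
// Device values occurred in the training rows.
public static class SeenValues
{
    private static readonly HashSet<int> Times = new HashSet<int>();
    private static readonly HashSet<int> Devices = new HashSet<int>();

    public static void Add(int timeOfDay, int device)
    {
        Times.Add(timeOfDay);
        Devices.Add(device);
    }

    // True only when both values were seen during training (the assumption
    // stated above about what the model is able to score).
    public static bool CanScore(int timeOfDay, int device)
    {
        return Times.Contains(timeOfDay) && Devices.Contains(device);
    }

    public static void Main()
    {
        // A few (TimeOfDay, Device) rows from the data.csv snippet.
        // "0827" parsed as an int is 827.
        Add(827, 999);
        Add(827, 117);
        Add(910, 73);
        Add(1015, 18);

        Console.WriteLine(CanScore(910, 73));   // both values were seen
        Console.WriteLine(CanScore(945, 117));  // 945 was never seen as a TimeOfDay
    }
}
```

In the prediction code itself, `float.IsNaN(automationPrediction.Score)` can be checked before the score is compared against a threshold, so that an unscorable input is reported as such instead of falling into the else branch.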
Also, you should not use the same data for training and testing; doing so makes the model evaluation meaningless.
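A minimal sketch of loading the file once and holding out part of it for testing, assuming ML.NET 1.0 or later (where `mlContext.Data.TrainTestSplit` is available; the code in the question appears to target an earlier preview release, where the split method lived elsewhere). The `DataSplitSketch` class name, the `dataPath` parameter and the 20% test fraction are illustrative choices, not part of the original code.

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;

// Same shape as the AutomationData class in the question.
public class AutomationData
{
    [LoadColumn(0)] public int Day;
    [LoadColumn(1)] public int TimeOfDay;
    [LoadColumn(2)] public int Device;
    [LoadColumn(3)] public float Label;
}

public static class DataSplitSketch
{
    // Loads the CSV once and reserves a share of the rows for evaluation.
    public static (IDataView training, IDataView test) LoadData(MLContext mlContext, string dataPath)
    {
        IDataView allData = mlContext.Data.LoadFromTextFile<AutomationData>(
            dataPath, hasHeader: true, separatorChar: ',');

        // testFraction is the fraction of rows placed in the test set.
        var split = mlContext.Data.TrainTestSplit(allData, testFraction: 0.2);
        return (split.TrainSet, split.TestSet);
    }
}
```

With a random split and matrix factorization, test rows whose TimeOfDay or Device did not end up in the training set may still be scored as NaN, for the same reason as the single prediction above.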
Finally, matrix factorization may not be the ideal choice for your use case.