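I am trying to write a C# wrapper method that makes it easier to create, train, and use ML.NET classification models without hard-coding the class that holds my predictor and target variables. I have looked at all the samples and the ML.NET documentation, but I cannot find a complete example that goes from reading the data to using the model.
Below is what I have come up with. You will notice that the code for the variables "trainingDataView" and "dataProcessPipeline" is incomplete. I spent the whole day trying various approaches for this part, to no avail. At the cross-validation stage I keep getting an error telling me that the target column cannot be found.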
    public static ITransformer CreateClassificationModelExample(MLContext mlContext, DataTable data, List<string> featureColumns, String targetColumn)
    {
        //I am stuck here. Ideally I would like to see a code snippet to create a IDataView from the DataTable passed in as parameter
        //and then selecting only the columns in parameter 'featureColumns' and target = parameter 'targetColumn'
        var trainingDataView = ????;

        // Data process configuration with pipeline data transformations
        var dataProcessPipeline = mlContext.Transforms.Conversion.MapValueToKey(targetColumn, targetColumn)
            .Append(mlContext.Transforms.Categorical.OneHotEncoding(ValToKeys))
            .Append(mlContext.Transforms.Concatenate("Features", featureSet))
            .Append(mlContext.Transforms.NormalizeMinMax("Features", "Features"))
            .AppendCacheCheckpoint(mlContext);

        // Set the training algorithm
        var trainer = mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy(labelColumnName: targetColumn, featureColumnName: "Features")
            .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel", "PredictedLabel"));
        var trainingPipeline = dataProcessPipeline.Append(trainer);

        // Evaluate quality of Model
        var crossValidationResults = mlContext.MulticlassClassification.CrossValidate(trainingDataView, trainingPipeline, numberOfFolds: 5, labelColumnName: targetColumn);

        // Train Model
        ITransformer model = trainingPipeline.Fit(trainingDataView);
        return model;
    }
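I have explored the ML.NET documentation thoroughly, including the LoadFromEnumerable method example. I have also gone through the ML.NET blogs and the cookbook discussion on this topic.
Could someone please help with a code snippet that makes the method above work? I am sure it would help many others as well. Thanks!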
Answer 0 (score: 0)
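OK, after more than a day of effort I have gotten close, although I have not completely gotten rid of compile-time changes. The code below shows a wrapper that does more or less what I wanted. It still requires the NUMBER of model features to be known at compile time, which is better but far from ideal.
In the example below I create an IDataView from a DataTable, using only specific columns as predictors/features and one specific column as the target of the classification model. The code then builds a classification training pipeline (the example uses the "LbfgsMaximumEntropy" trainer), evaluates it with cross-validation, and then trains it. I also show some code for creating a prediction engine and making a prediction. Note that this code assumes you have 10 predictor/feature variables. That 10 is easy to change, though (it appears on 2 lines of the "Observation" class shown below), which is much easier than writing a new class every time you want to make predictions from a new DataTable.
Here is the code. It is a bit old-fashioned because I don't use lambda expressions: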
    // Needs: using System; using System.Collections.Generic; using System.Data;
    //        using Microsoft.ML; using Microsoft.ML.Data;
    public static ITransformer CreateClassificationModel(MLContext mlContext, DataTable data, List<string> predictorColumns, String TargetColumn, Dictionary<string, int> TargetMapper)
    {
        //Create instances of the GENERIC class Observation and set the values from the DataTable
        //using only the required predictor columns and the target column
        List<Observation> observations = new List<Observation>();
        foreach (DataRow row in data.Rows)
        {
            var obs = new Observation();
            int iFeature = 0;
            foreach (string predictorColumn in predictorColumns)
            {
                // Observation.Features has a fixed length, so predictorColumns
                // must not contain more entries than that length
                obs.Features[iFeature] = Convert.ToSingle(row[predictorColumn]);
                iFeature++;
            }
            // Translate the target value (as a string) into its integer code
            obs.Target = TargetMapper[row[TargetColumn].ToString()];
            observations.Add(obs);
        }

        var definedSchema = SchemaDefinition.Create(typeof(Observation));

        // Read the data into an IDataView with the schema supplied in
        IDataView trainingDataView = mlContext.Data.LoadFromEnumerable(observations, definedSchema);

        var featureSet = new String[1];
        featureSet[0] = "Features";

        // Data process configuration with pipeline data transformations
        var dataProcessPipeline = mlContext.Transforms.Conversion.MapValueToKey("Target", "Target")
            .Append(mlContext.Transforms.Concatenate("Features", featureSet))
            .AppendCacheCheckpoint(mlContext);

        // Set the training algorithm
        var trainer = mlContext.MulticlassClassification.Trainers.LbfgsMaximumEntropy(labelColumnName: "Target", featureColumnName: "Features")
            .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel", "PredictedLabel"));
        IEstimator<ITransformer> trainingPipeline = dataProcessPipeline.Append(trainer);

        // Evaluate quality of Model
        var crossValidationResults = mlContext.MulticlassClassification.CrossValidate(trainingDataView, trainingPipeline, numberOfFolds: 5, labelColumnName: "Target");

        // Train Model
        ITransformer model = trainingPipeline.Fit(trainingDataView);
        return model;
    }
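The method above expects a TargetMapper dictionary but does not show how to build one. A minimal sketch of one way to do it, assuming each distinct value in the target column should simply get the next integer code in order of first appearance. The class name TargetMapperSketch and the method Build are hypothetical helpers for illustration only and are not part of ML.NET:

```csharp
using System.Collections.Generic;
using System.Data;

public static class TargetMapperSketch
{
    // Hypothetical helper: walks the target column once and assigns
    // 0, 1, 2, ... to each distinct value in order of first appearance.
    public static Dictionary<string, int> Build(DataTable data, string targetColumn)
    {
        var mapper = new Dictionary<string, int>();
        foreach (DataRow row in data.Rows)
        {
            string label = row[targetColumn].ToString();
            if (!mapper.ContainsKey(label))
            {
                // mapper.Count is the number of labels seen so far,
                // so it is also the next unused code.
                mapper.Add(label, mapper.Count);
            }
        }
        return mapper;
    }
}
```

The result could then be passed as the TargetMapper argument of CreateClassificationModel. Keep the same dictionary around after training, since it is needed to interpret the predictions later.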
To test/use this model, you can use the following PredictionEngine (snippet):
    List<Observation> testData = GetTestDataList(); //Get some test data as Observations

    // Create a prediction engine from the model for feeding new data.
    var engine = mlContext.Model.CreatePredictionEngine<Observation, ModelOutput>(model);

    //Make a prediction. The result is of type ModelOutput, class shown below.
    var output = engine.Predict(testData[0]);
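Because the training code stores the target as an integer code taken from TargetMapper, output.Prediction comes back as that integer code rather than the original label text. A small sketch of turning it back into the label, assuming the same TargetMapper used for training is still available. LabelLookupSketch and Invert are hypothetical names introduced here for illustration:

```csharp
using System.Collections.Generic;

public static class LabelLookupSketch
{
    // Hypothetical helper: inverts the label -> code dictionary so that
    // a predicted code can be translated back to its label.
    public static Dictionary<int, string> Invert(Dictionary<string, int> targetMapper)
    {
        var lookup = new Dictionary<int, string>();
        foreach (KeyValuePair<string, int> pair in targetMapper)
        {
            lookup[pair.Value] = pair.Key;
        }
        return lookup;
    }
}
```

With that in place, something like `string label = LabelLookupSketch.Invert(TargetMapper)[output.Prediction];` should give the original target value, provided every code in the mapper is unique.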
Finally, here are the definitions of the two classes needed by the code above:
    public class Observation
    {
        private float[] m_Features = new Single[10];

        [VectorType(10)]
        public float[] Features
        {
            get
            {
                return m_Features;
            }
        }

        public int Target { get; set; }
    }

    public class ModelOutput
    {
        // ColumnName attribute is used to change the column name from
        // its default value, which is the name of the field.
        [ColumnName("PredictedLabel")]
        public Int32 Prediction { get; set; }

        public float[] Score { get; set; }
    }
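The Observation class above fixes the number of features at 10 in two places, which is the compile-time limitation mentioned earlier. One possible way to relax it, which I have not tested against the wrapper above, is to leave the [VectorType] attribute off and set the vector size on the SchemaDefinition at run time. The sketch below assumes that approach. DynamicObservation and DynamicSchemaSketch are hypothetical names, and every Features array must have exactly featureCount elements:

```csharp
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical variant of Observation without a fixed [VectorType(10)].
public class DynamicObservation
{
    public float[] Features { get; set; }
    public int Target { get; set; }
}

public static class DynamicSchemaSketch
{
    // Builds a schema whose "Features" column is declared as a float
    // vector of featureCount elements, with the size chosen at run time.
    public static SchemaDefinition CreateSchema(int featureCount)
    {
        SchemaDefinition schema = SchemaDefinition.Create(typeof(DynamicObservation));
        schema["Features"].ColumnType =
            new VectorDataViewType(NumberDataViewType.Single, featureCount);
        return schema;
    }

    // Loads the rows into an IDataView using the run-time schema.
    public static IDataView Load(MLContext mlContext,
        IEnumerable<DynamicObservation> rows, int featureCount)
    {
        return mlContext.Data.LoadFromEnumerable(rows, CreateSchema(featureCount));
    }
}
```

If this route is taken, the same schema would presumably also have to be handed to the prediction engine through the inputSchemaDefinition parameter of CreatePredictionEngine, so that the engine sees the same fixed vector size as the training data.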