Question

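All the ML.NET examples I have found use TextLoader to load data from a CSV file or similar.

How can I load data into the trainer without TextLoader?

I am streaming a large amount of data into a List.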

var pipeline = new LearningPipeline
{
    new Microsoft.ML.Data.TextLoader(_datapath).CreateFrom<Match>(useHeader: true, separator: ','),
    …

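Is there an implementation that accepts a T[]? On an ongoing basis, continuously writing CSV files to disk seems like a lot of unnecessary I/O, especially if the training function locks the file. That would mean multiple files, one for each active training instance.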

Answer 1

With the existing LearningPipeline API, CollectionDataSource can be used to train on data that is already in memory:

var pipeline = new LearningPipeline();
var data = new List<IrisData>() {
    new IrisData { SepalLength = 1f, SepalWidth = 1f, PetalLength=0.3f, PetalWidth=5.1f, Label=1},
    new IrisData { SepalLength = 1f, SepalWidth = 1f, PetalLength=0.3f, PetalWidth=5.1f, Label=1},
    new IrisData { SepalLength = 1.2f, SepalWidth = 0.5f, PetalLength=0.3f, PetalWidth=5.1f, Label=0}
};
var collection = CollectionDataSource.Create(data);

pipeline.Add(collection);
pipeline.Add(new ColumnConcatenator(outputColumn: "Features",
    "SepalLength", "SepalWidth", "PetalLength", "PetalWidth"));
pipeline.Add(new StochasticDualCoordinateAscentClassifier());
var model = pipeline.Train<IrisData, IrisPrediction>();

Sample taken from here.
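The snippet refers to IrisData and IrisPrediction without showing them. A minimal sketch of what those classes might look like follows. It assumes the legacy LearningPipeline API (ML.NET 0.2 to 0.5), where the ColumnName attribute lives in Microsoft.ML.Runtime.Api; the field layout and the PredictedLabels name are illustrative assumptions, not the definitions from the linked sample.

```csharp
using Microsoft.ML.Runtime.Api; // assumed namespace for ColumnName in the legacy API

// Input row. CollectionDataSource reads public fields, and each
// column is named after its field, so no attributes are needed here.
public class IrisData
{
    public float Label;
    public float SepalLength;
    public float SepalWidth;
    public float PetalLength;
    public float PetalWidth;
}

// Output row. The multiclass trainer writes one score per class
// into the "Score" column; this field receives that vector.
public class IrisPrediction
{
    [ColumnName("Score")]
    public float[] PredictedLabels;
}
```

With classes along these lines, the List<IrisData> built in the snippet can be handed to CollectionDataSource.Create as shown, with no file on disk involved.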

This will change with the upcoming new ML.NET API, and new samples will be provided to show how to do this.
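For readers on a later release, here is a rough sketch of the same idea with the MLContext API that eventually shipped in ML.NET 1.x. It was not available when this answer was written, so treat it as an assumption about your installed version. LoadFromEnumerable accepts any IEnumerable<T>, which also covers the streaming case from the question. StreamRows is a hypothetical stand-in for the real data source, and the trainer choice simply mirrors the SDCA classifier used above.

```csharp
using System.Collections.Generic;
using Microsoft.ML;

public class IrisData
{
    public float SepalLength { get; set; }
    public float SepalWidth { get; set; }
    public float PetalLength { get; set; }
    public float PetalWidth { get; set; }
    public float Label { get; set; }
}

public static class Program
{
    // Hypothetical stand-in for whatever produces the streamed records.
    // An iterator method can be enumerated more than once, which matters
    // because training may make several passes over the data.
    private static IEnumerable<IrisData> StreamRows()
    {
        yield return new IrisData { SepalLength = 1f, SepalWidth = 1f, PetalLength = 0.3f, PetalWidth = 5.1f, Label = 1 };
        yield return new IrisData { SepalLength = 1f, SepalWidth = 1f, PetalLength = 0.3f, PetalWidth = 5.1f, Label = 1 };
        yield return new IrisData { SepalLength = 1.2f, SepalWidth = 0.5f, PetalLength = 0.3f, PetalWidth = 5.1f, Label = 0 };
    }

    public static void Main()
    {
        var mlContext = new MLContext(seed: 0);

        // Wrap the in-memory sequence as an IDataView; no file is written.
        IDataView dataView = mlContext.Data.LoadFromEnumerable(StreamRows());

        // Multiclass trainers expect a key-typed label, so map the float label first.
        var pipeline = mlContext.Transforms.Conversion.MapValueToKey("Label")
            .Append(mlContext.Transforms.Concatenate("Features",
                nameof(IrisData.SepalLength), nameof(IrisData.SepalWidth),
                nameof(IrisData.PetalLength), nameof(IrisData.PetalWidth)))
            .Append(mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy());

        ITransformer model = pipeline.Fit(dataView);
    }
}
```

Because the data view is built over the enumerable rather than a copy of it, a source that can only be read once would probably need to be buffered into a List first.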

Note: I am a member of the ML.NET team.

Answer 2

Possible duplicate of "ml.net train model input from string instead of a file".

You can use CollectionDataSource, which was introduced in ML.NET 0.2.

ML.NET data loader (in memory)
