ML.NET: writing an IDataView back to CSV

Date: 2019-06-05 16:57:15

Tags: c# csv machine-learning ml.net

Suppose I have the following sample data:

Sample.csv:

Dog,25
Cat,23
Cat,20
Dog,0

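I want to load it into an IDataView, transform it into something usable for ML (no strings, etc.), and then save it as a .csv again, say for analysis with another tool or language.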

using System.IO;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Transforms;

// Load data:
var sampleCsv = Path.Combine("Data", "Sample.csv");
var columns = new[]
{
    new TextLoader.Column("type", DataKind.String, 0),
    new TextLoader.Column("age", DataKind.Int16, 1),
};
var mlContext = new MLContext(seed: 0);
var dataView = mlContext.Data.LoadFromTextFile(sampleCsv, columns, ',');

// Transform
var pipeline =
    mlContext.Transforms.Categorical.OneHotEncoding("type",
        // OutputKind.Key adds just one column, while the other output kinds add several:
        outputKind: OneHotEncodingEstimator.OutputKind.Key);
var transformedDataView = pipeline.Fit(dataView).Transform(dataView);
//  transformedDataView:
//  Dog,1,25
//  Cat,2,23
//  Cat,2,20
//  Dog,1,0

How can I get the two numeric columns and write them to a .csv file?

2 answers:

Answer 0: (score: 1)

I use the following code in my own project to create a .csv file. Hope this helps.

var predictions = mlContext.Data.CreateEnumerable<SpikePrediction>(transformedData, reuseRowObject: false);

SavePredictions(predictions.ToArray());

// Note: dict, _dataCol, _dataPath, _modelName and FileHandeling come from
// the rest of my project and are not shown here.
private void SavePredictions(SpikePrediction[] predictions)
{
    // Refuse to write if the predictions do not line up with the dataset rows.
    if (dict.Count() != predictions.Length)
    {
        Console.WriteLine("> Cannot save predictions because it does not correspond with the dataset length");
        return;
    }

    // Header: the original data columns plus a "Label" column.
    List<string> predictionsCol = _dataCol.ToList();
    predictionsCol.Add("Label");

    var fullResultFilePath = Path.Combine(_dataPath, FileHandeling.resultFolder, $"{_modelName}.csv");
    using (var stream = File.CreateText(fullResultFilePath))
    {
        stream.WriteLine(string.Join(",", predictionsCol));
        for (int i = 0; i < predictions.Length; i++)
        {
            var label = predictions[i];
            stream.WriteLine(string.Join(",", new string[] { dict[i].Item1.Split("T")[0].Substring(1), dict[i].Item2, label.Prediction[0].ToString() }));
        }
    }
}
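
Adapted to the question's data, the same pattern (a header line followed by one joined line per row) might look like the sketch below. The TransformedRow class, the WriteCsv helper and the output file name are hypothetical names introduced for illustration only, and the rows are hard-coded here; in practice they would come from CreateEnumerable.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.IO;

// Hypothetical row type; property names mirror the question's columns.
public class TransformedRow
{
    public uint type { get; set; }
    public short age { get; set; }
}

public static class CsvSketch
{
    // Writes a header line, then one comma-separated line per row.
    public static void WriteCsv(string path, IEnumerable<TransformedRow> rows)
    {
        using (var writer = File.CreateText(path))
        {
            writer.WriteLine("type,age");
            foreach (var row in rows)
            {
                // Invariant culture keeps number formatting stable across machines.
                writer.WriteLine(string.Join(",",
                    row.type.ToString(CultureInfo.InvariantCulture),
                    row.age.ToString(CultureInfo.InvariantCulture)));
            }
        }
    }

    public static void Main()
    {
        // Hard-coded stand-in for the transformed rows shown in the question.
        var rows = new List<TransformedRow>
        {
            new TransformedRow { type = 1, age = 25 },
            new TransformedRow { type = 2, age = 23 },
            new TransformedRow { type = 2, age = 20 },
            new TransformedRow { type = 1, age = 0 },
        };
        WriteCsv("Sample.output.csv", rows);
    }
}
```

Neither column here can contain a comma or a quote, so no escaping is done; string columns would need quoting before being joined this way.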

Answer 1: (score: 0)

You can create a class for the output data:

class TempOutput
{
    // Note: the property types must match the column types in the DataView
    public UInt32 type { get; set; }
    public Int16 age { get; set; }
}

Then use CreateEnumerable<> to read all the rows from the DataView and write them to a .csv file:

File.WriteAllLines(sampleCsv + ".output",
    mlContext.Data.CreateEnumerable<TempOutput>(transformedDataView, false)
    .Select(t => string.Join(',', t.type, t.age)));
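
As a possible alternative that avoids the intermediate class, ML.NET also provides mlContext.Data.SaveAsText, which writes an IDataView straight to a stream. The sketch below assumes the same Data/Sample.csv layout as the question; the SaveAsTextSketch class name is hypothetical. I have not verified exactly how key-typed values are rendered in the text output, so check the resulting file before relying on the numbering.

```csharp
using System.IO;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Transforms;

public static class SaveAsTextSketch
{
    public static void Main()
    {
        var mlContext = new MLContext(seed: 0);

        // Same loading step as in the question.
        var sampleCsv = Path.Combine("Data", "Sample.csv");
        var columns = new[]
        {
            new TextLoader.Column("type", DataKind.String, 0),
            new TextLoader.Column("age", DataKind.Int16, 1),
        };
        var dataView = mlContext.Data.LoadFromTextFile(sampleCsv, columns, ',');

        // Same transform as in the question: "type" becomes a key column.
        var pipeline = mlContext.Transforms.Categorical.OneHotEncoding("type",
            outputKind: OneHotEncodingEstimator.OutputKind.Key);
        var transformedDataView = pipeline.Fit(dataView).Transform(dataView);

        // keepHidden: false leaves out the original string "type" column,
        // which is hidden behind the new key column of the same name.
        // schema: false omits the schema comment line from the file.
        using (var stream = File.Create(sampleCsv + ".output"))
        {
            mlContext.Data.SaveAsText(transformedDataView, stream,
                separatorChar: ',', headerRow: true, schema: false,
                keepHidden: false);
        }
    }
}
```

Compared with CreateEnumerable<>, this keeps the column list in one place (the DataView schema), at the cost of less control over how each value is formatted.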