How to specify the number of features (vector type) for KMeans clustering at runtime with ML.Net

Posted: 2019-05-02 15:40:30

Tags: c# .net cluster-analysis ml.net

I want to use the ML.Net KMeans algorithm, but at compile time I don't know the size of the dataset, i.e. the number of features.

I see that the length given to the VectorType attribute has to be a constant, so passing it in as a parameter won't work.

class InputData
{
    public string AssetID { get; set; }

    [VectorType(5)] // I do not know whether the data will contain 5 or more features
    public float[] Features { get; set; }
}

Usage:

InputData row = new InputData { AssetID = Data[0, i + 1].ToString(), Features = features };

var context = new MLContext();
var DataView = context.Data.LoadFromEnumerable(dataArray);
string featuresColumnName = "Features";
var pipeline = context.Transforms.Concatenate(featuresColumnName, "Features")
    .Append(context.Clustering.Trainers.KMeans(featuresColumnName, clustersCount: NumberClusters));

var model = pipeline.Fit(DataView);

1 answer:

Answer (score: 0)

If the dimension of the vectors is fixed, you can set it at runtime:

private class SampleTemperatureDataVector
{
    public DateTime Date { get; set; }
    public float[] Temperature { get; set; }
}

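Note that this type has no [VectorType] annotation. Instead, create a SchemaDefinition from the type and modify that schema. In the initial SchemaDefinition, the vector column's type has IsKnownSize set to false. After the modification, its Size is the dimension you set, 3 in this case.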

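var mlContext = new MLContext();

var data2 = new SampleTemperatureDataVector[]
{
    new SampleTemperatureDataVector
    {
        Date = DateTime.UtcNow,
        Temperature = new float[] { 1.2f, 3.4f, 5.6f }
    },
    new SampleTemperatureDataVector
    {
        Date = DateTime.UtcNow,
        Temperature = new float[] { 1.2f, 3.4f, 5.6f }
    },
};

// Dimension known only at runtime (hard-coded to 3 for this example).
int featureDimension = 3;

// Build the schema from the unannotated type; Temperature starts out
// as a vector column of unknown size.
var autoSchema = SchemaDefinition.Create(typeof(SampleTemperatureDataVector));
var featureColumn = autoSchema[1]; // index 1 = Temperature (index 0 = Date)
var itemType = ((VectorDataViewType)featureColumn.ColumnType).ItemType;

// Replace it with a fixed-size vector type of the runtime dimension.
featureColumn.ColumnType = new VectorDataViewType(itemType, featureDimension);

IDataView data3 = mlContext.Data.LoadFromEnumerable(data2, autoSchema);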
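Combining this approach with the clustering step from the question, a minimal sketch might look like the following. It assumes ML.NET 1.x, where the KMeans parameter is named `numberOfClusters` (the question's `clustersCount` appears to be an older pre-1.0 name), and that every row has the same number of features. It reuses the question's `InputData`/`AssetID` names; `RuntimeKMeans`, `BuildSchema` and `Train` are hypothetical helper names. The feature column is looked up by name rather than by index so the code does not depend on property order.

```csharp
using System;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

// No [VectorType] attribute: the vector length is supplied at runtime.
public class InputData
{
    public string AssetID { get; set; }
    public float[] Features { get; set; }
}

// Hypothetical helper class, not part of ML.NET.
public static class RuntimeKMeans
{
    // Builds a schema whose Features column is a fixed-size vector
    // of the given (runtime) dimension.
    public static SchemaDefinition BuildSchema(int featureDimension)
    {
        var schema = SchemaDefinition.Create(typeof(InputData));
        var featureColumn = schema.First(c => c.ColumnName == nameof(InputData.Features));
        var itemType = ((VectorDataViewType)featureColumn.ColumnType).ItemType;
        featureColumn.ColumnType = new VectorDataViewType(itemType, featureDimension);
        return schema;
    }

    // Loads the rows with the runtime schema and trains KMeans on Features.
    public static ITransformer Train(MLContext mlContext, InputData[] rows, int clustersCount)
    {
        // Assumes all rows have the same length; the first row defines it.
        int featureDimension = rows[0].Features.Length;
        IDataView dataView = mlContext.Data.LoadFromEnumerable(rows, BuildSchema(featureDimension));

        var pipeline = mlContext.Clustering.Trainers.KMeans(
            featureColumnName: nameof(InputData.Features),
            numberOfClusters: clustersCount);

        return pipeline.Fit(dataView);
    }
}
```

Since the declared schema fixes the vector length, rows whose `Features` array has a different length would likely fail when the data is read, so validating the lengths before loading is advisable.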