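I want to use the ML.NET KMeans algorithm, but at compile time I don't know the size of the dataset, that is, the number of features.
As far as I can see, the length passed to VectorType has to be a constant, so passing it in as a parameter will not work.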
class Data
{
    public string ID { get; set; }
    [VectorType(5)] // I do not know whether the data will contain 5 or more features
    public float[] Features { get; set; }
}
Usage:
InputData row = new InputData { AssetID = Data[0, i + 1].ToString(), Features = features };
var context = new MLContext();
var DataView = context.Data.LoadFromEnumerable(dataArray);
string featuresColumnName = "Features";
var pipeline = context.Transforms.Concatenate(featuresColumnName, "Features")
    .Append(context.Clustering.Trainers.KMeans(featuresColumnName, clustersCount: NumberClusters));
var model = pipeline.Fit(DataView);
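Answer 0 (score: 0)
If the dimension of the vector is fixed, meaning every row has the same number of features, you can set it at runtime: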
private class SampleTemperatureDataVector
{
    public DateTime Date { get; set; }
    public float[] Temperature { get; set; }
}
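Note that this type has no annotations. You can create a SchemaDefinition from it and then modify that schema. In the initial SchemaDefinition, the vector column's IsKnownSize property is false. After the modification, Size is set to the dimension you specified, 3 in this case.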
var data2 = new SampleTemperatureDataVector[]
{
    new SampleTemperatureDataVector
    {
        Date = DateTime.UtcNow,
        Temperature = new float[] { 1.2f, 3.4f, 5.6f }
    },
    new SampleTemperatureDataVector
    {
        Date = DateTime.UtcNow,
        Temperature = new float[] { 1.2f, 3.4f, 5.6f }
    },
};

// The dimension can come from anywhere at runtime; it is hard-coded here for brevity.
int featureDimension = 3;

// Infer a schema from the class, then give the vector column a known size.
var autoSchema = SchemaDefinition.Create(typeof(SampleTemperatureDataVector));
var featureColumn = autoSchema[1]; // index 1 is the Temperature column
var itemType = ((VectorDataViewType)featureColumn.ColumnType).ItemType;
featureColumn.ColumnType = new VectorDataViewType(itemType, featureDimension);

var mlContext = new MLContext();
IDataView data3 = mlContext.Data.LoadFromEnumerable(data2, autoSchema);
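Applied to the class from the question, the same idea might look like the sketch below. It assumes ML.NET 1.x (where the KMeans parameter is named numberOfClusters) and that every row carries the same number of features. The Row class, the BuildSchema helper, and the sample values are made up for illustration; the feature count is read from the first row at runtime instead of being fixed by a [VectorType] attribute.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical row type mirroring the question's class, without [VectorType].
public class Row
{
    public string ID { get; set; }
    public float[] Features { get; set; }
}

public static class Program
{
    // Hypothetical helper: infers the schema from Row, then fixes the size
    // of the "Features" vector column to a value known only at runtime.
    public static SchemaDefinition BuildSchema(int featureCount)
    {
        var schema = SchemaDefinition.Create(typeof(Row));
        schema["Features"].ColumnType =
            new VectorDataViewType(NumberDataViewType.Single, featureCount);
        return schema;
    }

    public static void Main()
    {
        // Illustrative data only; in practice this comes from the real dataset.
        var rows = new List<Row>
        {
            new Row { ID = "a", Features = new float[] { 1.0f, 1.1f, 0.9f, 1.2f } },
            new Row { ID = "b", Features = new float[] { 1.1f, 0.9f, 1.0f, 1.1f } },
            new Row { ID = "c", Features = new float[] { 8.0f, 8.2f, 7.9f, 8.1f } },
            new Row { ID = "d", Features = new float[] { 8.1f, 7.8f, 8.0f, 8.3f } },
        };

        // Take the dimension from the data and check that it is consistent,
        // because a fixed-size vector column requires equal lengths.
        int featureCount = rows[0].Features.Length;
        if (rows.Any(r => r.Features.Length != featureCount))
            throw new InvalidOperationException("All rows must have the same number of features.");

        var mlContext = new MLContext(seed: 0);
        IDataView data = mlContext.Data.LoadFromEnumerable(rows, BuildSchema(featureCount));

        // KMeans needs a known-size vector of floats as its feature column.
        var trainer = mlContext.Clustering.Trainers.KMeans(
            featureColumnName: "Features", numberOfClusters: 2);
        var model = trainer.Fit(data);

        // Print the column type to confirm that the size is now known.
        Console.WriteLine(data.Schema["Features"].Type);
    }
}
```

Looking the column up by name rather than by position keeps the sketch working if properties are added to or reordered in the class.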