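I am learning the Microsoft ML framework (ML.NET) and am confused about why the features need to be concatenated. In Microsoft's iris example, here: https://docs.microsoft.com/en-us/dotnet/machine-learning/tutorials/iris-clustering
...the features are concatenated: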
string featuresColumnName = "Features";
var pipeline = mlContext.Transforms
    .Concatenate(featuresColumnName, "SepalLength", "SepalWidth", "PetalLength", "PetalWidth")
...
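Does this mean multiple features are treated as a single feature for computations such as linear regression? If so, how can that be accurate? What is happening behind the scenes?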
Answer 0 (score: 0)
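Concatenation is necessary because ML.NET trainers take a single feature-vector column as input (by default a column named "Features" that holds a vector of floats).
Essentially, Concatenate takes features stored as separate columns and packs them into one column containing a feature vector. The feature values themselves stay unchanged; only their format and type change. The features are not merged into one: each still occupies its own slot in the vector, so a trainer such as linear regression still learns a separate weight for each slot. This example makes it clearer: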
Before the transformation:
var samples = new List<InputData>()
{
    new InputData(){ Feature1 = 0.1f, Feature2 = new[]{ 1.1f, 2.1f, 3.1f }, Feature3 = 1 },
    new InputData(){ Feature1 = 0.2f, Feature2 = new[]{ 1.2f, 2.2f, 3.2f }, Feature3 = 2 },
    new InputData(){ Feature1 = 0.3f, Feature2 = new[]{ 1.3f, 2.3f, 3.3f }, Feature3 = 3 },
    new InputData(){ Feature1 = 0.4f, Feature2 = new[]{ 1.4f, 2.4f, 3.4f }, Feature3 = 4 },
    new InputData(){ Feature1 = 0.5f, Feature2 = new[]{ 1.5f, 2.5f, 3.5f }, Feature3 = 5 },
    new InputData(){ Feature1 = 0.6f, Feature2 = new[]{ 1.6f, 2.6f, 3.6f }, Feature3 = 6 },
};
After the transformation:
// "Features" column obtained post-transformation.
// 0.1 1.1 2.1 3.1 1
// 0.2 1.2 2.2 3.2 2
// 0.3 1.3 2.3 3.3 3
// 0.4 1.4 2.4 3.4 4
// 0.5 1.5 2.5 3.5 5
// 0.6 1.6 2.6 3.6 6
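To make "only the format changes" concrete, here is a minimal plain C# sketch (no ML.NET dependency) that mimics, row by row, what the concatenation does to the sample above: it copies the scalar and vector columns, in the order listed, into one float[]. The names Row, FeatureConcat, ConcatenateRow and LinearScore are hypothetical and used only for illustration; this is not ML.NET's implementation, which operates on IDataView columns. It also assumes Feature3 is a float, since a concatenated vector needs a single item type. LinearScore computes w·x + b to show that a linear model still assigns one weight per slot, so the features stay distinct after concatenation.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical row type mirroring the InputData sample
// (assumption: Feature3 is a float so all values share one type).
public class Row
{
    public float Feature1;
    public float[] Feature2 = Array.Empty<float>();
    public float Feature3;
}

public static class FeatureConcat
{
    // Copies the columns into one vector in the order they are listed.
    // The values themselves are not modified.
    public static float[] ConcatenateRow(Row r)
    {
        var features = new List<float> { r.Feature1 };
        features.AddRange(r.Feature2);
        features.Add(r.Feature3);
        return features.ToArray();
    }

    // A linear model scores the vector as w·x + b: every slot keeps
    // its own weight, so concatenation does not merge features.
    public static float LinearScore(float[] x, float[] w, float bias)
    {
        if (x.Length != w.Length)
            throw new ArgumentException("Weight count must match vector length.");
        float score = bias;
        for (int i = 0; i < x.Length; i++)
            score += w[i] * x[i];
        return score;
    }
}

public static class Program
{
    public static void Main()
    {
        var rows = new List<Row>
        {
            new Row { Feature1 = 0.1f, Feature2 = new[] { 1.1f, 2.1f, 3.1f }, Feature3 = 1 },
            new Row { Feature1 = 0.2f, Feature2 = new[] { 1.2f, 2.2f, 3.2f }, Feature3 = 2 },
        };

        foreach (var row in rows)
        {
            float[] features = FeatureConcat.ConcatenateRow(row);
            // One space-separated line per row, invariant culture for the decimal point.
            Console.WriteLine(string.Join(" ",
                features.Select(f => f.ToString(CultureInfo.InvariantCulture))));
        }
    }
}
```

Each printed line has the same layout as the "Features" column shown above. The linear score is included only to illustrate the design point: a trainer reading a vector column sees five separate inputs, one per slot, not one combined feature.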