Question

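I have a DecisionTree object that creates a machine learning model. DecisionTree has many fields that represent settings. Each field has a default value, and in most cases only one or two of them need to be changed.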

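The problem is that actually building the DecisionTree is computationally expensive. So instead of building the model when the object is created, the constructor only parses and stores the data. The model is not built until DecisionTree.build is called. This allows settings to be changed before building. However, it also means that calling DecisionTree.predict before building will fail.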

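I know it is good practice to keep an object in a valid state at all times. But that would mean building the tree in the constructor, which is expensive, and then building it again whenever a setting is changed.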

Example 1: The build call is separate

DecisionTree tree = new DecisionTree(data, classes, attributes);

tree.predict(item); //This would error

tree.maxDepth = 15;
tree.infoGain = 0.5;
tree.build();

tree.predict(item); // Now it would work

Example 2: The build call is in the constructor, the settings are not

DecisionTree tree = new DecisionTree(data, classes, attributes); // This would take a long time to complete

tree.predict(item); //This would now work

tree.maxDepth = 15;
tree.infoGain = 0.5;
tree.build(); // This would once again take a long time to complete

tree.predict(item); // Done, but takes twice as long as the previous example

Example 3

DecisionTree tree = new DecisionTree(data, classes, attributes, null, null, 15, null, null, 0.5, null, null, null); // Settings are all included in constructor

tree.predict(item); //This would immediately be callable

My question is: are these three options the only ways to handle a large number of settings? What is the standard/best practice here?

Answer 1

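I think it is bad practice to fit the algorithm in the constructor. Look at scikit-learn, for example: it provides a separate method to fit the object, the constructor itself only initializes internal variables, and if you call predict before fit it simply throws NotFittedError. Besides that, you may want to extend your algorithm in the future, for example with minibatches. In that case you cannot call the constructor several times, so you would need something like a partial_fit method to fit the classifier on additional chunks of data. So you cannot do everything in the constructor.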
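To make the separate-fit idea concrete, here is a minimal Java sketch of it, not the asker's actual class. Everything in it is an assumption for illustration: the name LazyDecisionTree, the setter, the default value, and the "training" step, which just picks the most frequent label as a stand-in for the expensive tree construction. IllegalStateException plays the role that NotFittedError plays in scikit-learn.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for the DecisionTree in the question.
class LazyDecisionTree {
    private final List<Integer> labels; // cheap to store in the constructor
    private int maxDepth = 10;          // illustrative default, not from the question
    private Integer model;              // null until build() runs; stands in for the trained tree

    LazyDecisionTree(List<Integer> labels) {
        if (labels.isEmpty()) {
            throw new IllegalArgumentException("labels must not be empty");
        }
        this.labels = new ArrayList<>(labels); // only stores data, no training here
    }

    int getMaxDepth() {
        return maxDepth;
    }

    void setMaxDepth(int maxDepth) {
        this.maxDepth = maxDepth;
        this.model = null; // a changed setting makes any earlier model stale
    }

    boolean isBuilt() {
        return model != null;
    }

    void build() {
        // Placeholder for the expensive step: choose the most frequent label.
        Map<Integer, Integer> counts = new HashMap<>();
        for (int label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        model = Collections.max(counts.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    int predict(double[] item) {
        if (model == null) {
            throw new IllegalStateException("build() must be called before predict()");
        }
        return model; // the placeholder model ignores the item
    }
}

class LazyBuildDemo {
    public static void main(String[] args) {
        LazyDecisionTree tree = new LazyDecisionTree(Arrays.asList(1, 1, 0));
        try {
            tree.predict(new double[] {0.5}); // fails fast with a clear message
        } catch (IllegalStateException e) {
            System.out.println("Not built yet: " + e.getMessage());
        }
        tree.setMaxDepth(15);
        tree.build();
        System.out.println(tree.predict(new double[] {0.5}));
    }
}
```

Resetting the model inside the setter is one possible way to avoid predicting with a tree that no longer matches its settings. Whether that is worth the extra state depends on how often settings change after a build.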

If initialization takes a large number of parameters, you may find the Builder pattern useful.
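As a rough sketch of how the Builder pattern could replace the long constructor from Example 3: the class name TreeSettings, the default values, and the validation rule below are all hypothetical. Only the setting names maxDepth and infoGain come from the question.

```java
// Hypothetical immutable settings object; defaults are invented for illustration.
final class TreeSettings {
    final int maxDepth;
    final double infoGain;

    private TreeSettings(Builder b) {
        this.maxDepth = b.maxDepth;
        this.infoGain = b.infoGain;
    }

    static Builder builder() {
        return new Builder();
    }

    static final class Builder {
        private int maxDepth = 10;     // assumed default
        private double infoGain = 0.1; // assumed default

        Builder maxDepth(int value) {
            if (value < 1) {
                throw new IllegalArgumentException("maxDepth must be >= 1");
            }
            this.maxDepth = value;
            return this; // returning this allows chained calls
        }

        Builder infoGain(double value) {
            this.infoGain = value;
            return this;
        }

        TreeSettings build() {
            return new TreeSettings(this);
        }
    }
}

class BuilderDemo {
    public static void main(String[] args) {
        // Only the settings that differ from the defaults are named.
        TreeSettings settings = TreeSettings.builder()
                .maxDepth(15)
                .infoGain(0.5)
                .build();
        System.out.println(settings.maxDepth + " " + settings.infoGain);
    }
}
```

A settings object like this could then be passed to the tree's constructor or to its build method, so the tree never needs a dozen positional null arguments and the settings cannot change after construction.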
