PySpark: random forest featureSubsetStrategy does not accept an int or float

Asked: 2018-09-11 14:19:24

Tags: python pyspark random-forest

I am building a random forest classifier with PySpark. I would like to set featureSubsetStrategy to a number rather than to auto, sqrt, etc. The documentation states:

featureSubsetStrategy = Param(parent='undefined', name='featureSubsetStrategy', doc='The number of features to consider for splits at each tree node. Supported options: auto, all, onethird, sqrt, log2, (0.0-1.0], [1-n].')

However, when I pass a number such as 0.2, I get the following error:

TypeError: Invalid param value given for param "featureSubsetStrategy". Could not convert <class 'float'> to string type

The same thing happens with featureSubsetStrategy=5. How do you set this param so that it accepts an int or float?

Example:

# import needed to run this snippet (not shown in the original post)
from pyspark.ml.classification import RandomForestClassifier

# setting target label
label_col = 'veh_pref_Economy'

# random forest parameters
max_depth = 2
subset_strategy = 0.2037
impurity = 'gini'
min_instances_per_node = 41
num_trees = 1
seed = 1246

rf_econ_gen = (RandomForestClassifier()
                 .setLabelCol(label_col)
                 .setFeaturesCol("features")
                 .setMaxDepth(max_depth)
                 .setFeatureSubsetStrategy(subset_strategy)
                 .setImpurity(impurity)
                 .setMinInstancesPerNode(min_instances_per_node)
                 .setNumTrees(num_trees)
                 .setSeed(seed))

This returns:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/spark-2.2.1-bin-hadoop2.7/python/pyspark/ml/param/__init__.py in _set(self, **kwargs)
    418                 try:
--> 419                     value = p.typeConverter(value)
    420                 except TypeError as e:

~/spark-2.2.1-bin-hadoop2.7/python/pyspark/ml/param/__init__.py in toString(value)
    203         else:
--> 204             raise TypeError("Could not convert %s to string type" % type(value))
    205 

TypeError: Could not convert <class 'float'> to string type

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-28-71b9c2a0f1a0> in <module>()
      3                  .setFeaturesCol("features")
      4                  .setMaxDepth(max_depth)
----> 5                  .setFeatureSubsetStrategy(subset_strategy)
      6                  .setImpurity(impurity)
      7                  .setMinInstancesPerNode(min_instances_per_node)

~/spark-2.2.1-bin-hadoop2.7/python/pyspark/ml/regression.py in setFeatureSubsetStrategy(self, value)
    632         Sets the value of :py:attr:`featureSubsetStrategy`.
    633         """
--> 634         return self._set(featureSubsetStrategy=value)
    635 
    636     @since("1.4.0")

~/spark-2.2.1-bin-hadoop2.7/python/pyspark/ml/param/__init__.py in _set(self, **kwargs)
    419                     value = p.typeConverter(value)
    420                 except TypeError as e:
--> 421                     raise TypeError('Invalid param value given for param "%s". %s' % (p.name, e))
    422             self._paramMap[p] = value
    423         return self

TypeError: Invalid param value given for param "featureSubsetStrategy". Could not convert <class 'float'> to string type
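The traceback itself suggests a workaround: the param's typeConverter only accepts strings, and the doc string above lists (0.0-1.0] and [1-n] among the supported options, so numeric values can be supplied as strings. A minimal sketch of that conversion (the commented Spark calls assume the same classifier setup as in the example above and are not runnable without a SparkSession):

```python
# featureSubsetStrategy is a string-typed param, so wrap numbers in str()
subset_strategy = str(0.2037)       # "0.2037" — fraction of features per split
num_features_strategy = str(5)      # "5" — absolute feature count

# Sketch of the original builder chain with the string-valued strategy:
# rf_econ_gen = (RandomForestClassifier()
#                  .setLabelCol(label_col)
#                  .setFeaturesCol("features")
#                  .setFeatureSubsetStrategy(subset_strategy))
```

This keeps the value within the documented (0.0-1.0] / [1-n] ranges while satisfying the string type check that raised the TypeError.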

0 Answers:

No answers yet