Random forest - How do I convert categories into values for random forest prediction?

Time: 2021-02-23 02:08:17

Tags: pandas dataframe scikit-learn random-forest

The goal is to predict the price in this dataset with a random forest.

+---------+--------+--------+-------+------------+
|         | weight | color  | price |            |
+---------+--------+--------+-------+------------+
| 1       | 2      | blue   | 20    |  = 2 x 10  |
+---------+--------+--------+-------+------------+
| 2       | 2      | red    | 60    |  = 2 x 30  |
+---------+--------+--------+-------+------------+
| 3       | 3      | blue   | 30    |  = 3 x 10  |
+---------+--------+--------+-------+------------+
| 4       | 1      | yellow | 5     |  = 1 x 5   |
+---------+--------+--------+-------+------------+
| ...     | ...    | ...    | ...   | ...        |
+---------+--------+--------+-------+------------+
| 1200000 | 4      | blue   | 40    |  = 4 x 10  |
+---------+--------+--------+-------+------------+

First, convert the strings in the color column to integer values.

+---+--------+-----+
|   | color  | int |
+---+--------+-----+
| 1 | yellow | 1   |
+---+--------+-----+
| 2 | blue   | 2   |
+---+--------+-----+
| 3 | red    | 3   |
+---+--------+-----+
| 4 | ...    | ... |
+---+--------+-----+

So the dataset should look like this:

+---------+--------+--------+-------+------------+
|         | weight | color  | price |            |
+---------+--------+--------+-------+------------+
| 1       | 2      |    2   | 20    |  = 2 x 10  |
+---------+--------+--------+-------+------------+
| 2       | 2      |    3   | 60    |  = 2 x 30  |
+---------+--------+--------+-------+------------+
| 3       | 3      |    2   | 30    |  = 3 x 10  |
+---------+--------+--------+-------+------------+
| 4       | 1      |    1   | 5     |  = 1 x 5   |
+---------+--------+--------+-------+------------+
| ...     | ...    | ...    | ...   | ...        |
+---------+--------+--------+-------+------------+
| 1200000 | 4      |    2   | 40    |  = 4 x 10  |
+---------+--------+--------+-------+------------+
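The encoding step shown in the two tables above could be sketched with pandas roughly as follows. The five-row DataFrame is a hypothetical miniature of the real dataset, and the name `color_to_int` is an assumption made for illustration; only the yellow/blue/red codes come from the mapping table.

```python
import pandas as pd

# Hypothetical miniature of the dataset; the rows are copied from the question's table.
df = pd.DataFrame({
    "weight": [2, 2, 3, 1, 4],
    "color": ["blue", "red", "blue", "yellow", "blue"],
    "price": [20, 60, 30, 5, 40],
})

# Explicit mapping taken from the color/int table (name chosen for illustration).
color_to_int = {"yellow": 1, "blue": 2, "red": 3}

# Work on a copy so the original string column stays available.
encoded = df.copy()
encoded["color"] = encoded["color"].map(color_to_int)

# A color missing from the mapping would become NaN, so check for that explicitly.
assert encoded["color"].notna().all()
print(encoded)
```

An explicit dictionary keeps the codes stable between the training and test data, which an automatically generated encoding does not guarantee unless the same fitted encoder is reused.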

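Is this the correct way to encode the categories? The random forest then has to predict the price column of the test set. How does the random forest algorithm understand that some of these values are greater than others?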

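import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Features are every column except the target; the target is the price column.
data = np.array(df.drop('price', axis=1))
labels = np.array(df['price'])

# Pass both features and target so that four arrays are returned.
train_X, test_X, train_y, test_y = train_test_split(data, labels, test_size=0.25, random_state=42)

rf = RandomForestRegressor(n_estimators=1000, random_state=42)

rf.fit(train_X, train_y)
predictions = rf.predict(test_X)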
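One way to probe the question empirically is to train on synthetic data that follows the same rule as the table (price = weight x a per-color unit price) and look at the error on held-out rows. The sketch below is only an experiment under stated assumptions: the unit prices are read off the rightmost column of the table, the data is randomly generated rather than taken from the real dataset, and `n_estimators` is reduced to 50 to keep it quick. Decision trees split on thresholds such as `color <= 1.5`, so the integer codes are treated as ordered numbers; with only a few distinct codes, the trees can still separate each one.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Assumed per-unit prices, read off the "= weight x unit price" column of the table.
unit_price = {"yellow": 5, "blue": 10, "red": 30}
color_to_int = {"yellow": 1, "blue": 2, "red": 3}

# Synthetic data (not the real dataset): random colors and weights from 1 to 4.
rng = np.random.default_rng(42)
n = 2000
df = pd.DataFrame({
    "weight": rng.integers(1, 5, size=n),
    "color": rng.choice(list(unit_price), size=n),
})
df["price"] = df["weight"] * df["color"].map(unit_price)
df["color"] = df["color"].map(color_to_int)

X = df[["weight", "color"]].to_numpy()
y = df["price"].to_numpy()
train_X, test_X, train_y, test_y = train_test_split(
    X, y, test_size=0.25, random_state=42
)

rf = RandomForestRegressor(n_estimators=50, random_state=42)
rf.fit(train_X, train_y)

# Largest absolute difference between predicted and true prices on the test rows.
max_abs_error = np.abs(rf.predict(test_X) - test_y).max()
print(max_abs_error)
```

If the printed error is close to zero, the forest has recovered the pricing rule from the integer codes in this toy setting. That says nothing about datasets with many colors or noisy prices, where an unordered encoding such as one-hot columns may be worth comparing.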

0 answers:

No answers