I am trying to measure the MAPE (mean absolute percentage error) in my random forest code. The MAE is 7.5. When I try to calculate the MAPE, it prints:
Accuracy: -inf %
This is the code I use to calculate the MAPE. How can I make it work, or why does it not compute a value?
mape = 100 * (errors / test_labels)
# Calculate and display accuracy
accuracy = 100 - np.mean(mape)
print('Accuracy:', round(accuracy, 2), '%.')
Here are the values:
errors: array([ 2.165, 6.398, 2.814, ..., 21.268, 8.746, 11.63 ])
test_labels: array([45, 47, 98, ..., 87, 47, 72])
These are the dtypes:
var1 int64
var2 int64
var3 float64
var4 int64
var6 float64
var7 int64
var1. float64
dtype: object
Sample values (there are more than 8,000 entries):
var1 var2 var3 var4 var5 var6 var7
"420823370" "183" "2019-09-07 22:13:04" "84" "2019-09-07 22:12:46" "72" "00:00:18"
"420521201" "183" "2019-09-07 17:43:03" "84" "2019-09-07 17:42:51" "46" "00:00:12"
"420219554" "183" "2019-09-07 12:43:02" "88" "2019-09-07 12:42:39" "72" "00:00:23"
"419618820" "183" "2019-09-07 02:43:01" "92" "2019-09-07 02:42:46" "80" "00:00:15"
"419618819" "183" "2019-09-07 02:43:01" "84" "2019-09-07 02:42:46" "80" "00:00:15"
"417193989" "183" "2019-09-05 10:42:52" "82" "2019-09-05 10:42:23" "0" "00:00:29"
"416891691" "183" "2019-09-05 05:42:51" "78" "2019-09-05 05:42:49" "72" "00:00:02"
"416587222" "183" "2019-09-05 00:42:51" "88" "2019-09-05 00:42:35" "99" "00:00:16"
"416587223" "183" "2019-09-05 00:42:51" "82" "2019-09-05 00:42:35" "99" "00:00:16"
"416587224" "183" "2019-09-05 00:42:51" "80" "2019-09-05 00:42:35" "99" "00:00:16"
id:Big Int. ts_tuid: Big Int. rssi: numeric. batl: real. ts_diff:interval
Here is the code sample:
import numpy as np
import pandas as pd

# Load the data from CSV
model = (
pd.read_csv("source.csv", parse_dates=['var3', 'var5'], date_parser=lambda x: pd.to_datetime(x))
.assign(
rssi_ts=lambda x: x.loc[:, 'var3'].astype(int) / 10 ** 9,
batl_ts=lambda x: x.loc[:, 'var5'].astype(int) / 10 ** 9,
ts_diff=lambda x: pd.to_timedelta(x.loc[:, 'ts_diff']).astype(int) / 10 ** 9
)
)
# Labels are the values we want to predict
labels_b = np.array(halti['var4'])
# Remove the labels from the features
# axis 1 refers to the columns
features_r = halti.drop('var4', axis = 1)
features_r2 = list(features_r.columns)
# Convert to numpy array
features_r = np.array(features_r)
# Using scikit-learn to split data into training and testing sets
from sklearn.model_selection import train_test_split
# Split the data into training and testing sets
train_features, test_features, train_labels, test_labels = train_test_split(features_r, labels_b, test_size = 0.25, random_state = 42)
# Import the model we are using
from sklearn.ensemble import RandomForestRegressor
# Instantiate model with 1000 decision trees
rf = RandomForestRegressor(n_estimators = 1000, random_state = 42)
# Train the model on training data
rf.fit(train_features, train_labels);
# Use the forest's predict method on the test data
predictions = rf.predict(test_features)
# Calculate the absolute errors
errors = abs(predictions - test_labels)
# Print out the mean absolute error (mae)
print('Mean Absolute Error:', round(np.mean(errors), 2), 'degrees.')
mape = 100 * (errors / test_labels)
# Calculate and display accuracy
accuracy = 100 - np.mean(mape)
print('Accuracy:', round(accuracy, 2), '%.')
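Answer 0 (score: 2)
You get this result because MAPE is undefined when a test label is 0 (one of several shortcomings of using MAPE): dividing the error by a zero label produces inf, which makes the mean infinite. If you replace accuracy = 100 - np.mean(mape)
with accuracy = 100 - np.mean(mape[np.isfinite(mape)])
you will get a more reasonable number. Note that this excludes the rows whose label is 0 from the average, so the figure describes only the remaining rows.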
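To make the mechanism concrete, here is a minimal sketch on made-up toy data (the arrays below are hypothetical, not the asker's dataset). It shows that a single zero label is enough to turn the mean into inf, and what the np.isfinite mask changes:

```python
import numpy as np

# Hypothetical toy data: two of the four labels are 0.
test_labels = np.array([45, 0, 98, 0])
predictions = np.array([47.0, 6.0, 95.0, 3.0])

errors = np.abs(predictions - test_labels)

# A nonzero error divided by a zero label gives inf (0 / 0 would give nan).
# errstate only silences NumPy's RuntimeWarning for this demonstration.
with np.errstate(divide="ignore", invalid="ignore"):
    mape = 100 * (errors / test_labels)

finite = np.isfinite(mape)  # False for inf and for nan

naive_accuracy = 100 - np.mean(mape)            # -inf, as in the question
masked_accuracy = 100 - np.mean(mape[finite])   # average over nonzero labels only
n_dropped = int((~finite).sum())                # rows excluded by the mask

print("Naive accuracy:", naive_accuracy)
print("Masked accuracy:", round(masked_accuracy, 2), "% (dropped", n_dropped, "rows)")
```

Reporting n_dropped alongside the masked figure is worth considering, since the mask silently ignores every row with a zero label.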
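Answer 1 (score: 0)
The output shows Inf in the MAPE error metric. The reason is that the observed values contain zeros. When the dependent variable can take zero as one of its values, MAPE cannot be used as the error metric. In that case, a different error measure should be used.
Reference: https://rstudio-pubs-static.s3.amazonaws.com/390751_f6b763e827b24c9cb4406cd43129c8a9.html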
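The answer does not name a specific replacement. As an illustration only, the sketch below shows two commonly used percentage-style measures that stay finite when some observed values are 0: WAPE (total absolute error divided by total absolute actuals) and sMAPE (symmetric MAPE). The function names wape and smape and the toy arrays are assumptions made for this example, not part of the original thread or of any library.

```python
import numpy as np

def wape(y_true, y_pred):
    """Sum of absolute errors over sum of absolute actuals, in percent.

    Still undefined if every actual value is 0.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100 * np.abs(y_pred - y_true).sum() / np.abs(y_true).sum()

def smape(y_true, y_pred):
    """Symmetric MAPE in percent, bounded between 0 and 200."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    diff = np.abs(y_pred - y_true)
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2
    # Where actual and prediction are both 0, count the term as 0.
    ratio = np.divide(diff, denom, out=np.zeros_like(diff), where=denom != 0)
    return 100 * ratio.mean()

# Hypothetical toy data containing zero labels.
y_true = np.array([45, 0, 98, 0])
y_pred = np.array([47.0, 6.0, 95.0, 3.0])

wape_value = wape(y_true, y_pred)
smape_value = smape(y_true, y_pred)
print("WAPE:", round(wape_value, 2), "%")
print("sMAPE:", round(smape_value, 2), "%")
```

Neither measure is a drop-in equivalent of MAPE: sMAPE assigns the maximum penalty of 200% to any nonzero prediction for a zero label, so the MAE already computed in the question may remain the simplest figure to report.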