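I am using multinom from the nnet package to fit a multinomial logistic regression model to data with 3 classes, but the class prevalences are imbalanced. I would like to specify weights/penalties that tell the model to avoid misclassifying a particular class. Here is my code and a snippet of my data: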
mnm <- multinom(formula = cut.rank ~ ., data = training.logist, trace = FALSE, maxit = 1000, weights=c(10,5,1))
> str(head(training.logist))
'data.frame': 6 obs. of 15 variables:
$ is_top_rated_listing : Factor w/ 2 levels "0","1": 1 1 1 2 2 2
$ seller_is_top_rated_seller : int 1 1 1 1 1 1
$ is_auto_pay : Factor w/ 2 levels "0","1": 2 2 2 2 2 2
$ is_returns_accepted : Factor w/ 2 levels "0","1": 2 2 2 2 2 2
$ seller_feedback_rating_star : Factor w/ 11 levels "Blue","Green",..: 7 7 7 9 9 9
$ keywords_title_assoc : num 1 1 1 1 1 1
$ normalized.price_shipping : num 0 0 0.00871 0.01853 0.01853 ...
$ normalized.seller_feedback_score : num 0.7117 0.8791 0.0966 0.095 0.095 ...
$ normalized.seller_positive_feedback_percent: num 0.7117 0.8791 0.0966 0.095 0.095 ...
$ item_condition : Factor w/ 2 levels "New","New other (see details)": 1 1 1 1 1 1
$ listing_type : Factor w/ 2 levels "FixedPrice","StoreInventory": 2 2 2 1 1 1
$ best_offer_enabled : Factor w/ 2 levels "0","1": 1 1 1 1 1 1
$ shipping_handling_time : int 10 10 10 1 1 1
$ shipping_locations : Factor w/ 7 levels "AU,Americas,Europe,Asia",..: 5 5 5 5 5 5
$ cut.rank : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1
>
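Does anyone know how to assign misclassification penalties? Specifically, I want to specify a penalty ratio of 10:5:1 (for classes 1, 2, 3 respectively), meaning that I care most about being accurate on class 1. The distribution of my target variable cut.rank is roughly 0.04, 0.08, 0.88. Because class 1 has such low prevalence, the model's sensitivity for that class is low.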
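
For reference, the weights argument of multinom takes case weights, i.e. one value per row of the data, so a length-3 vector such as c(10,5,1) is not interpreted as one weight per class. A minimal sketch of one way to approximate the 10:5:1 ratio is to expand the class weights into a per-observation vector. The data frame toy, its columns x1 and x2, and the names class.weights and case.weights are hypothetical stand-ins, not part of my real data, and case weights only rescale each row's contribution to the likelihood, which is not the same thing as a true misclassification cost matrix.

```r
library(nnet)  # provides multinom()

# Hypothetical toy data standing in for training.logist (not the real data set).
set.seed(1)
n <- 500
toy <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  cut.rank = factor(sample(c("1", "2", "3"), n, replace = TRUE,
                           prob = c(0.04, 0.08, 0.88)),
                    levels = c("1", "2", "3"))
)

# Assumed penalty ratio 10:5:1 for classes 1, 2, 3, keyed by factor level.
class.weights <- c("1" = 10, "2" = 5, "3" = 1)

# Expand to one weight per row: each observation gets the weight of its class.
toy$case.weights <- unname(class.weights[as.character(toy$cut.rank)])

# The predictors are listed explicitly so that the weight column is not
# picked up as a predictor, which `cut.rank ~ .` would do.
mnm <- multinom(cut.rank ~ x1 + x2, data = toy, weights = case.weights,
                trace = FALSE, maxit = 1000)
```

With real data the same lookup could be applied to training.logist$cut.rank. Whether this actually improves sensitivity for class 1 would still need to be checked on held-out data, for example with a confusion matrix.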