Question

I want to use the BoxCoxTrans function in R to deal with skewness.

However, I can't get the result as a data frame. Here is my R code.

df<-read.csv("dataSetNA1.csv",header=TRUE)

dd1<-apply(df[2:61],2,BoxCoxTrans) # The first column holds the independent variable and is skipped; all other columns are numeric.

dd1
$LT1Y_MXOD_AMT
Box-Cox Transformation

96249 data points used to estimate Lambda

Input data summary:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      0       0       0   19594       0 1600000 

Lambda could not be estimated; no transformation is applied


$MOBL_PRIN
Box-Cox Transformation

96249 data points used to estimate Lambda

Input data summary:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      0       0  100000  191229  320000 1100000 

Lambda could not be estimated; no transformation is applied

str(dd1)

I don't know how to turn the result into a data frame.

If I use the as.data.frame function, I get this error message.

dd2<-as.data.frame(dd1)
Error in as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors) : 
  cannot coerce class "BoxCoxTrans" to a data.frame

Please help me.

Answer 1

Here is one way to achieve this (I assume you are transforming the features):

library(caret)
data(cars)

#create a list with the BoxCox objects
g <- apply(cars, 2, BoxCoxTrans)

#use map2 from purrr to apply each fitted transformation to its column

z <- purrr::map2(g, cars, function(x, y) predict(x, y)) 

#here the transformation is applied to the same data 
#that was used to estimate the BoxCox lambda

B_trans = as.data.frame(do.call(cbind, z)) #to convert to data frame

head(data.frame(B_trans, cars), 20)
#output

   speed       dist speed.1 dist.1
1      4  0.8284271       4      2
2      4  4.3245553       4     10
3      7  2.0000000       7      4
4      7  7.3808315       7     22
5      8  6.0000000       8     16
6      9  4.3245553       9     10
7     10  6.4852814      10     18
8     10  8.1980390      10     26
9     10  9.6619038      10     34
10    11  6.2462113      11     17
11    11  8.5830052      11     28
12    12  5.4833148      12     14
13    12  6.9442719      12     20
14    12  7.7979590      12     24
15    12  8.5830052      12     28
16    13  8.1980390      13     26
17    13  9.6619038      13     34
18    13  9.6619038      13     34
19    13 11.5646600      13     46
20    14  8.1980390      14     26

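The first two columns are the transformed data; the last two are the original data.

Another approach is to include the feature transformation during training: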

train(....preProcess = "BoxCox"...)

More on this: https://www.rdocumentation.org/packages/caret/versions/6.0-77/topics/train
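
If what you want is a table of the estimated lambdas rather than the transformed data, the fitted objects can be summarised directly. This is a minimal sketch, assuming each BoxCoxTrans object exposes its estimate as the `lambda` element (NA when it could not be estimated); the names `dat`, `fits`, and `lambda_tbl` are only illustrative.

```r
library(caret)

# Built-in example data, referenced by namespace so the columns are speed and dist
dat <- datasets::cars

# One BoxCoxTrans object per column (lapply keeps each column as a plain vector)
fits <- lapply(dat, BoxCoxTrans)

# Collect the estimated lambda of each column into an ordinary data frame
lambda_tbl <- data.frame(
  variable  = names(fits),
  lambda    = vapply(fits, function(m) as.numeric(m$lambda), numeric(1)),
  row.names = NULL
)
lambda_tbl
```

Unlike `as.data.frame(dd1)`, this never asks R to coerce the BoxCoxTrans objects themselves; it only extracts one number from each.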

Answer 2

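The Box-Cox transformation is a transformation of your response variable. You can use the boxcox function from the MASS package to find out which transformation is needed. boxcox evaluates the log-likelihood over a grid of lambda values; take the lambda that maximizes it. You should then raise your response (e.g. y) to the power lambda, which gives a new response variable y*. Then replace the y column in the old data frame with y*.

Note that if the resulting lambda is 0, you should apply the log transformation ln(y) instead.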
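
The steps above can be sketched as follows. The model `dist ~ speed` on the built-in cars data is only a stand-in for your own model, and the names `fit`, `bc`, `y_star`, and `cars_bc` are illustrative. The sketch uses the usual Box-Cox form (y^lambda - 1) / lambda, which is a shifted and rescaled version of y^lambda.

```r
library(MASS)  # provides boxcox()

# Illustrative linear model on the built-in cars data
fit <- lm(dist ~ speed, data = datasets::cars)

# Profile log-likelihood over a grid of candidate lambdas, without plotting
bc <- boxcox(fit, lambda = seq(-2, 2, by = 0.1), plotit = FALSE)
lambda <- bc$x[which.max(bc$y)]

# Transform the response; fall back to log when lambda is numerically 0
y <- datasets::cars$dist
y_star <- if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda

# Replace the response column in a copy of the data frame
cars_bc <- transform(datasets::cars, dist = y_star)
head(cars_bc)
```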

Answer 3

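To perform a Box-Cox transformation, your data must be positive, i.e. every value has to be greater than 0. The reason is that log(0) is -Inf.

If your data contain 0 values, you can add 1 to every observation. This does not change the shape or skewness of your distribution.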
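
A small base-R check of both points. The vector `x` is made-up data that loosely mimics the zero-heavy columns in the question, and `skew` is a hand-written moment-based skewness helper, not a library function.

```r
# Hypothetical zero-inflated sample
x <- c(0, 0, 0, 100000, 320000, 1100000)

# Moment-based sample skewness, written out to avoid extra packages
skew <- function(v) {
  d <- v - mean(v)
  mean(d^3) / mean(d^2)^1.5
}

log(min(x))                        # -Inf: the log breaks at 0
x_shift <- x + 1                   # shift every observation by 1
all(x_shift > 0)                   # TRUE: now strictly positive
all.equal(skew(x), skew(x_shift))  # TRUE: the shift leaves skewness unchanged
```

The skewness of the raw data is unaffected by the shift, but the lambda estimated afterwards does depend on the constant you add, so it is worth stating which shift was used.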

How to use the BoxCoxTrans function in R?
