Using the cut function

Time: 2017-02-12 14:49:10

Tags: r dataframe cut

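I am trying to attach a decile value to each observation using the code below. However, the values do not seem to be correct. What could be the reason?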

    df <- read.table(text = "pregnant glucose blood skin INSULIN MASS  DIAB AGE CLASS predict_probability
            1     106    70   28     135 34.2 0.142  22     0       0.15316285
            1      91    54   25     100 25.2 0.234  23     0       0.05613959
            4     136    70    0       0 31.2 1.182  22     1       0.54034794
            9     164    78    0       0 32.8 0.148  45     1       0.64361578
            3     173    78   39     185 33.8 0.970  31     1       0.79185196
           11     136    84   35     130 28.3 0.260  42     1       0.31927737
            0     141    84   26       0 32.4 0.433  22     0       0.41609308
            3     106    72    0       0 25.8 0.207  27     0       0.10460090
            9     145    80   46     130 37.9 0.637  40     1       0.67061324
           10     111    70   27       0 27.5 0.141  40     1       0.16152296
    ", header = TRUE)

    deciles <- cut(df$predict_probability,
                   breaks = quantile(df$predict_probability, probs = seq(0, 1, by = 0.10)),
                   labels = 1:10, include.lowest = TRUE)
    df1 <- cbind(df, deciles)
    head(df1, 10)
       pregnant glucose blood skin INSULIN MASS  DIAB AGE CLASS predict_probability deciles
    1         1     106    70   28     135 34.2 0.142  22     0          0.15316285       3
    2         1      91    54   25     100 25.2 0.234  23     0          0.05613959       1
    3         4     136    70    0       0 31.2 1.182  22     1          0.54034794       7
    4         9     164    78    0       0 32.8 0.148  45     1          0.64361578       8
    5         3     173    78   39     185 33.8 0.970  31     1          0.79185196      10
    6        11     136    84   35     130 28.3 0.260  42     1          0.31927737       5
    7         0     141    84   26       0 32.4 0.433  22     0          0.41609308       6
    8         3     106    72    0       0 25.8 0.207  27     0          0.10460090       2
    9         9     145    80   46     130 37.9 0.637  40     1          0.67061324       9
    10       10     111    70   27       0 27.5 0.141  40     1          0.16152296       4

1 answer:

Answer 0 (score: 0)

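Following Dason's suggestion, here is a complete answer to the question. The quantile call should be removed from the code, so that seq(0,1,by=0.1) is passed directly to cut as the breaks. Each label then identifies a fixed-width probability interval of 0.1 (label 2 covers values above 0.1 and up to 0.2, for example) rather than a bin whose edges are computed from the sample.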

    deciles <- cut(df$predict_probability, breaks = seq(0, 1, by = 0.1),
                   labels = 1:10, include.lowest = TRUE)
    df1 <- cbind(df, deciles)
    head(df1, 10)
       pregnant glucose blood skin INSULIN MASS  DIAB AGE CLASS predict_probability deciles
    1         1     106    70   28     135 34.2 0.142  22     0          0.15316285       2
    2         1      91    54   25     100 25.2 0.234  23     0          0.05613959       1
    3         4     136    70    0       0 31.2 1.182  22     1          0.54034794       6
    4         9     164    78    0       0 32.8 0.148  45     1          0.64361578       7
    5         3     173    78   39     185 33.8 0.970  31     1          0.79185196       8
    6        11     136    84   35     130 28.3 0.260  42     1          0.31927737       4
    7         0     141    84   26       0 32.4 0.433  22     0          0.41609308       5
    8         3     106    72    0       0 25.8 0.207  27     0          0.10460090       2
    9         9     145    80   46     130 37.9 0.637  40     1          0.67061324       7
    10       10     111    70   27       0 27.5 0.141  40     1          0.16152296       2
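
To see how the two choices of breaks differ, here is a minimal sketch that applies both to the ten probabilities from the question. The names p, rank_bins and width_bins are made up for illustration and are not part of the original code.

```r
# Predicted probabilities copied from the question's data frame.
p <- c(0.15316285, 0.05613959, 0.54034794, 0.64361578, 0.79185196,
       0.31927737, 0.41609308, 0.10460090, 0.67061324, 0.16152296)

# Breaks taken from the sample quantiles: the bin edges depend on the data,
# so each bin holds roughly the same number of observations.
rank_bins <- cut(p, breaks = quantile(p, probs = seq(0, 1, by = 0.1)),
                 labels = 1:10, include.lowest = TRUE)

# Fixed breaks at 0, 0.1, ..., 1: the bin edges do not depend on the data,
# so a bin can hold several observations or none at all.
width_bins <- cut(p, breaks = seq(0, 1, by = 0.1),
                  labels = 1:10, include.lowest = TRUE)

# Show the two labelings side by side.
print(data.frame(p, rank_bins, width_bins))
```

With only ten rows, the quantile breaks put exactly one observation in each bin, which is why the question's output looks like a ranking. The fixed breaks instead report which tenth of the 0 to 1 range each probability falls in.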