Question

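I am currently binning variables in R: character variables by hand, and numeric (continuous) variables into groups that each hold an equal percentage of the population.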

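For equal population percentages I use cut2(var, g = number_of_bins) from the Hmisc package. I have continuous variables such as var = TotalPaid/TotalDue that carry special values, coded as follows: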

if TotalPaid AND TotalDue are 0 then var = 999 # Neither have paid nor have anything due
else if TotalPaid = 0 then var = 998 # Have Due but haven't paid anything
else if TotalDue = 0 then var = 997 # Have Paid but the due is 0
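For illustration, the coding rules above can be written as a nested ifelse() in R. TotalPaid and TotalDue below are made-up vectors, not data from the question.

```r
# Made-up example inputs (hypothetical, not from the question's data)
TotalPaid <- c(0, 0, 50, 120, 30)
TotalDue  <- c(0, 100, 0, 120, 60)

# Apply the special-value rules in the same order as above;
# the ratio is only kept where neither amount is zero
var <- ifelse(TotalPaid == 0 & TotalDue == 0, 999,
       ifelse(TotalPaid == 0, 998,
       ifelse(TotalDue == 0, 997, TotalPaid / TotalDue)))
print(var)
```

Because the conditions are checked in order, the both-zero case has to come first; otherwise a row with TotalPaid = TotalDue = 0 would be coded 998.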

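My goal is to use cut2 to split the variable into equal groups while leaving the special values out of the binning (i.e. each special value forms its own group, and only the rest of the variable is split into groups). Example of the resulting grouped values of var, if I decide to split the variable into bins of 5% of the population: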

| **Value** | **%pop** |
|---|---|
| 0 | x% of population |
| Range1 | 5% of population |
| Range2 | 5% of population |
| ... | 5% of population |
| 999 | y% of population |
| 998 | z% of population |
| 997 | p% of population |

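(Note: 0 is not actually a valid value here, because of the way the special values are coded in the example above; I included it only for the sake of this example.)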
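A minimal sketch of that goal, assuming the special codes are exactly 997, 998 and 999: every special value keeps its own group and only the remaining values are cut into equal-frequency bins. It uses base R quantile() and cut() as a stand-in for Hmisc's cut2(..., g = n_bins), and the helper name bin_with_specials is hypothetical.

```r
# Hypothetical helper: special codes keep their own group, all other values are
# cut into (roughly) equal-frequency bins. Base R stand-in for Hmisc::cut2(g = n_bins).
bin_with_specials <- function(v, specials = c(997, 998, 999), n_bins = 20) {
  is_special <- v %in% specials
  out <- character(length(v))             # result keeps the original order of v
  out[is_special] <- as.character(v[is_special])

  rest <- v[!is_special]
  if (length(rest) > 0) {
    # Quantile break points; unique() drops duplicates caused by tied values
    brks <- unique(quantile(rest, probs = seq(0, 1, length.out = n_bins + 1),
                            names = FALSE))
    if (length(brks) < 2) {
      # All non-special values are identical, so they form a single group
      out[!is_special] <- as.character(rest)
    } else {
      out[!is_special] <- as.character(cut(rest, breaks = brks,
                                           include.lowest = TRUE))
    }
  }
  out
}

v <- c(999, 1, 2, 3, 4, 998, 997, 5, 6, 7, 8)
print(bin_with_specials(v, n_bins = 2))
```

With the question's data, something like bin_with_specials(x$PayCurrMonth_CurrMPV, specials = c(997, 998, 999, 1), n_bins = 20) would give bins of about 5% of the non-special rows (not of the whole population). When many values are tied, quantile break points merge and the bins are only approximately equal.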

Reproducible example:

library(Hmisc)  # provides cut2()

### Data
x<-structure(list(PayCurrMonth_CurrMPV = c(1, 1, 1, 1.1111111111, 
999, 4.7619047619, 6.1407407407, 1, 1, 1, 1, 997, 1, 2.9666666667, 
1, 1.1666666667, 1, 998, 998, 1, 1, 1, 1, 1, 1.0256410256, 998, 
3.3333333333, 6.5, 5, 1, 1, 5363.6363636, 998, 1.0416666667, 
1, 1, 998, 999, 329.34508816, 1, 4, 998, 1, 1, 1, 998, 999, 2.5, 
999, 1, 998, 1, 1, 1, 1, 1.1111111111, 1, 997, 997, 2, 1, 1, 
1, 6, 999, 1, 1.037037037, 3.962962963, 1, 1, 1, 999, 7.9333333333, 
1.2820512821, 1, 1.3333333333, 1, 7.3620273532, 1, 1, 1, 1.5833333333, 
998, 2.8333333333, 1.1111111111, 10.21751051, 998, 2, 1, 997, 
1, 1, 1, 1, 5.3333333333, 2.5166666667, 1, 1, 1.0833333333, 1, 
1, 7.0024444444, 1, 0.8333333333, 999, 1.3333333333, 1, 1, 1, 
629.7, 0.4, 1, 1, 1, 998, 1, 998, 1, 3.001322314, 1, 1, 1, 1, 
1, 997, 0.825, 1, 1, 999, 1, 1, 338.15789474, 998, 1, 1, 1, 1, 
1.0833333333, 1, 1.1111111111, 1, 1.7047619048, 0.8333333333, 
998, 1, 1, 1, 999, 1, 4.5071666667, 1.1111111111, 1, 998, 1, 
1, 1, 1, 0.2941666667, 3, 2.6666666667, 3.5816618911, 1, 998, 
1, 1, 1, 1, 997, 1, 1, 1, 1, 1.06, 997, 1, 2, 1.3333333333, 3.2222222222, 
4.7555555556, 999, 1, 1, 1, 1, 1, 1, 1, 1, 999, 1, 3.3333333333, 
1, 1.6666666667, 1, 1, 1, 1, 1, 1.3888888889, 1, 4.5714285714, 
2.0952380952, 1, 1, 999, 1, 998, 1.1111111111, 1, 1, 1, 999, 
1, 8.8933333333, 1.0666666667, 1, 1.0666666667, 998, 1, 1, 2.5, 
1, 115.77998197, 997, 1, 997, 1, 2, 7.5555555556, 2.6666666667, 
1.1666666667, 1, 999, 2.4, 1.6666666667, 2.1111111111, 2.1111111111, 
998, 2, 998, 1.0833333333, 1, 1, 1, 50, 1.0533333333, 1, 2, 1, 
0.303030303, 1, 1.1111111111, 6.7066666667, 998, 1, 6.6666666667, 
2, 1)), .Names = "PayCurrMonth_CurrMPV", row.names = c(NA, -258L
), class = "data.frame")

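### Split data into special and non-special values (1 is treated as special here too)
x1 <- subset(x, PayCurrMonth_CurrMPV %in% c(997, 998, 999, 1))
x2 <- subset(x, !PayCurrMonth_CurrMPV %in% c(997, 998, 999, 1))

### Apply equal % of population binning only to the non-special values
### (m = desired minimum number of observations per group, here 5% of x2)
x2$PayCurrMonth_CurrMPV <- as.character(cut2(x2$PayCurrMonth_CurrMPV, m = floor((5 / 100) * nrow(x2))))

### Combine special and non-special values back into one (now grouped) variable;
### rbind() stacks all special rows first, so the original row order is lost
x_all <- rbind(x1, x2)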
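Because rbind(x1, x2) places every special row before the binned rows, the grouped variable no longer follows the original row order. A minimal sketch of one way to restore it, using small made-up data and a hypothetical row_id column; base R cut() on quantile breaks again stands in for cut2().

```r
# Small made-up data frame (hypothetical)
df <- data.frame(v = c(5, 999, 2, 998, 7))
df$row_id <- seq_len(nrow(df))           # remember the original order

special <- df$v %in% c(997, 998, 999)
s  <- df[special, ]
ns <- df[!special, ]

# Special values keep their own label; the rest get quantile-based bins
s$grp  <- as.character(s$v)
brks   <- quantile(ns$v, probs = c(0, 0.5, 1), names = FALSE)
ns$grp <- as.character(cut(ns$v, breaks = brks, include.lowest = TRUE))

# rbind() stacks the special rows first; sorting by row_id undoes that
out <- rbind(s, ns)
out <- out[order(out$row_id), ]
print(out)
```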

This is what I have got so far:

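z <- x[, 1] %in% c(997, 998, 999, 1)   # TRUE for the special values
f <- cut2(x$PayCurrMonth_CurrMPV[!z], m = floor((5 / 100) * nrow(x)))   # bin only the non-special values; m is 5% of all rows
x$PayCurrMonth_CurrMPV[!z] <- as.character(f)   # the whole column becomes character; special values stay as "997", "998", ...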
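Once the in-place assignment has produced character labels, the %pop column of the desired output can be computed with table() and prop.table(). A small sketch on made-up data that follows the same in-place pattern, with base R cut() on quantile breaks standing in for cut2(); putting the ranges first and then 999, 998, 997 is a choice made here to mirror the example table.

```r
# Made-up values (hypothetical), using the same in-place pattern as above
v <- c(1.5, 999, 2, 3.5, 998, 4, 997, 6, 999, 8)
special_codes <- c(999, 998, 997)
is_special <- v %in% special_codes

# Equal-frequency bins on the non-special values only (2 bins here)
rest <- v[!is_special]
brks <- unique(quantile(rest, probs = seq(0, 1, length.out = 3), names = FALSE))
f <- cut(rest, breaks = brks, include.lowest = TRUE)

grp <- as.character(v)
grp[!is_special] <- as.character(f)

# Order the groups: ranges first, then the special codes that actually occur
lev <- c(levels(f), as.character(special_codes[special_codes %in% v]))
grp <- factor(grp, levels = lev)

# Share of the population in each group, in percent
pct <- round(100 * prop.table(table(grp)), 1)
print(pct)
```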

Does anyone have a clever idea of how to do this easily?

Thanks in advance
