R: creating ordered quintiles across rows

Time: 2018-04-18 10:14:05

Tags: r row ranking

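I have a data frame, and within each row I want to classify the column values as belonging to the first, second, third, fourth, or fifth quintile (I know this is a bit confusing, but the example should make it clear). I have managed to do this, but there are two problems: first, not all factor levels are present in every variable, and second, the factor levels are not ordered in the most sensible way. Here is some test data.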

x.df<-structure(list(Location = structure(1:6, .Label = c("Site A", 
"Site B", "Site C", "Site D", "Site E", "Site F"), class = "factor"), 
Var1 = c(78L, 5L, 85L, 87L, 89L, 82L), Var2 = c(98L, 5L, 
67L, 92L, 3L, 44L), Var3 = c(30L, 54L, 56L, 3L, 31L, 58L), 
Var4 = c(63L, 96L, 14L, 95L, 90L, 99L), Var5 = c(71L, 52L, 
78L, 93L, 74L, 26L), Var6 = c(21L, 66L, 57L, 42L, 39L, 69L
), Var7 = c(97L, 42L, 84L, 46L, 86L, 46L), Var8 = c(100L, 
99L, 6L, 41L, 94L, 20L), Var9 = c(84L, 82L, 26L, 91L, 38L, 
80L), Var10 = c(8L, 50L, 23L, 92L, 46L, 1L)), .Names = c("Location",
"Var1", "Var2", "Var3", "Var4", "Var5", "Var6", "Var7", "Var8", 
"Var9", "Var10"), class = "data.frame", row.names = c(NA, -6L
))

# Bin one row's values into quintiles, labelled from "lowest" to "highest"
cut_fn<-function(x){
  cut(x, quantile(x, c(0.0,0.2,0.4,0.6,0.8,1.0)), include.lowest=TRUE,
      labels=c("lowest","low","middle","high","highest"))
}
r.df<-data.frame(t(apply(x.df[,-1], 1, cut_fn)))

As a result, each row contains two each of "highest", "high", ..., "lowest".

r.df
       X1      X2     X3      X4      X5     X6      X7      X8     X9    X10
1  middle highest    low     low  middle lowest    high highest   high lowest
2  lowest  lowest middle highest  middle   high     low highest   high    low
3 highest    high middle  lowest    high middle highest  lowest    low    low
4  middle    high lowest highest highest    low     low  lowest middle   high
5    high  lowest lowest highest  middle    low    high highest    low middle
6 highest     low middle highest     low   high  middle  lowest   high lowest
str(r.df)
'data.frame':   6 obs. of  10 variables:
 $ X1 : Factor w/ 4 levels "high","highest",..: 4 3 2 4 1 2
 $ X2 : Factor w/ 4 levels "high","highest",..: 2 4 1 1 4 3
 $ X3 : Factor w/ 3 levels "low","lowest",..: 1 3 3 2 2 3
 $ X4 : Factor w/ 3 levels "highest","low",..: 2 1 3 1 1 1
 $ X5 : Factor w/ 4 levels "high","highest",..: 4 4 1 2 4 3
 $ X6 : Factor w/ 4 levels "high","low","lowest",..: 3 1 4 2 2 1
 $ X7 : Factor w/ 4 levels "high","highest",..: 1 3 2 3 1 4
 $ X8 : Factor w/ 2 levels "highest","lowest": 1 1 2 2 1 2
 $ X9 : Factor w/ 3 levels "high","low","middle": 1 1 2 3 2 1
 $ X10: Factor w/ 4 levels "high","low","lowest",..: 3 2 2 1 4 3
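
The missing and alphabetically sorted levels can be reproduced in isolation. The following is a minimal sketch, assuming only base R's `factor()` and reusing the X8 values from the output above: when no `levels` argument is given, only the labels that actually occur in the column become levels, and they are sorted alphabetically. (In R versions before 4.0.0, `data.frame()` applied this conversion to character columns by default.)

```r
# Values of column X8, copied from the r.df output above
x8 <- c("highest", "highest", "lowest", "lowest", "highest", "lowest")

# With no `levels` argument, factor() keeps only the values that occur,
# sorted alphabetically
f_default <- factor(x8)

levels(f_default)      # the two labels present in this column
as.integer(f_default)  # integer codes, matching the str() output for X8
```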

Ideally, I would like every variable to have the following (ordered) structure:

 $ X1 : Factor w/ 5 levels "highest","high",..: 

1 answer:

Answer 0 (score: 1)

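If I understand your question correctly, you want every column to be an ordered factor. The simplest way to do this is to loop over all the columns and convert each one with the factor function, passing the full set of levels and the ordered = TRUE option. Try this: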

#first create r.df with stringsAsFactors = FALSE so the columns stay character
r.df<-data.frame(t(apply(x.df[,-1], 1, cut_fn)), stringsAsFactors = FALSE)

#now loop across all of the columns, converting each to an ordered factor
#lowest=1 while highest=5
for(x in names(r.df)) {
  r.df[[x]]<-factor(r.df[[x]], levels=c("lowest","low","middle","high","highest"), ordered=TRUE)
}

Now every column will have all five levels, in the correct order.
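
The same conversion can also be written without an explicit loop. This is a sketch rather than part of the original answer: `demo.df` and `quintile_levels` are hypothetical names, and the small data frame only stands in for `r.df` so that the example runs on its own.

```r
# Stand-in for r.df: a small character data frame (hypothetical values)
demo.df <- data.frame(
  X1 = c("middle", "lowest", "highest"),
  X2 = c("highest", "lowest", "high"),
  stringsAsFactors = FALSE
)

# Full level set, from lowest to highest
quintile_levels <- c("lowest", "low", "middle", "high", "highest")

# Assigning into demo.df[] keeps the data frame structure while
# replacing every column with an ordered factor
demo.df[] <- lapply(demo.df, factor, levels = quintile_levels, ordered = TRUE)

str(demo.df)

# Ordered factors support comparisons against a level name
below_high <- demo.df$X1 < "high"
```

If "highest" should be the first level, as in the structure the question asks for, `rev(quintile_levels)` can be passed as `levels` instead.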