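I have a data frame and I want to classify its columns, across each row, as falling into the first, second, third, fourth, or fifth quintile (a bit confusing, I know, but the example should make it clear). I have managed to do this, but there are two problems: first, not all factor levels are present in every variable, and second, the factors are not ordered in the most sensible way. Here is some test data.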
x.df<-structure(list(Location = structure(1:6, .Label = c("Site A",
"Site B", "Site C", "Site D", "Site E", "Site F"), class = "factor"),
Var1 = c(78L, 5L, 85L, 87L, 89L, 82L), Var2 = c(98L, 5L,
67L, 92L, 3L, 44L), Var3 = c(30L, 54L, 56L, 3L, 31L, 58L),
Var4 = c(63L, 96L, 14L, 95L, 90L, 99L), Var5 = c(71L, 52L,
78L, 93L, 74L, 26L), Var6 = c(21L, 66L, 57L, 42L, 39L, 69L
), Var7 = c(97L, 42L, 84L, 46L, 86L, 46L), Var8 = c(100L,
99L, 6L, 41L, 94L, 20L), Var9 = c(84L, 82L, 26L, 91L, 38L,
80L), Var10 = c(8L, 50L, 23L, 92L, 46L, 1L)), .Names = c("Location",
"Var1", "Var2", "Var3", "Var4", "Var5", "Var6", "Var7", "Var8",
"Var9", "Var10"), class = "data.frame", row.names = c(NA, -6L
))
cut_fn<-function(x){cut(x,quantile(x, c(0.0,0.2,0.4,0.6,0.8,1.0)),include.lowest=TRUE, c("lowest","low","middle","high","highest"))}
r.df<-data.frame(t(apply(x.df[,-1], 1, cut_fn)))
As a result, each row contains two each of "highest", "high", ..., "lowest".
r.df
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 middle highest low low middle lowest high highest high lowest
2 lowest lowest middle highest middle high low highest high low
3 highest high middle lowest high middle highest lowest low low
4 middle high lowest highest highest low low lowest middle high
5 high lowest lowest highest middle low high highest low middle
6 highest low middle highest low high middle lowest high lowest
str(r.df)
'data.frame': 6 obs. of 10 variables:
$ X1 : Factor w/ 4 levels "high","highest",..: 4 3 2 4 1 2
$ X2 : Factor w/ 4 levels "high","highest",..: 2 4 1 1 4 3
$ X3 : Factor w/ 3 levels "low","lowest",..: 1 3 3 2 2 3
$ X4 : Factor w/ 3 levels "highest","low",..: 2 1 3 1 1 1
$ X5 : Factor w/ 4 levels "high","highest",..: 4 4 1 2 4 3
$ X6 : Factor w/ 4 levels "high","low","lowest",..: 3 1 4 2 2 1
$ X7 : Factor w/ 4 levels "high","highest",..: 1 3 2 3 1 4
$ X8 : Factor w/ 2 levels "highest","lowest": 1 1 2 2 1 2
$ X9 : Factor w/ 3 levels "high","low","middle": 1 1 2 3 2 1
$ X10: Factor w/ 4 levels "high","low","lowest",..: 3 2 2 1 4 3
Ideally, what I would like is for all variables to have this (ordered) structure:
$ X1 : Factor w/ 5 levels "highest","high",..:
Answer 0 (score: 1)
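If I understand your question correctly, you want every column to be an ordered factor. The simplest way is to loop over all the columns, converting each one with the factor function and the ordered = TRUE option. Try this: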
# First create r.df with stringsAsFactors = FALSE so the labels stay character
# (FALSE is already the default from R 4.0.0 onwards)
r.df <- data.frame(t(apply(x.df[,-1], 1, cut_fn)), stringsAsFactors = FALSE)
# Now loop over all of the columns, converting each to an ordered factor
# with the full set of levels: lowest = 1 while highest = 5
for (x in names(r.df)) {
  r.df[[x]] <- factor(r.df[[x]], levels = c("lowest","low","middle","high","highest"), ordered = TRUE)
}
Each column will now have all five levels, in the correct order.
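The same conversion can probably also be written without an explicit loop. The following is only a minimal sketch, not part of the original answer: the small `r.df` here is a hypothetical stand-in for the real one (character columns in which not every label occurs), and `lvls` is a name introduced for illustration.

```r
# Hypothetical stand-in for r.df: character columns, some labels missing
r.df <- data.frame(
  X1 = c("middle", "lowest", "highest"),
  X2 = c("highest", "lowest", "high"),
  stringsAsFactors = FALSE
)

# Full set of labels, from lowest to highest (illustrative name)
lvls <- c("lowest", "low", "middle", "high", "highest")

# Convert every column to an ordered factor with all five levels;
# assigning into r.df[] keeps the data frame structure intact
r.df[] <- lapply(r.df, factor, levels = lvls, ordered = TRUE)

# Inspect the result: each column should be an ordered factor with 5 levels
str(r.df)
```

Because the factors are ordered, comparisons such as `r.df$X1 >= "middle"` should then work as expected, and levels that do not occur in a column are still kept.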