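How can I order the levels of a variable by the values in a second variable, within the levels of a third?

Asked: 2013-06-13 10:10:00

Tags: r sorting r-factor

I have a data frame that looks like this, which I am preparing for ggplot: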

txt <- "v1 v2 v3
'Strongly agree' 83.1 var1
'Agree' 14.9 var1
'Disagree' 1.5 var1
'Strongly disagree' 0.6 var1
'Strongly agree' 11.8 var2
'Agree' 36.5 var2
'Disagree' 17.7 var2
'Strongly disagree' 43.8 var2
'Strongly agree' 19.6 var3
'Agree' 12 var3
'Disagree' 31.6 var3
'Strongly disagree' 36.8 var3"

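# stringsAsFactors = TRUE keeps v1 and v3 as factors (this was the default before R 4.0.0)
mydata <- read.table(textConnection(txt), sep = " ", header = TRUE, stringsAsFactors = TRUE)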

My question is: how can I order the levels of mydata$v3 by the values in mydata$v2 for one or more chosen levels of mydata$v1?

An example: if I want to order the levels of mydata$v3 by the highest value in mydata$v2 for 'Strongly agree' in mydata$v1, the order I get would be var1, var3, var2, because the values in mydata$v2 are 83.1, 19.6 and 11.8.

Another example: if I want to order the levels of mydata$v3 by the sum of the values in mydata$v2 for 'Strongly agree' and 'Agree' in mydata$v1, the order I get is var1, var2, var3, because the sums in mydata$v2 are (83.1 + 14.9) = 98, (11.8 + 36.5) = 48.3 and (19.6 + 12) = 31.6.
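The arithmetic behind both examples can be checked with a short base R sketch. It rebuilds the example data with data.frame() instead of read.table(), which is an assumption made only to keep the block self-contained; the names sel1, ex1, ord1 and so on are illustrative and not part of the question.

```r
# Rebuild the example data (same values as in the question).
mydata <- data.frame(
  v1 = rep(c("Strongly agree", "Agree", "Disagree", "Strongly disagree"), times = 3),
  v2 = c(83.1, 14.9, 1.5, 0.6, 11.8, 36.5, 17.7, 43.8, 19.6, 12, 31.6, 36.8),
  v3 = rep(c("var1", "var2", "var3"), each = 4),
  stringsAsFactors = TRUE
)

# Example 1: the v2 value for 'Strongly agree' within each level of v3.
sel1 <- mydata$v1 == "Strongly agree"
ex1 <- tapply(mydata$v2[sel1], mydata$v3[sel1], sum)
ord1 <- names(ex1)[order(ex1, decreasing = TRUE)]
ord1  # "var1" "var3" "var2"

# Example 2: the sum of v2 over 'Strongly agree' and 'Agree' within each level of v3.
sel2 <- mydata$v1 %in% c("Strongly agree", "Agree")
ex2 <- tapply(mydata$v2[sel2], mydata$v3[sel2], sum)
ord2 <- names(ex2)[order(ex2, decreasing = TRUE)]
ord2  # "var1" "var2" "var3"
```

Both orderings match the ones described in the two examples.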

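I don't know how to solve this on my own. Also, I work with a lot of data frames like this one, so the code has to go into a function.

Edit

In both examples, the result I want is the original data.frame, with only the order of the levels of mydata$v3 changed.

So in example 1 I have: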

                  v1   v2   v3
1     Strongly agree 83.1 var1
2              Agree 14.9 var1
3           Disagree  1.5 var1
4  Strongly disagree  0.6 var1
5     Strongly agree 11.8 var2
6              Agree 36.5 var2
7           Disagree 17.7 var2
8  Strongly disagree 43.8 var2
9     Strongly agree 19.6 var3
10             Agree 12.0 var3
11          Disagree 31.6 var3
12 Strongly disagree 36.8 var3 

levels(mydata$v3)
[1] "var1" "var2" "var3"

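But what I want to end up with is this:

                  v1   v2   v3
1     Strongly agree 83.1 var1
2              Agree 14.9 var1
3           Disagree  1.5 var1
4  Strongly disagree  0.6 var1
5     Strongly agree 11.8 var2
6              Agree 36.5 var2
7           Disagree 17.7 var2
8  Strongly disagree 43.8 var2
9     Strongly agree 19.6 var3
10             Agree 12.0 var3
11          Disagree 31.6 var3
12 Strongly disagree 36.8 var3

levels(mydata$v3)
[1] "var1" "var3" "var2"

In example 2 I have:

                  v1   v2   v3
1     Strongly agree 83.1 var1
2              Agree 14.9 var1
3           Disagree  1.5 var1
4  Strongly disagree  0.6 var1
5     Strongly agree 11.8 var2
6              Agree 36.5 var2
7           Disagree 17.7 var2
8  Strongly disagree 43.8 var2
9     Strongly agree 19.6 var3
10             Agree 12.0 var3
11          Disagree 31.6 var3
12 Strongly disagree 36.8 var3

levels(mydata$v3)
[1] "var1" "var2" "var3"

but want:

                  v1   v2   v3
1     Strongly agree 83.1 var1
2              Agree 14.9 var1
3           Disagree  1.5 var1
4  Strongly disagree  0.6 var1
5     Strongly agree 11.8 var2
6              Agree 36.5 var2
7           Disagree 17.7 var2
8  Strongly disagree 43.8 var2
9     Strongly agree 19.6 var3
10             Agree 12.0 var3
11          Disagree 31.6 var3
12 Strongly disagree 36.8 var3

levels(mydata$v3)
[1] "var1" "var2" "var3"

Note that in example 2 what I have and what I want are the same, but I have many data.frames where that will not be the case.

What I am after is a more elaborate version of

factor(mydata$v3, levels(mydata$v3)[EXAMPLE 1: order by the value in v2 within one level of v1 / EXAMPLE 2: order by the sum of the values in v2 within two levels of v1])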

1 answer:

Answer 0 (score: 0)

Here is a solution using aggregate:

f <- function(mydata, v1.val) {
  # Value or sum of v2 within the selected rows
  sums <- aggregate(v2 ~ v3, data=mydata[mydata$v1 %in% v1.val,], FUN=sum)

  # Decreasing order of the sum of v2 values, or the only v2 value, for each level of v3
  ord <- order(sums$v2, decreasing=TRUE)

  # Build a new factor with the proper levels and assign it to v3
  fac <- factor(mydata$v3, levels=sums$v3[ord])

  mydata$v3 <- fac
  return(mydata)
}

The data frame is the same as above, but the factor levels are ordered as required:

> f(mydata, 'Strongly agree')$v3
 [1] var1 var1 var1 var1 var2 var2 var2 var2 var3 var3 var3 var3
Levels: var1 var3 var2

> f(mydata, c('Strongly agree', 'Agree'))$v3
 [1] var1 var1 var1 var1 var2 var2 var2 var2 var3 var3 var3 var3
Levels: var1 var2 var3
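Since the question mentions many data frames of this shape, one possible way to use f() is with lapply() over a list. This is only a sketch and not part of the original answer: f() is repeated (slightly condensed) so the block runs on its own, and the second data frame, `other`, is hypothetical, derived from the example by reversing v2 purely for illustration.

```r
# f() as in the answer above, condensed so this block is self-contained.
f <- function(mydata, v1.val) {
  # Value or sum of v2 within the selected rows, per level of v3
  sums <- aggregate(v2 ~ v3, data = mydata[mydata$v1 %in% v1.val, ], FUN = sum)
  ord <- order(sums$v2, decreasing = TRUE)
  mydata$v3 <- factor(mydata$v3, levels = sums$v3[ord])
  mydata
}

# The data frame from the question, rebuilt without read.table().
mydata <- data.frame(
  v1 = rep(c("Strongly agree", "Agree", "Disagree", "Strongly disagree"), times = 3),
  v2 = c(83.1, 14.9, 1.5, 0.6, 11.8, 36.5, 17.7, 43.8, 19.6, 12, 31.6, 36.8),
  v3 = rep(c("var1", "var2", "var3"), each = 4),
  stringsAsFactors = TRUE
)

# Hypothetical second frame: same layout, v2 reversed (illustration only).
other <- transform(mydata, v2 = rev(v2))

# Apply the same reordering rule to every frame in the list.
frames <- list(original = mydata, reversed = other)
reordered <- lapply(frames, f, v1.val = "Strongly agree")

# Inspect the resulting level order of v3 in each frame.
lapply(reordered, function(d) levels(d$v3))
```

Each element of `reordered` keeps its rows unchanged; only the level order of v3 differs, so the frames can be passed on to plotting code one by one.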