I have a data frame that looks like this, which I'm preparing for ggplot:
txt <- "v1 v2 v3
'Strongly agree' 83.1 var1
'Agree' 14.9 var1
'Disagree' 1.5 var1
'Strongly disagree' 0.6 var1
'Strongly agree' 11.8 var2
'Agree' 36.5 var2
'Disagree' 17.7 var2
'Strongly disagree' 43.8 var2
'Strongly agree' 19.6 var3
'Agree' 12 var3
'Disagree' 31.6 var3
'Strongly disagree' 36.8 var3"
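# stringsAsFactors = TRUE keeps v1 and v3 as factors (the default became FALSE in R 4.0.0)
mydata <- read.table(textConnection(txt), sep = " ", header = TRUE, stringsAsFactors = TRUE)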
My question is: how can I order the levels of mydata$v3 based on the values in mydata$v1 and mydata$v2?
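One example: if I want to order var1, var2 and var3 in mydata$v3 from the highest to the lowest value of mydata$v2 within 'Strongly agree' in mydata$v1, I would get var1, var3, var2, because the corresponding values in mydata$v2 are 83.1, 19.6 and 11.8.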
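The example 1 numbers can be checked with a minimal base-R sketch. It assumes the same data as the read.table() call above, rebuilt here with data.frame() so the snippet runs on its own. It pulls out the 'Strongly agree' value of v2 for each level of v3 and sorts the level names by that value.

```r
# Same data as the read.table() call above, rebuilt so this runs on its own
mydata <- data.frame(
  v1 = rep(c("Strongly agree", "Agree", "Disagree", "Strongly disagree"), times = 3),
  v2 = c(83.1, 14.9, 1.5, 0.6, 11.8, 36.5, 17.7, 43.8, 19.6, 12, 31.6, 36.8),
  v3 = factor(rep(c("var1", "var2", "var3"), each = 4))
)

# v2 value of the 'Strongly agree' row for each level of v3 (one row per level)
sa <- with(mydata[mydata$v1 == "Strongly agree", ], tapply(v2, v3, sum))
print(sa)

# Level names sorted from the largest value to the smallest
wanted_order <- names(sa)[order(sa, decreasing = TRUE)]
print(wanted_order)  # "var1" "var3" "var2"
```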
Another example: if I want to order var1, var2 and var3 in mydata$v3 by the sum of the values of mydata$v2 within 'Strongly agree' and 'Agree' in mydata$v1, I would get the order var1, var2, var3, because the sums of the values in mydata$v2 are (83.1 + 14.9) = 98, (11.8 + 36.5) = 48.3 and (19.6 + 12) = 31.6.
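The example 2 totals can be reproduced the same way. This sketch uses xtabs() with a left-hand side, which sums v2 per level of v3. It assumes only the question's data, rebuilt inline, and keeps the rows whose v1 is 'Strongly agree' or 'Agree'.

```r
# Same data as the read.table() call above, rebuilt so this runs on its own
mydata <- data.frame(
  v1 = rep(c("Strongly agree", "Agree", "Disagree", "Strongly disagree"), times = 3),
  v2 = c(83.1, 14.9, 1.5, 0.6, 11.8, 36.5, 17.7, 43.8, 19.6, 12, 31.6, 36.8),
  v3 = factor(rep(c("var1", "var2", "var3"), each = 4))
)

# Keep the two selected answer categories, then sum v2 for each level of v3
keep <- mydata$v1 %in% c("Strongly agree", "Agree")
sums <- xtabs(v2 ~ v3, data = mydata[keep, ])
print(sums)  # roughly 98, 48.3, 31.6 for var1, var2, var3

# Level names sorted from the largest total to the smallest
wanted_order <- names(sums)[order(sums, decreasing = TRUE)]
print(wanted_order)  # "var1" "var2" "var3"
```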
I have no idea how to solve this myself. Also, I deal with a lot of data frames like this one, so the code has to go into a function.
EDIT
In both examples, the result I want is the original data.frame, with only the order of the levels of mydata$v3 changed.
So in example 1 I have:
v1 v2 v3
1 Strongly agree 83.1 var1
2 Agree 14.9 var1
3 Disagree 1.5 var1
4 Strongly disagree 0.6 var1
5 Strongly agree 11.8 var2
6 Agree 36.5 var2
7 Disagree 17.7 var2
8 Strongly disagree 43.8 var2
9 Strongly agree 19.6 var3
10 Agree 12.0 var3
11 Disagree 31.6 var3
12 Strongly disagree 36.8 var3
levels(mydata$v3)
[1] "var1" "var2" "var3"
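But what I want to end up with is the same data frame, with the levels in this order:
levels(mydata$v3)
[1] "var1" "var3" "var2"
In example two, I have the same data frame, with:
levels(mydata$v3)
[1] "var1" "var2" "var3"
but want:
levels(mydata$v3)
[1] "var1" "var2" "var3"
Note that in example two, what I have and what I want are the same, but I have many data.frames where this will not be the case.
What I want is a more complicated version of
factor(mydata$v3, levels(mydata$v3)[EXAMPLE1: order by value of v2 within 1 level of v1 / EXAMPLE2: order by sum of v2 within 2 levels of v1])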
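A concrete instance of that factor() pattern may help. The index passed to levels() only has to be a permutation of the current levels, and order() on per-level scores produces one. The scores here are assumed to be aligned with levels(v3) and are hard-coded from example 1 for illustration only.

```r
# A factor like mydata$v3, with the default (alphabetical) level order
v3 <- factor(rep(c("var1", "var2", "var3"), each = 4))

# Per-level scores in the order of levels(v3): the 'Strongly agree'
# values from example 1, hard-coded for illustration
score <- c(var1 = 83.1, var2 = 11.8, var3 = 19.6)

# order() turns the scores into a permutation index for levels(v3)
idx <- order(score, decreasing = TRUE)
v3_new <- factor(v3, levels(v3)[idx])

print(idx)             # 1 3 2
print(levels(v3_new))  # "var1" "var3" "var2"
```

Only the level order changes. The values of v3_new are the same strings as before.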
Answer:
Here is a solution with aggregate:
f <- function(mydata, v1.val) {
# Value or sum of v2 within the selected rows
sums <- aggregate(v2 ~ v3, data=mydata[mydata$v1 %in% v1.val,], FUN=sum)
# Decreasing order of the sum of v2 values, or the only v2 value, for each level of v3
ord <- order(sums$v2, decreasing=TRUE)
# Build a new factor with the proper levels and assign it to v3
fac <- factor(mydata$v3, levels=sums$v3[ord])
mydata$v3 <- fac
return(mydata)
}
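A possible alternative, which is not part of the answer above, is stats::reorder(). It reorders factor levels by a per-level summary of another vector. The sketch assumes the question's data, and f2 is a hypothetical name. It sets v2 to 0 outside the selected v1 levels and negates the sum, because reorder() sorts levels by increasing score.

```r
# Example data (same values as in the question)
mydata <- data.frame(
  v1 = rep(c("Strongly agree", "Agree", "Disagree", "Strongly disagree"), times = 3),
  v2 = c(83.1, 14.9, 1.5, 0.6, 11.8, 36.5, 17.7, 43.8, 19.6, 12, 31.6, 36.8),
  v3 = factor(rep(c("var1", "var2", "var3"), each = 4))
)

# Hypothetical alternative to f() built on stats::reorder().
# Rows whose v1 is not in v1.val contribute 0; the sum is negated because
# reorder() sorts levels by increasing score, so the largest total comes first.
f2 <- function(mydata, v1.val) {
  w <- ifelse(mydata$v1 %in% v1.val, mydata$v2, 0)
  mydata$v3 <- reorder(mydata$v3, w, FUN = function(z) -sum(z))
  mydata
}

levels(f2(mydata, "Strongly agree")$v3)             # "var1" "var3" "var2"
levels(f2(mydata, c("Strongly agree", "Agree"))$v3) # "var1" "var2" "var3"
```

reorder() also attaches a "scores" attribute to the factor. A level of v3 with no selected rows gets a score of 0, so it sorts after levels with a positive total. With f(), such a level is left out of the level set and its values become NA.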
The data frame is the same as shown above, but the factor levels are in the requested order:
> f(mydata, 'Strongly agree')$v3
[1] var1 var1 var1 var1 var2 var2 var2 var2 var3 var3 var3 var3
Levels: var1 var3 var2
> f(mydata, c('Strongly agree', 'Agree'))$v3
[1] var1 var1 var1 var1 var2 var2 var2 var2 var3 var3 var3 var3
Levels: var1 var2 var3
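The question mentions many data frames of this shape. One way to handle them is to lapply() f() over a list of frames. This sketch assumes the frames are collected in a list; frames, df_a and df_b are hypothetical names, and df_b's v2 values are made up for illustration.

```r
# f() from the answer above, condensed (same logic)
f <- function(mydata, v1.val) {
  sums <- aggregate(v2 ~ v3, data = mydata[mydata$v1 %in% v1.val, ], FUN = sum)
  ord <- order(sums$v2, decreasing = TRUE)
  mydata$v3 <- factor(mydata$v3, levels = sums$v3[ord])
  mydata
}

# The question's data, plus a second frame with the same columns
df_a <- data.frame(
  v1 = rep(c("Strongly agree", "Agree", "Disagree", "Strongly disagree"), times = 3),
  v2 = c(83.1, 14.9, 1.5, 0.6, 11.8, 36.5, 17.7, 43.8, 19.6, 12, 31.6, 36.8),
  v3 = factor(rep(c("var1", "var2", "var3"), each = 4))
)
df_b <- df_a
df_b$v2 <- c(10, 20, 30, 40, 50, 20, 20, 10, 30, 30, 20, 20)  # hypothetical values

# Apply the same reordering rule to every frame in the list;
# extra arguments after f are passed on to f by lapply()
frames <- list(a = df_a, b = df_b)
reordered <- lapply(frames, f, v1.val = "Strongly agree")
lapply(reordered, function(d) levels(d$v3))
# $a: "var1" "var3" "var2"   $b: "var2" "var3" "var1"
```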