Suppose I have the following data frame:
v2 <- c(4.5, 2.5, 3.5, 5.5, 7.5, 6.5, 2.5, 1.5, 3.5)
v1 <- c(2.2, 3.2, 1.2, 4.2, 2.2, 3.2, 2.2, 1.2, 5.2)
lvl <- c("a","a","a","b","b","b","c","c","c")
d <- data.frame(v1,v2,lvl)
> d
v1 v2 lvl
1 2.2 4.5 a
2 3.2 2.5 a
3 1.2 3.5 a
4 4.2 5.5 b
5 2.2 7.5 b
6 3.2 6.5 b
7 2.2 2.5 c
8 1.2 1.5 c
9 5.2 3.5 c
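Within each level of d$lvl, I want to extract the row whose d$v1 value is the median (in the simplest case, every level of d$lvl has exactly three rows). So I would like to get:
v1 v2 lvl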
1 2.2 4.5 a
6 3.2 6.5 b
7 2.2 2.5 c
Answer 0: (score: 1)
There are several ways to do this:
Take a look at the plyr package, which is very useful for working with subsets of data:
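library(plyr)
# Per-group medians of v1 and v2. Each column is summarized separately,
# so v2 here is the group median of v2, not the v2 of the median-v1 row.
ddply(d, .(lvl), summarize, v1 = median(v1), v2 = median(v2))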
If you are comfortable with SQL queries, you can use the sqldf package:
library(sqldf)
sqldf("SELECT median(v1) as v1, median(v2) as v2, lvl FROM d GROUP BY lvl")
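Answer 1: (score: 1)
This works for groups with an odd number of rows. You need to decide how to handle groups with an even number of rows, where median() returns the mean of the two middle values and may match no row. For example, you may want to round the median in one direction or the other; see ?round.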
library(plyr)
d2 <- ddply(.data = d, .variables = .(lvl), function(x)
x[which(x$v1 == median(x$v1)), ])
# v1 v2 lvl
# 1 2.2 4.5 a
# 2 3.2 6.5 b
# 3 2.2 2.5 c
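To illustrate why even-sized groups need separate handling, here is a small base R check. The vector x is made up for illustration and is not part of the question's data.

```r
# Hypothetical group with an even number of rows (illustration only)
x <- c(1.5, 2.5, 3.5, 4.5)

# median() averages the two middle values when the length is even
m <- median(x)
print(m)

# No element equals that average, so the equality filter selects nothing
hits <- which(x == m)
print(length(hits))
```

With a group like this, the equality filter in the ddply call above would contribute no rows for that group.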
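Answer 2: (score: 0)
First, use ddply to compute the median of v1 by lvl (rounded to 1 decimal place):
# install.packages("plyr")  # uncomment if plyr is not installed
library(plyr)
df <- ddply(d, .(lvl), summarize, v1 = round(median(v1), 1))
Second, merge the original data frame (d) with the computed one (df). The merge matches rows where lvl and v1 are the same in both, and keeps only those rows: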
df1 <- merge(df, d, by = c("lvl","v1"))
View(df1)
lvl v1 v2
1 a 2.2 4.5
2 b 3.2 6.5
3 c 2.2 2.5
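The same compute-then-merge idea can be sketched in base R without plyr. This is only a sketch under the assumption that every group has an odd number of rows, so the group median is an actual v1 value; the names med and res are introduced here for illustration.

```r
# Data from the question
d <- data.frame(
  v1  = c(2.2, 3.2, 1.2, 4.2, 2.2, 3.2, 2.2, 1.2, 5.2),
  v2  = c(4.5, 2.5, 3.5, 5.5, 7.5, 6.5, 2.5, 1.5, 3.5),
  lvl = c("a", "a", "a", "b", "b", "b", "c", "c", "c")
)

# Per-group median of v1 (columns: lvl, v1)
med <- aggregate(v1 ~ lvl, data = d, FUN = median)

# Keep only the rows of d whose (lvl, v1) pair matches a group median
res <- merge(med, d, by = c("lvl", "v1"))
print(res)
```

If several rows in a group share the median v1 value, this merge keeps all of them rather than a single row.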
Answer 3: (score: 0)
I would like to propose an approach that handles groups with both odd and even numbers of rows:
## example data
v2 <- c(4.5, 2.5, 3.5, 5.5, 7.5, 6.5, 2.5, 1.5, 3.5, 1, 1, 1, 1)
v1 <- c(2.2, 3.2, 1.2, 4.2, 2.2, 3.2, 2.2, 1.2, 5.2, 1.5, 2.5, 3.5, 4.5)
lvl <- c("a","a","a","b","b","b","c","c","c", "d", "d", "d", "d")
d <- data.frame(v1,v2,lvl)
## define own median index function
medIdx <- function(x) {
  n <- length(x)
  ## even: p == n/2
  ## odd: p == (n+1)/2
  p <- ceiling(n/2)
  return(which(x == sort(x, partial=p)[p])[1])
}
## run blockwise (blocks defined by d$lvl) and bind results
do.call(rbind, by(d, INDICES=d$lvl, FUN=function(x){ return(x[medIdx(x$v1), ]) }))
# v1 v2 lvl
#a 2.2 4.5 a
#b 3.2 6.5 b
#c 2.2 2.5 c
#d 2.5 1.0 d
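As a possible alternative to the partial-sort index, one could pick the row whose v1 is nearest to the group median. This is a different technique from medIdx above, not a rewrite of it; the helper name nearestMedIdx is hypothetical, and the sketch assumes that taking the first row on ties is acceptable.

```r
# Extended example data from this answer (group "d" has four rows)
d <- data.frame(
  v1  = c(2.2, 3.2, 1.2, 4.2, 2.2, 3.2, 2.2, 1.2, 5.2, 1.5, 2.5, 3.5, 4.5),
  v2  = c(4.5, 2.5, 3.5, 5.5, 7.5, 6.5, 2.5, 1.5, 3.5, 1, 1, 1, 1),
  lvl = c("a", "a", "a", "b", "b", "b", "c", "c", "c", "d", "d", "d", "d")
)

# Hypothetical helper: index of the value nearest the median.
# which.min() returns the first minimum, so ties resolve to the earlier row.
nearestMedIdx <- function(x) which.min(abs(x - median(x)))

# Split by level, take one row per group, and bind the rows back together
res <- do.call(rbind, lapply(split(d, d$lvl), function(g) {
  g[nearestMedIdx(g$v1), ]
}))
print(res)
```

For the even-sized group this selects the lower of the two middle rows when they appear in ascending order, which matches the choice medIdx makes on this data.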