Extracting the row holding the group median in R

Time: 2013-09-17 06:40:11

Tags: r sorting dataframe subset

If I have a data frame like the following:

v2 <- c(4.5, 2.5, 3.5, 5.5, 7.5, 6.5, 2.5, 1.5, 3.5) 
v1 <- c(2.2, 3.2, 1.2, 4.2, 2.2, 3.2, 2.2, 1.2, 5.2) 
lvl <- c("a","a","a","b","b","b","c","c","c") 
d <- data.frame(v1,v2,lvl) 

> d
   v1  v2 lvl
1 2.2 4.5   a
2 3.2 2.5   a
3 1.2 3.5   a
4 4.2 5.5   b
5 2.2 7.5   b
6 3.2 6.5   b
7 2.2 2.5   c
8 1.2 1.5   c
9 5.2 3.5   c

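Within each level of d$lvl, I want to extract the row whose value of d$v1 is the median for that level (in the simplest case, every level of d$lvl has three rows). So I would like to get: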

   v1  v2 lvl
1 2.2 4.5   a
6 3.2 6.5   b
7 2.2 2.5   c

4 Answers:

Answer 0: (score: 1)

There are several ways to do this:

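Take a look at the plyr package, which is very handy for operating on subsets of data. Note that this call takes the median of v1 and of v2 separately within each group, so a result row need not be an actual row of d: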

library(plyr)
ddply(d, .(lvl), summarize, v1 = median(v1), v2 = median(v2))

If you are comfortable with SQL-like queries, you can use the sqldf package:

library(sqldf)
sqldf("SELECT median(v1) as v1, median(v2) as v2, lvl FROM d GROUP BY lvl")
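For comparison, here is a minimal base-R sketch of the same per-group summary. It uses aggregate, which is a swapped-in function and not part of this answer, and the name col_medians is made up for illustration. It shows that for group a the summary pairs v1 = 2.2 with v2 = 3.5, a combination that does not occur as a row of d.

```r
# Example data from the question
v2 <- c(4.5, 2.5, 3.5, 5.5, 7.5, 6.5, 2.5, 1.5, 3.5)
v1 <- c(2.2, 3.2, 1.2, 4.2, 2.2, 3.2, 2.2, 1.2, 5.2)
lvl <- c("a", "a", "a", "b", "b", "b", "c", "c", "c")
d <- data.frame(v1, v2, lvl)

# Median of each column within each level of lvl, computed independently
col_medians <- aggregate(cbind(v1, v2) ~ lvl, data = d, FUN = median)
print(col_medians)

# Check whether the summary row for group "a" exists in the original data
a_row <- col_medians[col_medians$lvl == "a", ]
any(d$v1 == a_row$v1 & d$v2 == a_row$v2 & d$lvl == "a")
```

If the goal is the original row rather than a per-column summary, the row has to be selected by matching v1 against the group median.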

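Answer 1: (score: 1)

This works for groups with an odd number of rows. You need to think about how to handle groups with an even number of rows. For example, you might want to round the median in one direction or the other; see ?round.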

library(plyr)
d2 <- ddply(.data = d, .variables = .(lvl), function(x)
  x[which(x$v1 == median(x$v1)), ])

#    v1  v2 lvl
# 1 2.2 4.5   a
# 2 3.2 6.5   b
# 3 2.2 2.5   c
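To make the even-row caveat concrete, here is a small base-R sketch. The data frame x and the helper lower_median are hypothetical, and picking the lower of the two middle values is only one possible convention, not something this answer prescribes.

```r
# Hypothetical group with an even number of rows
x <- data.frame(v1 = c(1.5, 2.5, 3.5, 4.5), v2 = c(1, 1, 1, 1))

# median() averages the two middle values, so no row matches it exactly
exact_match <- x[which(x$v1 == median(x$v1)), ]
nrow(exact_match)

# Assumed convention: use the lower of the two middle values instead
lower_median <- function(v) sort(v)[ceiling(length(v) / 2)]

# Keep the first matching row in case of ties
picked <- x[which(x$v1 == lower_median(x$v1))[1], ]
print(picked)
```

For a group with an odd number of rows, lower_median returns the same value as median, so the odd case behaves as before.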

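Answer 2: (score: 0)

First, use the ddply function to compute the median of v1 by lvl (rounded to 1 decimal place):

install.packages("plyr")  # one-time install; skip if plyr is already installed
library(plyr)
df <- ddply(d, .(lvl), summarize, v1 = round(median(v1), 1))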

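Second, merge the original data frame (d) with the computed one (df). The merge matches rows on lvl and v1 and keeps only the rows of d where both agree:

df1 <- merge(df, d, by = c("lvl","v1"))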

View(df1)
  lvl  v1  v2
1   a 2.2 4.5
2   b 3.2 6.5
3   c 2.2 2.5
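The same two steps can also be sketched in base R, with aggregate standing in for ddply. This is a swapped-in function rather than the answerer's code, and the names med and res are made up for illustration. The sketch assumes the question's data, where every group has an odd number of rows and no tied median values.

```r
# Example data from the question
v2 <- c(4.5, 2.5, 3.5, 5.5, 7.5, 6.5, 2.5, 1.5, 3.5)
v1 <- c(2.2, 3.2, 1.2, 4.2, 2.2, 3.2, 2.2, 1.2, 5.2)
lvl <- c("a", "a", "a", "b", "b", "b", "c", "c", "c")
d <- data.frame(v1, v2, lvl)

# Step 1: median of v1 per level of lvl, rounded to 1 decimal place
med <- aggregate(v1 ~ lvl, data = d, FUN = function(v) round(median(v), 1))

# Step 2: keep only the rows of d whose (lvl, v1) pair matches a group median
res <- merge(med, d, by = c("lvl", "v1"))
print(res)
```

Because the join key includes v1, a group in which several rows share the median value would contribute all of those rows to the result.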

Answer 3: (score: 0)

I would like to propose an approach that handles groups with both odd and even numbers of rows:

## example data
v2 <- c(4.5, 2.5, 3.5, 5.5, 7.5, 6.5, 2.5, 1.5, 3.5, 1, 1, 1, 1) 
v1 <- c(2.2, 3.2, 1.2, 4.2, 2.2, 3.2, 2.2, 1.2, 5.2, 1.5, 2.5, 3.5, 4.5) 
lvl <- c("a","a","a","b","b","b","c","c","c", "d", "d", "d", "d")
d <- data.frame(v1,v2,lvl)

## define own median index function
medIdx <- function(x) {
  n <- length(x)
  ## even: p == n/2
  ## odd:  p == (n+1)/2
  p <- ceiling(n/2)
  return(which(x == sort(x, partial=p)[p])[1])
}

## run blockwise (blocks defined by d$lvl) and bind results
do.call(rbind, by(d, INDICES=d$lvl, FUN=function(x){ return(x[medIdx(x$v1), ]) }))

#   v1  v2 lvl
#a 2.2 4.5   a
#b 3.2 6.5   b
#c 2.2 2.5   c
#d 2.5 1.0   d
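As a quick check of how medIdx behaves, the sketch below repeats the function from this answer and calls it on three small vectors. The input vectors and the names idx_odd, idx_even and idx_tie are illustrative only.

```r
# medIdx as defined in the answer above
medIdx <- function(x) {
  n <- length(x)
  p <- ceiling(n / 2)  # middle position (odd n) or lower middle (even n)
  which(x == sort(x, partial = p)[p])[1]
}

idx_odd  <- medIdx(c(2.2, 3.2, 1.2))       # odd length: position of the median
idx_even <- medIdx(c(1.5, 2.5, 3.5, 4.5))  # even length: position of the lower middle value
idx_tie  <- medIdx(c(2.2, 2.2, 5.0))       # tie: position of the first match

print(c(idx_odd, idx_even, idx_tie))
```

For even-length input the function returns the position of the lower of the two middle values, which is why group d yields the row with v1 = 2.5 in the output above.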