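I have a data frame of x and y coordinates plus a classification (A or B). What is the best way to apply an operation over every run of consecutive rows that share the same classification type?
Here is an example: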
set.seed(1)
n = 9
x = 1:n
y = runif(n)
df = data.frame(x,y,type=sample(c("A","B"),n,replace=TRUE))
This produces the following:
+---+---+-----------+------+
|   | x | y         | type |
+---+---+-----------+------+
| 1 | 1 | 0.2655087 | A    |
| 2 | 2 | 0.3721239 | A    |
| 3 | 3 | 0.5728534 | A    |
| 4 | 4 | 0.9082078 | B    |
| 5 | 5 | 0.2016819 | A    |
| 6 | 6 | 0.8983897 | B    |
| 7 | 7 | 0.9446753 | A    |
| 8 | 8 | 0.6607978 | B    |
| 9 | 9 | 0.6291140 | B    |
+---+---+-----------+------+
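So I want to do a ddply(...) type operation that gives the mean x and y coordinates wherever 'type' repeats in consecutive rows. In the data above, rows 1:3 should collapse into 1 row, rows 4:7 are unaffected, and rows 8:9 should also collapse into 1 row, so the result should have 6 rows.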
Answer 0 (score: 1)
I can think of ways to do this with base R, dplyr, and data.table. All three start from the same run-length grouping:
## put into numerical groups
df$grp <- match(df$type, LETTERS)
## use rle to find consecutive groups
nGroups <- length(rle(df$grp)$lengths) ## returns number of groups
grp <- rep(seq_len(nGroups), rle(df$grp)$lengths)
## put rle groups onto data
df$rle_grp <- grp
## perform calculation
Base R
aggregate(x=df[,c("x","y")], by=list(df$rle_grp), FUN=mean)
# Group.1 x y
#1 1 2.0 0.4034953
#2 2 4.0 0.9082078
#3 3 5.0 0.2016819
#4 4 6.0 0.8983897
#5 5 7.0 0.9446753
#6 6 8.5 0.6449559
dplyr
## using dplyr (you asked for ddply, but I don't use plyr anymore)
library(dplyr)
df %>%
  group_by(rle_grp) %>%
  summarise(avgX = mean(x),
            avgY = mean(y)) %>%
  ungroup()
# rle_grp avgX avgY
# (dbl) (dbl) (dbl)
#1 1 2.0 0.4034953
#2 2 4.0 0.9082078
#3 3 5.0 0.2016819
#4 4 6.0 0.8983897
#5 5 7.0 0.9446753
#6 6 8.5 0.6449559
data.table
## or using data.table which is my package of choice
library(data.table)
setDT(df)
df[, .(avgX = mean(x), avgY = mean(y)) , by=.(rle_grp)]
# rle_grp avgX avgY
#1: 1 2.0 0.4034953
#2: 2 4.0 0.9082078
#3: 3 5.0 0.2016819
#4: 4 6.0 0.8983897
#5: 5 7.0 0.9446753
#6: 6 8.5 0.6449559
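To make the grouping step in this answer easier to follow, here is a minimal base R sketch of what rle() produces. It assumes the type column exactly as shown in the question's table and hard-codes it, so the result does not depend on how sample() behaves in a given R version; the names `types` and `runs` are illustrative only.

```r
# Type column copied from the table in the question (hard-coded for reproducibility)
types <- c("A", "A", "A", "B", "A", "B", "A", "B", "B")

# rle() returns the length and the value of each run of identical consecutive elements
runs <- rle(types)
runs$lengths  # 3 1 1 1 1 2
runs$values   # "A" "B" "A" "B" "A" "B"

# Repeat each run's index by its length to get one group ID per row
rle_grp <- rep(seq_along(runs$lengths), runs$lengths)
rle_grp       # 1 1 1 2 3 4 5 6 6
```

Six distinct IDs come out, matching the 6 rows the question expects. rle() accepts a character vector directly; a factor column would need converting first, which is what the match() step above takes care of.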
Answer 1 (score: 1)
Doing it with base R only:
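## factor() is needed where type is a character column (the data.frame() default since R 4.0.0)
changed <- which(c(TRUE, diff(as.integer(factor(df$type))) != 0))
## label every row with the index of the first row of its run
run_start <- rep(changed, diff(c(changed, nrow(df) + 1)))
df1 <- data.frame(meanX=tapply(df$x, run_start, mean),
meanY=tapply(df$y, run_start, mean))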
df1
meanX meanY
1 2.0 0.4034953
4 4.0 0.9082078
5 5.0 0.2016819
6 6.0 0.8983897
7 7.0 0.9446753
8 8.5 0.6449559
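A variation on the same idea, offered as a sketch rather than as part of the answer above: flag each row where the type changes and take the cumulative sum of the flags, which gives run IDs 1 to 6 instead of starting-row indices. The data frame is rebuilt from the values in the question's table, and `is_new_run`, `run_id` and `result` are hypothetical names.

```r
# Data copied from the table in the question (hard-coded, not regenerated with set.seed)
df <- data.frame(
  x = 1:9,
  y = c(0.2655087, 0.3721239, 0.5728534, 0.9082078, 0.2016819,
        0.8983897, 0.9446753, 0.6607978, 0.6291140),
  type = c("A", "A", "A", "B", "A", "B", "A", "B", "B")
)

# TRUE wherever a row's type differs from the previous row; the first row always starts a run
is_new_run <- c(TRUE, df$type[-1] != df$type[-nrow(df)])

# The cumulative sum turns the change flags into consecutive run IDs
run_id <- cumsum(is_new_run)

# Mean of x and y within each run
result <- aggregate(df[c("x", "y")], by = list(run_id = run_id), FUN = mean)
result
```

Comparing neighbouring elements directly avoids the integer conversion, so this should behave the same whether type is stored as character or as a factor.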