How to take the rows of a data frame together (selecting rows bottom-up)

Time: 2017-03-29 11:48:50

Tags: r dataframe

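Let me explain my problem with an example. If I am at row 5 (in the table below), how can I get the rows before row 5 that have the same P_P value? The key point is that the row indices of the selected rows must be consecutive. For the table below, I need only rows 3 and 4 (row 1 is excluded because row 2, whose P_P differs, lies between row 1 and the others). FYI, I can do this with a for loop, but I would like to avoid that.

Thanks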

ID   Contest   P_P   Time
1      UMA      A    2015
2      DOIS     B    2016
3      DOIS     A    2016
4      UMA      A    2017
5      DOIS     A    2017

3 answers:

Answer 0 (score: 3)

You can do this in base R:

rw <- 5
df[(max(which(!(df[1:(rw-1),]$P_P==df[rw,]$P_P)))+1):(rw-1),]

# ID Contest P_P Time
#3  3    DOIS   A 2016
#4  4     UMA   A 2017

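The idea is to first find the matches among rows 1 to rw-1 (i.e. df[1:(rw-1),]$P_P==df[rw,]$P_P), and then find the last mismatch, i.e. the last FALSE (max(which(!...))). The rows wanted are those from the row after that mismatch up to row rw-1. Note that if no earlier row differs, which() returns an empty vector and max() then returns -Inf with a warning, so that case needs separate handling.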

df <- structure(list(ID = 1:5, Contest = structure(c(2L, 1L, 1L, 2L, 
1L), .Label = c("DOIS", "UMA"), class = "factor"), P_P = structure(c(1L, 
2L, 1L, 1L, 1L), .Label = c("A", "B"), class = "factor"), Time = c(2015L, 
2016L, 2016L, 2017L, 2017L)), .Names = c("ID", "Contest", "P_P", 
"Time"), class = "data.frame", row.names = c(NA, -5L))
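
For readability, the one-liner above can be unpacked into named steps. This is only a sketch: the function name rows_before is made up for illustration, and the guards for the edge cases (first row, no earlier mismatch, mismatch directly above) are additions that are not part of the original answer.

```r
# Example data (same values as df above, rebuilt so the block runs on its own)
df <- data.frame(
  ID      = 1:5,
  Contest = c("UMA", "DOIS", "DOIS", "UMA", "DOIS"),
  P_P     = c("A", "B", "A", "A", "A"),
  Time    = c(2015L, 2016L, 2016L, 2017L, 2017L)
)

# Hypothetical helper: rows directly above `rw` that share its P_P value
rows_before <- function(df, rw) {
  if (rw <= 1) return(df[0, ])                 # nothing above the first row
  prev     <- df$P_P[seq_len(rw - 1)]          # P_P of rows 1..rw-1
  mismatch <- which(prev != df$P_P[rw])        # positions whose P_P differs
  # start right after the last mismatch, or at row 1 if there is none
  start    <- if (length(mismatch)) max(mismatch) + 1 else 1
  if (start > rw - 1) return(df[0, ])          # row rw-1 itself differs
  df[start:(rw - 1), ]
}

rows_before(df, 5)   # rows with ID 3 and 4
```

Calling rows_before(df, 3) returns an empty data frame, because row 2 (P_P = B) sits directly above row 3.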

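Answer 1 (score: 2)

row <- 5

## rows above `row`, in reverse order, whose P_P equals that of `row`
## (named `sub` to avoid masking the subset() function)
sub <- subset(df[(row-1):1,], P_P == df[row,]$P_P)

## positions where consecutive IDs are not adjacent, i.e. a gap in the run
## (this relies on ID being equal to the row number)
a <- which(abs(diff(sub$ID)) != 1)

## keep the rows up to the first gap, or all rows if there is no gap
sub[seq_len(if (length(a)) a[1] else nrow(sub)),]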

# ID Contest P_P Time
# 4  4     UMA   A 2017
# 3  3    DOIS   A 2016

Answer 2 (score: 0)

Here is a solution with rev() and rle():

tail(d, rle(rev(as.integer(d$P_P)))$lengths[1]) # with last row
head(tail(d, rle(rev(as.integer(d$P_P)))$lengths[1]), -1) # without last row
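
To see why this works, it can help to look at what rle() returns for the reversed column. A small sketch, with the P_P column written out as a plain character vector (an assumption made so that the block runs on its own):

```r
p <- c("A", "B", "A", "A", "A")   # P_P column from the question

r <- rle(rev(p))                  # runs counted from the last row upward
r$lengths                         # 3 1 1
r$values                          # "A" "B" "A"

n <- r$lengths[1]                 # length of the run that ends at the last row
n - 1                             # number of matching rows above the last row
```

Because only the first run of the reversed column is used, this approach applies when the row of interest is the last row of d.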

Another solution:
We can use inverse.rle() to build a grouping variable:

r <- rle(as.character(d$P_P)) # also possible: r <- rle(as.integer(d$P_P))
r$values <- seq_along(r$values) # number the runs 1, 2, 3, ...
d$group <- inverse.rle(r)
i <- 5
d[d$group==d$group[i],]    

Result:

#  ID Contest P_P Time group
#3  3    DOIS   A 2016     3
#4  4     UMA   A 2017     3
#5  5    DOIS   A 2017     3

If you want the result without row i:

subset(d[-i,], group==d$group[i])
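
The same run identifier can also be built without rle(), by counting the positions where the value changes. This is a different technique from the one in the answer, shown only as a sketch; the variable names change and group, and the plain character vector p, are chosen here for illustration.

```r
p <- c("A", "B", "A", "A", "A")             # P_P column from the question

# TRUE wherever the value differs from the previous one (element 1 starts a run)
change <- c(TRUE, p[-1] != p[-length(p)])
group  <- cumsum(change)                     # 1 2 3 3 3

i <- 5
# rows in the same run as row i that lie above it
which(group == group[i] & seq_along(p) < i)  # 3 4
```

The condition seq_along(p) < i restricts the result to rows above row i; without it, any later rows in the same run would be included as well.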

Data:

d <- structure(list(ID = 1:5, Contest = structure(c(2L, 1L, 1L, 2L, 
1L), .Label = c("DOIS", "UMA"), class = "factor"), P_P = structure(c(1L, 
2L, 1L, 1L, 1L), .Label = c("A", "B"), class = "factor"), Time = c(2015L, 
2016L, 2016L, 2017L, 2017L)), .Names = c("ID", "Contest", "P_P", 
"Time"), class = "data.frame", row.names = c(NA, -5L))