Question

The following problem is quite challenging for me, since I am more or less a beginner in R.

I have a data.frame like this:

   a  b      c
1  x g1  date1
2  x g1  date2
3  y g2  date3
4  y g3  date4
5  y g4  date5
6  z g1  date6
7  z g2  date7
8  x g4  date8
9  y g1  date9
10 y g3 date10

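What I want to do is compare the first value in column a with the second one. If they are the same, check whether g2 follows g1 in column b.

The data is sorted by date, and I basically want to find the number of times g2 appears right after g1 while the corresponding values in column a are the same.

In the sample data above, the total would be 1 (rows 6 and 7).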

Answer 1

There may be a simpler way, but here is my data.table attempt:

library(data.table) ## v 1.9.6+
setDT(df)[a == shift(a, type = "lead") & b == "g1" & shift(b, type = "lead") == "g2", .N]
## [1] 1

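This basically compares a to a shifted copy of column a, while checking that column b equals g1 and the shifted column b equals g2. You will need the latest data.table version on CRAN (v1.9.6+) for this to work, since it relies on shift().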
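To see what the shifted comparison is doing, here is a minimal base R sketch that builds the "lead" columns by hand, so it runs without data.table. The names `pairs`, `next_a`, `next_b` and `hit` are made up for illustration, and `c(x[-1], NA)` is only assumed to stand in for `shift(x, type = "lead")` in this simple one-step case.

```r
# Sample data from the question (column c is not needed for the comparison)
a <- c("x", "x", "y", "y", "y", "z", "z", "x", "y", "y")
b <- c("g1", "g1", "g2", "g3", "g4", "g1", "g2", "g4", "g1", "g3")

# Hand-made "lead": drop the first element and pad the end with NA
pairs <- data.frame(a, next_a = c(a[-1], NA),
                    b, next_b = c(b[-1], NA),
                    stringsAsFactors = FALSE)

# TRUE where a row and the row after it share `a` and go from g1 to g2
pairs$hit <- with(pairs, a == next_a & b == "g1" & next_b == "g2")
print(pairs)

# which() returns the positions of TRUE and ignores any NA
which(pairs$hit)
# [1] 6
```

Row 6 is flagged because it is the first row of the matching pair (rows 6 and 7).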

With dplyr you could do something along these lines (the same comparison can also be written in base R):

library(dplyr)
df %>%
  filter(a == lead(a) & b == "g1" & lead(b) == "g2") %>%
  count()
# Source: local data table [1 x 1]
# 
#       n
#   (int)
# 1     1
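A base R equivalent of the same lead comparison might look like the sketch below. It is not code from the original answer: `count_follows` and its arguments `first` and `second` are hypothetical names, and the function simply pairs each row with the row after it by index.

```r
# Hypothetical helper: count rows where `b` is `first`, the next row's `b`
# is `second`, and both rows have the same value in `a`.
count_follows <- function(df, first = "g1", second = "g2") {
  n <- nrow(df)
  if (n < 2) return(0L)          # no consecutive pair to compare
  a <- as.character(df$a)        # also works if the columns are factors
  b <- as.character(df$b)
  cur <- seq_len(n - 1)          # rows 1 .. n-1
  nxt <- cur + 1L                # rows 2 .. n
  sum(a[cur] == a[nxt] & b[cur] == first & b[nxt] == second)
}

# Sample data from the question (column c omitted)
df <- data.frame(
  a = c("x", "x", "y", "y", "y", "z", "z", "x", "y", "y"),
  b = c("g1", "g1", "g2", "g3", "g4", "g1", "g2", "g4", "g1", "g3"),
  stringsAsFactors = FALSE
)

count_follows(df)
# [1] 1
```

Indexing by `cur` and `nxt` avoids the NA padding that a lead or lag introduces, so no `na.rm` handling is needed.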

Answer 2

An alternative:

Data:

df <- read.table(header = TRUE, text = ' a  b      c
1  x g1  date1
2  x g1  date2
3  y g2  date3
4  y g3  date4
5  y g4  date5
6  z g1  date6
7  z g2  date7
8  x g4  date8
9  y g1  date9
10 y g3 date10', stringsAsFactors = FALSE)

Solution:

library(dplyr) # for lag()
# df$a == lag(df$a) checks whether a is equal in consecutive rows
# the grepl() part checks that the current b is g2 and the previous b is g1
df$out <- df$a == lag(df$a) & grepl(paste('g2', 'g1'), paste(df$b, lag(df$b)))

Output:

> df
   a  b      c   out
1  x g1  date1 FALSE
2  x g1  date2 FALSE
3  y g2  date3 FALSE
4  y g3  date4 FALSE
5  y g4  date5 FALSE
6  z g1  date6 FALSE
7  z g2  date7  TRUE
8  x g4  date8 FALSE
9  y g1  date9 FALSE
10 y g3 date10 FALSE

And:

sum(df$out)
[1] 1
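One thing to keep in mind with the `grepl()` approach is that it matches substrings. The sample data is not affected, but with labels such as `g10` the pattern could match more than intended. The sketch below uses made-up labels to show the difference, with a hand-built `prev_b` as a base R stand-in for `lag(b)`; the names `loose` and `exact` are only for illustration.

```r
# Made-up labels, only to show the substring issue: "g10" contains "g1"
b <- c("g10", "g2", "g1", "g2")
prev_b <- c(NA, b[-length(b)])  # previous row's value, NA for the first row

# Substring match: also flags row 2, where g2 follows g10
loose <- grepl(paste("g2", "g1"), paste(b, prev_b))
print(loose)
# [1] FALSE  TRUE FALSE  TRUE

# Exact match: only row 4, where g2 follows g1
exact <- paste(b, prev_b) == paste("g2", "g1")
print(exact)
# [1] FALSE FALSE FALSE  TRUE
```

Comparing the pasted strings with `==` keeps the idea of the answer but only accepts the exact pair.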

Answer 3

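You could do it like this.

# Row 1 has no previous row, so it stays FALSE
result <- logical(nrow(df))
for (i in seq_len(nrow(df))[-1]) {
  # same a as the previous row, and b goes from g1 to g2
  result[i] <- df$a[i] == df$a[i - 1] & df$b[i] == "g2" & df$b[i - 1] == "g1"
}
sum(result)
# [1] 1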

Here is the data.

a <- c("x", "x", "y", "y", "y", "z", "z", "x", "y", "y")
b <- c("g1", "g1", "g2", "g3", "g4", "g1", "g2", "g4", "g1", "g3")
c <- paste("date", 1:10, sep = "")
df <- data.frame(a, b, c, stringsAsFactors = FALSE)
