Question

I currently have data like the following for multiple IDs (ranging up to about 1600):

id  year    name    status
1   1980    James   3
1   1981    James   3
1   1982    James   3
1   1983    James   4
1   1984    James   4
1   1985    James   1
1   1986    James   1
1   1987    James   1
2   1982    John    2
2   1983    John    2
2   1984    John    1
2   1985    John    1

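I want to subset this data so that it contains only the rows where status = 1 and the row with the status immediately before it. I also want to drop the repeated 1s and keep only the first one. In short, I want: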

id  year    name    status
1   1984    James   4
1   1985    James   1
2   1983    John    2
2   1984    John    1

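I'm doing this because I'm trying to work out how many people changed from a given status to status 1 in each year. I only know the subset command, and I don't think I can get this data with subset(data, subset=(status==1)). How can I also keep the row that comes just before?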

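An addition to this question: when I applied the first reply (using the plyr package) and the third reply (using the duplicated command), I did not get the same results. I found that the first reply preserved the information accurately, while the third did not.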

Answer 1

This does what you want.

library(plyr)

ddply(d, .(name), function(x) {
  i <- match(1, x$status)
  if (is.na(i))
    NULL
  else
    x[c(i-1, i), ]
})

  id year  name status
1  1 1984 James      4
2  1 1985 James      1
3  2 1983  John      2
4  2 1984  John      1
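
For readers without plyr, the same idea can be sketched in base R. This is only an illustration, not part of the original answer: it rebuilds the question's sample data by hand, groups by id rather than name, and the names pieces and result are made up here. match(1, x$status) returns the position of the first 1 in the group, or NA if there is none.

```r
# Sample data from the question, rebuilt by hand
d <- data.frame(
  id     = c(rep(1, 8), rep(2, 4)),
  year   = c(1980:1987, 1982:1985),
  name   = c(rep("James", 8), rep("John", 4)),
  status = c(3, 3, 3, 4, 4, 1, 1, 1, 2, 2, 1, 1)
)

# For each id, keep the first status-1 row and the row just before it
pieces <- lapply(split(d, d$id), function(x) {
  i <- match(1, x$status)   # position of the first 1, NA if none
  if (is.na(i)) NULL else x[c(i - 1, i), ]
})

# Stack the per-id pieces back into one data frame
result <- do.call(rbind, pieces)
print(result)
```

If the first row of a group already has status 1, i - 1 is 0, and R silently drops a zero index, so only the status-1 row is returned for that group.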

Answer 2

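Here is one solution: for each run of consecutive equal statuses (the cumsum bit), it looks at the first row of the run and, if its status is 1, takes that row and the row before it: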

library(data.table)
dt = data.table(your_df)

dt[dt[, if(status[1] == 1) c(.I[1]-1, .I[1]),
        by = cumsum(c(0,diff(status)!=0))]$V1]
#   id year  name status
#1:  1 1984 James      4
#2:  1 1985 James      1
#3:  2 1983  John      2
#4:  2 1984  John      1
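
To see what the cumsum grouping does, here is a small base R sketch using the status values of id 1 from the question. The variable names changed and run_id are only for illustration.

```r
# Status column for id 1 in the question's sample data
status <- c(3, 3, 3, 4, 4, 1, 1, 1)

# TRUE wherever the status differs from the previous row
changed <- diff(status) != 0

# Prepend 0 for the first row, then accumulate:
# every run of equal statuses gets its own label
run_id <- cumsum(c(0, changed))
print(run_id)
# [1] 0 0 0 1 1 2 2 2
```

Each distinct label is one group in the by = clause, so .I[1] is the row number of the first row of that run in the full table.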

Answer 3

Using base R, here is one way:

# this first line is how I imported your data after highlighting and copying (i.e. ctrl+c)
d <- read.table("clipboard", header = TRUE)

# find entries where the subsequent row's "status" is equal to 1
# really what's going on is finding rows where "status" = 1, then subtracting 1  
# to find the index of the previous row
e<-d[which(d$status==1)-1 ,]
# be careful if your first "status" entry = 1...

# What you want
# Here R will look for entries where "name" and "status" are both repeats of a 
# previous row and where "status" = 1, and it will get rid of those entries
e[!(duplicated(e[,c("name","status")]) & e$status==1),]

   id year  name status
 5  1 1984 James      4
 6  1 1985 James      1
10  2 1983  John      2
11  2 1984  John      1
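
One likely reason this approach can differ from the others: the first step keeps only the rows before each status-1 row. A status-1 row survives only when another status-1 row follows it, so a person with a single status-1 row loses that row. The sketch below shows this on a made-up three-row example (Ann is hypothetical, not from the question's data) and one possible adjustment that also keeps the status-1 rows themselves.

```r
# Hypothetical data: Ann has exactly one row with status 1
d <- data.frame(id = c(1, 1, 1),
                year = c(1990, 1991, 1992),
                name = c("Ann", "Ann", "Ann"),
                status = c(2, 2, 1))

# Original first step: only the row before the status-1 row is kept
e_orig <- d[which(d$status == 1) - 1, ]
print(e_orig)   # the 1992 row is missing

# Possible adjustment: keep the status-1 rows as well as the rows before them
idx <- which(d$status == 1)
e <- d[sort(unique(c(idx - 1, idx))), ]

# Same de-duplication step as in the answer
fixed <- e[!(duplicated(e[, c("name", "status")]) & e$status == 1), ]
print(fixed)
```

Like the original, this still assumes the rows are sorted by id and year, and it does not check that the previous row belongs to the same person.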

Answer 4

I like the data.table solution myself, but there is actually a way to do this with subset.

# import data from clipboard
x = read.table(pipe("pbpaste"),header=TRUE)

# Get the result table that you want
x1 = subset(x, status==1 | 
               c(status[-1],0)==1 )
result = subset(x1, !duplicated(cbind(name,status)) )
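
The expression c(status[-1],0) shifts the status column up by one position, so each row is compared against the status of the row after it. A small sketch on a made-up vector (next_status and keep are illustrative names):

```r
# Hypothetical status values for one person
status <- c(4, 4, 1, 1, 1)

# Drop the first element and pad with 0: element i now holds the status of row i + 1
next_status <- c(status[-1], 0)

# Keep rows that have status 1 or are immediately followed by a status-1 row
keep <- status == 1 | next_status == 1
print(keep)
# [1] FALSE  TRUE  TRUE  TRUE  TRUE
```

Note that the shift ignores id boundaries: if a person's first row already has status 1, the last row of the previous person is kept as well.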

Selecting rows that meet a specific condition in R
