I have a data.table in R with 3 columns:
library(data.table)
DT_A <- data.table(
  sid = c(1,1,2,2,2,3,3,2,3,3),
  date = as.Date(c("2014-06-22","2014-06-23","2014-06-22","2014-06-23","2014-06-24",
                   "2014-06-22","2014-06-23","2014-06-24","2014-06-25","2014-06-26")),
  Status1 = c("A","B","A","A","B","A","A","A","B","B"))
The data looks like this:
    sid       date Status1
 1:   1 2014-06-22       A
 2:   1 2014-06-23       B
 3:   2 2014-06-22       A
 4:   2 2014-06-23       A
 5:   2 2014-06-24       B
 6:   3 2014-06-22       A
 7:   3 2014-06-23       A
 8:   2 2014-06-24       A
 9:   3 2014-06-25       B
10:   3 2014-06-26       B
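How can I look at Status1 and check whether there are 3 consecutive rows with the value A (such as rows 6, 7 and 8), and then delete those rows?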
Answer 0 (score: 2)
The question is tagged data.table, so I'll try to give a proper answer:
DT_A[!DT_A[, .I[.N == 3 & Status1 == "A"], by = rleid(Status1)]$V1]
   sid       date Status1
1:   1 2014-06-22       A
2:   1 2014-06-23       B
3:   2 2014-06-22       A
4:   2 2014-06-23       A
5:   2 2014-06-24       B
6:   3 2014-06-25       B
7:   3 2014-06-26       B
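To see why this expression picks rows 6 to 8, it can help to look at the intermediate values. The sketch below assumes the DT_A from the question and attaches the run id produced by rleid() and the length of each run as two extra columns; the names runs, run, run_len and drop_idx are only illustrative. In the answer's expression, .N plays the role of run_len and .I supplies the row numbers, so only rows sitting in a run of exactly 3 A's are collected.

```r
library(data.table)

DT_A <- data.table(
  sid = c(1, 1, 2, 2, 2, 3, 3, 2, 3, 3),
  date = as.Date(c("2014-06-22", "2014-06-23", "2014-06-22", "2014-06-23",
                   "2014-06-24", "2014-06-22", "2014-06-23", "2014-06-24",
                   "2014-06-25", "2014-06-26")),
  Status1 = c("A", "B", "A", "A", "B", "A", "A", "A", "B", "B"))

# Work on a copy so DT_A itself is left untouched.
runs <- copy(DT_A)
runs[, run := rleid(Status1)]      # new id each time Status1 changes
runs[, run_len := .N, by = run]    # number of rows in each run

print(runs)
# run:     1 2 3 3 4 5 5 5 6 6
# run_len: 1 1 2 2 1 3 3 3 2 2

# Rows that the answer's expression collects and then excludes with `!`.
drop_idx <- runs[, which(run_len == 3 & Status1 == "A")]
print(drop_idx)
```

Only run 5 has length 3 and the value A, which corresponds to rows 6, 7 and 8.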
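As Frank pointed out, my first answer (now edited) only worked for the sample dataset provided by the OP and failed on other test cases.
The edited code is therefore applied to a few additional test cases below.
Case B: 3 consecutive rows each of the letters A and B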
DT_B <- data.table(
sid=c(1,1,2,2,2,3,3,2,3,3,3),
date=as.Date(c("2014-06-22","2014-06-23","2014-06-22","2014-06-23", "2014-06-24","2014-06-22",
"2014-06-23","2014-06-24","2014-06-25","2014-06-26","2014-06-26")),
Status1 = c("A","B","A","A","B","A","A","A","B","B","B"))
DT_B
    sid       date Status1
 1:   1 2014-06-22       A
 2:   1 2014-06-23       B
 3:   2 2014-06-22       A
 4:   2 2014-06-23       A
 5:   2 2014-06-24       B
 6:   3 2014-06-22       A
 7:   3 2014-06-23       A
 8:   2 2014-06-24       A
 9:   3 2014-06-25       B
10:   3 2014-06-26       B
11:   3 2014-06-26       B
DT_B[!DT_B[, .I[.N == 3 & Status1 == "A"], by = rleid(Status1)]$V1]
   sid       date Status1
1:   1 2014-06-22       A
2:   1 2014-06-23       B
3:   2 2014-06-22       A
4:   2 2014-06-23       A
5:   2 2014-06-24       B
6:   3 2014-06-25       B
7:   3 2014-06-26       B
8:   3 2014-06-26       B
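Only the 3 consecutive rows containing the letter A (rows 6 to 8) are removed; the 3 consecutive B rows are kept.
Case C: nothing to remove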
DT_C <- data.table(
sid=c(1,1,2,2,2,3,3,2,3,3,3),
date=as.Date(c("2014-06-22","2014-06-23","2014-06-22","2014-06-23", "2014-06-24","2014-06-22",
"2014-06-23","2014-06-24","2014-06-25","2014-06-26","2014-06-26")),
Status1 = c("A","B","A","A","B","A","A","C","B","B","C"))
DT_C
    sid       date Status1
 1:   1 2014-06-22       A
 2:   1 2014-06-23       B
 3:   2 2014-06-22       A
 4:   2 2014-06-23       A
 5:   2 2014-06-24       B
 6:   3 2014-06-22       A
 7:   3 2014-06-23       A
 8:   2 2014-06-24       C
 9:   3 2014-06-25       B
10:   3 2014-06-26       B
11:   3 2014-06-26       C
DT_C[!DT_C[, .I[.N == 3 & Status1 == "A"], by = rleid(Status1)]$V1]
    sid       date Status1
 1:   1 2014-06-22       A
 2:   1 2014-06-23       B
 3:   2 2014-06-22       A
 4:   2 2014-06-23       A
 5:   2 2014-06-24       B
 6:   3 2014-06-22       A
 7:   3 2014-06-23       A
 8:   2 2014-06-24       C
 9:   3 2014-06-25       B
10:   3 2014-06-26       B
11:   3 2014-06-26       C
No rows are removed because there is no run of 3 consecutive rows containing A.
Case D: edge case, all rows are removed
DT_D <- DT_A[6:8]
DT_D
   sid       date Status1
1:   3 2014-06-22       A
2:   3 2014-06-23       A
3:   2 2014-06-24       A
DT_D[!DT_D[, .I[.N == 3 & Status1 == "A"], by = rleid(Status1)]$V1]
Empty data.table (0 rows) of 3 cols: sid,date,Status1
All rows are removed and an empty data.table is returned, because the input data.table consists of only 3 rows, all with the letter A.
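The same expression is repeated for every test case, so it may be convenient to wrap it in a function. The following is only a sketch: drop_exact_runs and its arguments are hypothetical names, not part of data.table, and it keeps the answer's behaviour of matching runs of exactly n rows (a run of four A's would be left alone; replacing == with >= would change that).

```r
library(data.table)

# Hypothetical helper, not part of data.table: drop every run of exactly
# `n` consecutive rows in which column `col` equals `value`.
drop_exact_runs <- function(dt, col = "Status1", value = "A", n = 3L) {
  run_id  <- rleid(dt[[col]])           # one id per run of equal values
  run_len <- tabulate(run_id)[run_id]   # length of the run each row belongs to
  keep    <- !(run_len == n & dt[[col]] == value)
  dt[keep]
}

DT_A <- data.table(
  sid = c(1, 1, 2, 2, 2, 3, 3, 2, 3, 3),
  date = as.Date(c("2014-06-22", "2014-06-23", "2014-06-22", "2014-06-23",
                   "2014-06-24", "2014-06-22", "2014-06-23", "2014-06-24",
                   "2014-06-25", "2014-06-26")),
  Status1 = c("A", "B", "A", "A", "B", "A", "A", "A", "B", "B"))
DT_D <- DT_A[6:8]

nrow(drop_exact_runs(DT_A))           # 7: rows 6 to 8 are dropped
nrow(drop_exact_runs(DT_D))           # 0: every row belonged to the run
nrow(drop_exact_runs(DT_A, n = 2L))   # 8: the two-row run (rows 3 and 4) is dropped instead
```

The helper assumes the column contains no missing values; NA handling would need to be decided separately.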
Answer 1 (score: 1)
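# Row indices of runs of at least 3 consecutive A's
with(rle(DT_A$Status1 == "A"), {
  ends   <- cumsum(lengths)
  starts <- ends - lengths + 1
  hit    <- which(values & lengths >= 3)   # only runs of A, also when the run is first
  unlist(Map(seq, starts[hit], ends[hit]))
})
#[1] 6 7 8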
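This answer returns row indices rather than the filtered table. One way to go from runs to deleted rows, sketched here under the assumption that "at least 3 consecutive A's" is the intended rule, is to expand the run-level test back to one flag per row with rep(); the names r, flag, idx and result are illustrative.

```r
library(data.table)

DT_A <- data.table(
  sid = c(1, 1, 2, 2, 2, 3, 3, 2, 3, 3),
  date = as.Date(c("2014-06-22", "2014-06-23", "2014-06-22", "2014-06-23",
                   "2014-06-24", "2014-06-22", "2014-06-23", "2014-06-24",
                   "2014-06-25", "2014-06-26")),
  Status1 = c("A", "B", "A", "A", "B", "A", "A", "A", "B", "B"))

r <- rle(DT_A$Status1 == "A")

# One TRUE/FALSE per run (is it a run of A's of length >= 3?),
# repeated so that every row carries the flag of its run.
flag <- rep(r$values & r$lengths >= 3, r$lengths)

idx  <- which(flag)    # row numbers that would be deleted
keep <- !flag
result <- DT_A[keep]   # table without those rows

print(idx)
print(result)
```

Because the flag is built per row, the case with no matching run simply keeps every row, with no special handling of an empty index vector.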
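Answer 2 (score: 1)
I assume you made a mistake in your sid definition and that your 3 rows all have sid = 3. If not, sorry, my answer won't work. If that is the case, the solution can be a one-liner: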
DT_A[,.SD[.N < 3 | Status1 != "A",], by = .(sid,Status1)]
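This one-liner does what you need: grouping by sid and Status1, it keeps the rows whose group has fewer than 3 rows or whose Status1 is not A (the negation of the selection you want to remove: at least 3 A's). Hope it helps.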
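Since this answer depends on an assumption about the data, a short check may make the difference visible. In the sketch below, DT_fix is a hypothetical corrected table in which row 8 has sid = 3 (that correction is this answer's assumption, not something the question confirms); the one-liner is then run on both the corrected and the posted data.

```r
library(data.table)

dates <- as.Date(c("2014-06-22", "2014-06-23", "2014-06-22", "2014-06-23",
                   "2014-06-24", "2014-06-22", "2014-06-23", "2014-06-24",
                   "2014-06-25", "2014-06-26"))
status <- c("A", "B", "A", "A", "B", "A", "A", "A", "B", "B")

# Data exactly as posted in the question.
DT_A <- data.table(sid = c(1, 1, 2, 2, 2, 3, 3, 2, 3, 3),
                   date = dates, Status1 = status)

# ASSUMPTION: row 8 was meant to have sid = 3.
DT_fix <- data.table(sid = c(1, 1, 2, 2, 2, 3, 3, 3, 3, 3),
                     date = dates, Status1 = status)

res_fix  <- DT_fix[, .SD[.N < 3 | Status1 != "A"], by = .(sid, Status1)]
res_orig <- DT_A[,   .SD[.N < 3 | Status1 != "A"], by = .(sid, Status1)]

print(res_fix)    # the three A rows of sid 3 are gone
print(res_orig)   # the three A rows of sid 2 are gone instead
```

The filter works on groups, not on consecutive rows. On the data as posted, rows 3, 4 and 8 share sid 2 and status A, so they form the group of 3 that is removed, while rows 6 and 7 stay.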