Question

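My df contains paired samples; the rows of each pair share a value in "IDs". I want to remove every pair that has no row with SampleTime equal to 1. In my example, sample 1049 only has SampleTime values 2 and 4, so both of its rows should be removed.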

                  Expr SampleTime  IDs
MMRF_1030_3_BM  33.515          3 1030
MMRF_1030_1_BM 5.37626          1 1030
MMRF_1049_4_BM 13.3217          4 1049
MMRF_1049_2_BM 82.4998          2 1049
MMRF_1079_2_BM 131.134          2 1079
MMRF_1079_1_BM 6.62901          1 1079

Answer 1

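One option is ave, which can build a logical index for subsetting rows. Group by "IDs", check whether any "SampleTime" value in the group equals 1, and subset the rows with the result.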

df1[with(df1, ave(SampleTime==1, IDs, FUN = any)),]
#                   Expr SampleTime  IDs
#MMRF_1030_3_BM  33.51500          3 1030
#MMRF_1030_1_BM   5.37626          1 1030
#MMRF_1079_2_BM 131.13400          2 1079
#MMRF_1079_1_BM   6.62901          1 1079
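
To see why the ave call works, it can help to look at the intermediate logical vector it produces. The following is only a minimal base R sketch: it rebuilds the example data with data.frame(), and the names is_one, keep and result are illustrative, not part of the original answer.

```r
# Example data, rebuilt to match the df1 from the question
df1 <- data.frame(
  Expr = c(33.515, 5.37626, 13.3217, 82.4998, 131.134, 6.62901),
  SampleTime = c(3L, 1L, 4L, 2L, 2L, 1L),
  IDs = c(1030L, 1030L, 1049L, 1049L, 1079L, 1079L),
  row.names = c("MMRF_1030_3_BM", "MMRF_1030_1_BM", "MMRF_1049_4_BM",
                "MMRF_1049_2_BM", "MMRF_1079_2_BM", "MMRF_1079_1_BM")
)

# Row-level test: does this row have SampleTime equal to 1?
is_one <- df1$SampleTime == 1

# ave() applies any() within each IDs group and writes the single
# group result back to every row of that group
keep <- ave(is_one, df1$IDs, FUN = any)
print(keep)
# TRUE TRUE FALSE FALSE TRUE TRUE

# Use the per-row flag as a row index
result <- df1[keep, ]
print(result)
```

Both rows of ID 1049 get FALSE because neither has SampleTime 1, so the whole pair is dropped.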

Alternatively, the same logic can be applied with dplyr

library(dplyr)
df1 %>%
    group_by(IDs) %>%
    filter(any(SampleTime == 1))

Another option is

df1 %>%
   group_by(IDs) %>%
   filter(1 %in% SampleTime)

Or with data.table

library(data.table)
setDT(df1)[, .SD[any(SampleTime == 1)], IDs]

Data

df1 <- structure(list(Expr = c(33.515, 5.37626, 13.3217, 82.4998, 131.134, 
6.62901), SampleTime = c(3L, 1L, 4L, 2L, 2L, 1L), IDs = c(1030L, 
1030L, 1049L, 1049L, 1079L, 1079L)), class = "data.frame", 
 row.names = c("MMRF_1030_3_BM", 
"MMRF_1030_1_BM", "MMRF_1049_4_BM", "MMRF_1049_2_BM", "MMRF_1079_2_BM", 
"MMRF_1079_1_BM"))

Answer 2

We can find the IDs that have a row with SampleTime == 1 and then keep only the rows of the full dataset that belong to those IDs

subset(df, IDs %in% unique(IDs[SampleTime == 1]))

#                  Expr SampleTime  IDs
#MMRF_1030_3_BM  33.515          3 1030
#MMRF_1030_1_BM   5.376          1 1030
#MMRF_1079_2_BM 131.134          2 1079
#MMRF_1079_1_BM   6.629          1 1079
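
The one-liner above can be unpacked into two explicit steps, which may make the logic easier to follow. This is a minimal base R sketch under the same example data; ids_with_1 and result are illustrative names, not part of the original answer.

```r
# Example data, rebuilt to match the df from the question
df <- data.frame(
  Expr = c(33.515, 5.37626, 13.3217, 82.4998, 131.134, 6.62901),
  SampleTime = c(3L, 1L, 4L, 2L, 2L, 1L),
  IDs = c(1030L, 1030L, 1049L, 1049L, 1079L, 1079L),
  row.names = c("MMRF_1030_3_BM", "MMRF_1030_1_BM", "MMRF_1049_4_BM",
                "MMRF_1049_2_BM", "MMRF_1079_2_BM", "MMRF_1079_1_BM")
)

# Step 1: collect the IDs that have at least one row with SampleTime == 1
ids_with_1 <- unique(df$IDs[df$SampleTime == 1])
print(ids_with_1)
# 1030 1079

# Step 2: keep every row whose ID is in that set
result <- df[df$IDs %in% ids_with_1, ]
print(result)
```

Since %in% only tests membership, the unique() call does not change the result; it just keeps the lookup vector short.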

The same can be written in dplyr as

library(dplyr)
df %>% filter(IDs %in% unique(IDs[SampleTime == 1]))

or in data.table as

library(data.table)
setDT(df)[IDs %in% unique(IDs[SampleTime == 1])]

Data

df <- structure(list(Expr = c(33.515, 5.37626, 13.3217, 82.4998, 131.134, 
6.62901), SampleTime = c(3L, 1L, 4L, 2L, 2L, 1L), IDs = c(1030L, 
1030L, 1049L, 1049L, 1079L, 1079L)), class = "data.frame", row.names = 
c("MMRF_1030_3_BM", "MMRF_1030_1_BM", "MMRF_1049_4_BM", "MMRF_1049_2_BM", 
"MMRF_1079_2_BM", "MMRF_1079_1_BM"))

Remove rows that do not have a specific pairing
