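My df contains paired samples; the two samples of a pair share the same value in "IDs". I want to remove every pair that has no 1 in SampleTime. In my example, sample 1049 only has 2 and 4 as SampleTime, so both of its rows should be removed.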
Expr SampleTime IDs
MMRF_1030_3_BM 33.515 3 1030
MMRF_1030_1_BM 5.37626 1 1030
MMRF_1049_4_BM 13.3217 4 1049
MMRF_1049_2_BM 82.4998 2 1049
MMRF_1079_2_BM 131.134 2 1079
MMRF_1079_1_BM 6.62901 1 1079
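Answer 0 (score: 2)
One option is ave, which builds a logical index for subsetting the rows. Group by "IDs", check whether any value of "SampleTime" in the group equals 1, and subset the rows with the result.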
df1[with(df1, ave(SampleTime==1, IDs, FUN = any)),]
# Expr SampleTime IDs
#MMRF_1030_3_BM 33.51500 3 1030
#MMRF_1030_1_BM 5.37626 1 1030
#MMRF_1079_2_BM 131.13400 2 1079
#MMRF_1079_1_BM 6.62901 1 1079
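To make the grouped index concrete, here is a small sketch that prints the logical vector ave builds before it is used for subsetting. The name keep is only illustrative, and the data frame is a cut-down copy of the example data without Expr and the row names.

```r
# Cut-down copy of the example data (Expr and row names omitted)
df1 <- data.frame(
  SampleTime = c(3L, 1L, 4L, 2L, 2L, 1L),
  IDs        = c(1030L, 1030L, 1049L, 1049L, 1079L, 1079L)
)

# For every row, ask: does this row's ID group contain a SampleTime of 1?
keep <- with(df1, ave(SampleTime == 1, IDs, FUN = any))
print(keep)

# Rows of IDs 1030 and 1079 are kept, both rows of 1049 are dropped
print(df1[keep, ])
```

Because the index has one entry per row, it can be inspected or combined with other conditions before subsetting.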
Or with dplyr, the same logic can be applied
library(dplyr)
df1 %>%
group_by(IDs) %>%
filter(any(SampleTime == 1))
Another option is
df1 %>%
group_by(IDs) %>%
filter(1 %in% SampleTime)
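The two conditions agree on the example data, but they can differ when SampleTime has missing values. The sketch below uses a hypothetical group (not part of the question's data) to show the difference in base R. In dplyr both filters would still drop such a group, since filter discards rows whose condition is NA, but with the ave index an NA would propagate into the row subset.

```r
# Hypothetical group: one missing SampleTime and no 1
st <- c(NA, 2L)

res_any  <- any(st == 1)                # NA, because NA == 1 is unknown
res_in   <- 1 %in% st                   # FALSE, %in% never returns NA
res_narm <- any(st == 1, na.rm = TRUE)  # FALSE, missing values are ignored

print(c(any = res_any, in_op = res_in, any_na_rm = res_narm))
```

If missing sample times are possible, 1 %in% SampleTime or any(SampleTime == 1, na.rm = TRUE) is likely the safer choice.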
Or with data.table
library(data.table)
setDT(df1)[, .SD[any(SampleTime == 1)], IDs]
Data
df1 <- structure(list(Expr = c(33.515, 5.37626, 13.3217, 82.4998, 131.134,
6.62901), SampleTime = c(3L, 1L, 4L, 2L, 2L, 1L), IDs = c(1030L,
1030L, 1049L, 1049L, 1079L, 1079L)), class = "data.frame",
row.names = c("MMRF_1030_3_BM",
"MMRF_1030_1_BM", "MMRF_1049_4_BM", "MMRF_1049_2_BM", "MMRF_1079_2_BM",
"MMRF_1079_1_BM"))
Answer 1 (score: 0)
We can find the IDs that have SampleTime = 1 and then use subset to keep only the rows with those IDs from the whole dataset
subset(df, IDs %in% unique(IDs[SampleTime == 1]))
# Expr SampleTime IDs
#MMRF_1030_3_BM 33.515 3 1030
#MMRF_1030_1_BM 5.376 1 1030
#MMRF_1079_2_BM 131.134 2 1079
#MMRF_1079_1_BM 6.629 1 1079
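The one-liner can be read in two steps. As a rough sketch on a cut-down copy of the data (the names ids_with_1 and in_pair are illustrative only):

```r
# Cut-down copy of the example data (Expr and row names omitted)
df <- data.frame(
  SampleTime = c(3L, 1L, 4L, 2L, 2L, 1L),
  IDs        = c(1030L, 1030L, 1049L, 1049L, 1079L, 1079L)
)

# Step 1: collect the IDs that have at least one SampleTime equal to 1
ids_with_1 <- unique(df$IDs[df$SampleTime == 1])
print(ids_with_1)

# Step 2: flag every row whose ID is in that set, then keep those rows
in_pair <- df$IDs %in% ids_with_1
print(df[in_pair, ])
```

Unlike the grouped approaches, this never splits the data by ID; it relies on a single vectorised %in% lookup.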
The same can be written in dplyr as
library(dplyr)
df %>% filter(IDs %in% unique(IDs[SampleTime == 1]))
or in data.table as
library(data.table)
setDT(df)[IDs %in% unique(IDs[SampleTime == 1])]
Data
df <- structure(list(Expr = c(33.515, 5.37626, 13.3217, 82.4998, 131.134,
6.62901), SampleTime = c(3L, 1L, 4L, 2L, 2L, 1L), IDs = c(1030L,
1030L, 1049L, 1049L, 1079L, 1079L)), class = "data.frame", row.names =
c("MMRF_1030_3_BM", "MMRF_1030_1_BM", "MMRF_1049_4_BM", "MMRF_1049_2_BM",
"MMRF_1079_2_BM", "MMRF_1079_1_BM"))