Subset a data frame with 3 threshold conditions based on the corresponding ID names

Posted: 2017-04-11 16:35:28

Tags: r dataframe dplyr

I have a data frame like this:

ID <- c("ABC_10","AZM_11","ABC_11","ABC_12",
        "ABC_13","AZM_12","ABC_14","ABC_15",
        "CZX_10","CZX_11","CZX_12","CZX_13",
        "FIN_10","FIN_11","FIN_12","FIN_13",
        "FNM_10","FNM_11","FXS_10","FXS_11")
Id.n <- c(345,380,339,361,
          245,390,639,661,
          545,580,539,261,
          345,180,139,261,
          1045,1580,39,161)
df <- data.frame(ID,Id.n)

I am trying to subset this data frame using the following conditions:

Threshold of Id.n for FXS - 100
Threshold of Id.n for FIN - 200
Threshold of Id.n for all other IDs - 300
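A minimal base-R sketch of encoding these rules as data instead of as separate conditions. It assumes the prefix is everything before the first underscore in `ID`; the names `thresholds`, `default_threshold`, `prefix`, `row_threshold` and `df_sub` are illustrative only.

```r
# Example data from the question (character IDs, not factors)
ID <- c("ABC_10","AZM_11","ABC_11","ABC_12",
        "ABC_13","AZM_12","ABC_14","ABC_15",
        "CZX_10","CZX_11","CZX_12","CZX_13",
        "FIN_10","FIN_11","FIN_12","FIN_13",
        "FNM_10","FNM_11","FXS_10","FXS_11")
Id.n <- c(345,380,339,361,
          245,390,639,661,
          545,580,539,261,
          345,180,139,261,
          1045,1580,39,161)
df <- data.frame(ID, Id.n, stringsAsFactors = FALSE)

# Per-prefix thresholds; any prefix not listed falls back to the default
thresholds <- c(FXS = 100, FIN = 200)
default_threshold <- 300

# Prefix = text before the first "_" (assumed ID format)
prefix <- sub("_.*$", "", df$ID)

# Look up each row's threshold by name; unknown prefixes give NA
row_threshold <- unname(thresholds[prefix])
row_threshold[is.na(row_threshold)] <- default_threshold

# Keep rows whose Id.n is strictly above their threshold
df_sub <- df[df$Id.n > row_threshold, ]
df_sub
```

Adding another prefix then only means adding one entry to `thresholds`.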

Desired output:

       ID Id.n
   ABC_10  345
   AZM_11  380
   ABC_11  339
   ABC_12  361
   AZM_12  390
   ABC_14  639
   ABC_15  661
   CZX_10  545
   CZX_11  580
   CZX_12  539
   FIN_10  345
   FIN_13  261
   FNM_10 1045
   FNM_11 1580
   FXS_11  161

I tried the following, but I can't get it right:

df <- subset(df,ifelse(grepl("FXS",df$ID), df$ID.n > 100,))

Can anyone point me in the right direction?
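Two things go wrong in that attempt: the column is named `Id.n` (lower-case "d"), so `df$ID.n` returns `NULL`, and the `no` argument of `ifelse()` is left empty, so no condition is given for the non-FXS rows. A minimal base-R sketch that keeps the `subset()`/`ifelse()` idea by computing one threshold per row with nested `ifelse()` calls (the names `threshold` and `df_sub` are illustrative):

```r
# Example data from the question
ID <- c("ABC_10","AZM_11","ABC_11","ABC_12",
        "ABC_13","AZM_12","ABC_14","ABC_15",
        "CZX_10","CZX_11","CZX_12","CZX_13",
        "FIN_10","FIN_11","FIN_12","FIN_13",
        "FNM_10","FNM_11","FXS_10","FXS_11")
Id.n <- c(345,380,339,361,
          245,390,639,661,
          545,580,539,261,
          345,180,139,261,
          1045,1580,39,161)
df <- data.frame(ID, Id.n, stringsAsFactors = FALSE)

# One threshold per row: 100 for FXS, 200 for FIN, 300 for everything else
threshold <- ifelse(grepl("FXS", df$ID), 100,
                    ifelse(grepl("FIN", df$ID), 200, 300))

# subset() finds Id.n inside df and threshold in the calling environment
df_sub <- subset(df, Id.n > threshold)
df_sub
```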

3 Answers:

Answer 0 (score: 4)

Using dplyr:

library(dplyr)

df2 <- df %>%
  filter((grepl("FXS", ID) & Id.n > 100) | 
           (grepl("FIN", ID) & Id.n > 200) |
           (!grepl("FXS|FIN", ID) & Id.n > 300))

df2
 #     ID Id.n
 # ABC_10  345
 # AZM_11  380
 # ABC_11  339
 # ABC_12  361
 # AZM_12  390
 # ABC_14  639
 # ABC_15  661
 # CZX_10  545
 # CZX_11  580
 # CZX_12  539
 # FIN_10  345
 # FIN_13  261
 # FNM_10 1045
 # FNM_11 1580
 # FXS_11  161
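With more prefixes, the chain of `|` conditions grows quickly. A sketch of the same filter that first computes a per-row threshold with dplyr's `case_when()` and then filters once; it assumes dplyr is installed, and the helper column `threshold` is illustrative and dropped at the end.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  ID = c("ABC_10","AZM_11","ABC_11","ABC_12",
         "ABC_13","AZM_12","ABC_14","ABC_15",
         "CZX_10","CZX_11","CZX_12","CZX_13",
         "FIN_10","FIN_11","FIN_12","FIN_13",
         "FNM_10","FNM_11","FXS_10","FXS_11"),
  Id.n = c(345,380,339,361,
           245,390,639,661,
           545,580,539,261,
           345,180,139,261,
           1045,1580,39,161),
  stringsAsFactors = FALSE
)

df2 <- df %>%
  # First matching condition wins; TRUE acts as the fallback for all other IDs
  mutate(threshold = case_when(
    grepl("FXS", ID) ~ 100,
    grepl("FIN", ID) ~ 200,
    TRUE ~ 300
  )) %>%
  filter(Id.n > threshold) %>%
  select(-threshold)
df2
```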

Answer 1 (score: 4)

This is simpler with cleaned-up data. Using data.table, it looks like this:

library(data.table)
setDT(df)
df[, c("x", "y") := tstrsplit(ID, "_")][, ID := NULL ]

xDT = data.table(x = unique(df$x))
xDT[, th := 300 ]    
xDT[.(x = c("FXS", "FIN"), th = c(100, 200)), on=.(x), th := i.th ]   

Then a non-equi join does the filtering:

df[xDT, on=.(x, Id.n > th)]

    Id.n   x  y
 1:  300 ABC 11
 2:  300 ABC 10
 3:  300 ABC 12
 4:  300 ABC 14
 5:  300 ABC 15
 6:  300 AZM 11
 7:  300 AZM 12
 8:  300 CZX 12
 9:  300 CZX 10
10:  300 CZX 11
11:  200 FIN 13
12:  200 FIN 10
13:  300 FNM 10
14:  300 FNM 11
15:  100 FXS 11
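In the output above, the `Id.n` column shows each group's threshold (300, 200, 100) rather than the original values: in a data.table non-equi join, the join columns in the result take their values from `i` (here `xDT`). A sketch that returns the original values by selecting them with the `x.` prefix in `j`, and drops unmatched groups with `nomatch = 0L`; it assumes a reasonably recent data.table that supports `x.` column prefixes, and `res` is an illustrative name.

```r
library(data.table)

# Example data from the question, as a data.table
df <- data.table(
  ID = c("ABC_10","AZM_11","ABC_11","ABC_12",
         "ABC_13","AZM_12","ABC_14","ABC_15",
         "CZX_10","CZX_11","CZX_12","CZX_13",
         "FIN_10","FIN_11","FIN_12","FIN_13",
         "FNM_10","FNM_11","FXS_10","FXS_11"),
  Id.n = c(345,380,339,361,
           245,390,639,661,
           545,580,539,261,
           345,180,139,261,
           1045,1580,39,161)
)
# Split e.g. "ABC_10" into prefix x = "ABC" and suffix y = "10"
df[, c("x", "y") := tstrsplit(ID, "_", fixed = TRUE)]

# Lookup table: default 300, overridden for FXS and FIN by an update join
xDT <- data.table(x = unique(df$x), th = 300)
xDT[.(x = c("FXS", "FIN"), th = c(100, 200)), on = .(x), th := i.th]

# Non-equi join; x.Id.n is df's own Id.n, not the threshold from xDT
res <- df[xDT, on = .(x, Id.n > th), .(ID, Id.n = x.Id.n), nomatch = 0L]
res
```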

As for using grepl here, I think it is a bad idea.
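One concrete risk with an unanchored `grepl()` is that a pattern such as `"FIN"` matches anywhere in the string, not only at the prefix. The IDs below are hypothetical and chosen only to show the difference; anchoring the pattern with `^` or using base R's `startsWith()` restricts the test to the start of the ID.

```r
# Hypothetical IDs: only "FIN_10" actually starts with the FIN prefix
ids <- c("FIN_10", "XFIN_10", "ABC_FIN")

grepl("FIN", ids)        # unanchored regex:   TRUE  TRUE  TRUE
grepl("^FIN_", ids)      # anchored regex:     TRUE FALSE FALSE
startsWith(ids, "FIN_")  # fixed-string prefix: TRUE FALSE FALSE
```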

Answer 2 (score: 1)

df[(grepl("FXS",df$ID) & df$Id.n >= 100) | 
       (grepl("FIN",df$ID) & df$Id.n >= 200) | 
       (!(grepl("FXS",df$ID) | grepl("FIN", df$ID)) & df$Id.n >= 300),]
#       ID Id.n
#1  ABC_10  345
#2  AZM_11  380
#3  ABC_11  339
#4  ABC_12  361
#6  AZM_12  390
#7  ABC_14  639
#8  ABC_15  661
#9  CZX_10  545
#10 CZX_11  580
#11 CZX_12  539
#13 FIN_10  345
#16 FIN_13  261
#17 FNM_10 1045
#18 FNM_11 1580
#20 FXS_11  161
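This answer uses `>=` where the other two use `>`. With the question's data no `Id.n` equals its threshold, so all three return the same 15 rows, but they differ for a value sitting exactly on a threshold. A small illustration with hypothetical boundary rows (`edge`, `th`, `strict` and `inclusive` are illustrative names):

```r
# Hypothetical rows whose Id.n equals their threshold exactly
edge <- data.frame(ID = c("ABC_1", "FIN_1", "FXS_1"),
                   Id.n = c(300, 200, 100),
                   stringsAsFactors = FALSE)

# Per-row threshold, same rules as in the question
th <- ifelse(grepl("FXS", edge$ID), 100,
             ifelse(grepl("FIN", edge$ID), 200, 300))

strict    <- edge[edge$Id.n >  th, ]  # ">" as in answers 0 and 1: drops all three
inclusive <- edge[edge$Id.n >= th, ]  # ">=" as in answer 2: keeps all three
nrow(strict)     # 0
nrow(inclusive)  # 3
```

Which comparison is right depends on whether the thresholds are meant to be inclusive, which the question does not say.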