Using LIKE and NOT LIKE together in the same query

Time: 2015-09-21 21:13:44

Tags: r data.table dplyr sqldf

I have a data frame df like this:

Category <- c('D_L','D_R','FA1','LBP0W','L-010','L-020','LW_-010','LWA_PT_035','LWA_PT_055','RBP0W','RET_MAG','R-010','R-000','RWA_PT_035','RWA_PT_055','TPH')
ID <- c(111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126)
df <- data.frame(ID,Category)
df

    ID   Category
1  111        D_L
2  112        D_R
3  113        FA1
4  114      LBP0W
5  115      L-010
6  116      L-020
7  117    LW_-010
8  118 LWA_PT_035
9  119 LWA_PT_055
10 120      RBP0W
11 121    RET_MAG
12 122      R-010
13 123      R-000
14 124 RWA_PT_035
15 125 RWA_PT_055
16 126        TPH

I use sqldf to filter my dataset into two groups.

library(sqldf)

df_R <- sqldf("SELECT * FROM df
                  WHERE Category NOT LIKE '%_L'
                  AND Category NOT LIKE 'LW_%'
                  AND Category NOT LIKE 'L-%'
                  AND Category NOT LIKE 'LB%'")

df_L <- sqldf("SELECT * FROM df 
              WHERE Category NOT LIKE '%_R'
              AND Category NOT LIKE 'RW_%'
              AND category NOT LIKE 'R-%'
              AND category NOT LIKE 'RB%'")

This gives me two data frames. The challenge is:

1) For df_R, I need to keep the "RWA_PT_035" category but not "RWA_PT_055".
2) For df_L, I need to keep the "LWA_PT_035" category but not "LWA_PT_055".

So when I try this:

df_R <- sqldf("SELECT * FROM df 
                      WHERE Category NOT LIKE '%_L'
                      AND Category NOT LIKE 'LW_%'
                      AND Category NOT LIKE 'L-%'
                      AND Category NOT LIKE 'LB%'
                      AND Category LIKE 'RWA_PT_035'")

it returns only one observation for df_R, the "RWA_PT_035" row, but my desired output is

   ID   Category
1 112        D_R
2 113        FA1
3 120      RBP0W
4 121    RET_MAG
5 122      R-010
6 123      R-000
7 124 RWA_PT_035
8 126        TPH

and for df_L

   ID   Category
1 111        D_L
2 113        FA1
3 114      LBP0W
4 115      L-010
5 116      L-020
6 117    LW_-010
7 118 LWA_PT_035
8 121    RET_MAG
9 126        TPH

I would like to know whether I can use "LIKE" and "NOT LIKE" together in one query as above, or whether there is another way to do this.

I am also open to approaches other than sqldf, such as data.table or dplyr.

2 Answers:

Answer 0: (score: 2)

I got this solution from David Arenburg (as written, it returns df_L):

df[!grepl("_R|RWA_|R-|RB|_PT_055", df$Category),]
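
The line above drops every right-hand category plus anything containing _PT_055, so it produces df_L. For df_R the pattern can presumably just be mirrored. The following is a minimal base R sketch, assuming the df from the question; the mirrored pattern "_L|LW|L-|LB|_PT_055" is my own guess at the counterpart and was not part of David Arenburg's suggestion.

```r
# Rebuild the example data from the question
Category <- c('D_L','D_R','FA1','LBP0W','L-010','L-020','LW_-010','LWA_PT_035',
              'LWA_PT_055','RBP0W','RET_MAG','R-010','R-000','RWA_PT_035',
              'RWA_PT_055','TPH')
ID <- 111:126
df <- data.frame(ID, Category)

# Left side: drop the right-hand categories and anything containing _PT_055
df_L <- df[!grepl("_R|RWA_|R-|RB|_PT_055", df$Category), ]

# Right side (assumed mirror): drop the left-hand categories and _PT_055
df_R <- df[!grepl("_L|LW|L-|LB|_PT_055", df$Category), ]

df_R
```

Because grepl() takes a regular expression, the alternatives separated by | are matched anywhere in the string, so a single negated call replaces the whole chain of NOT LIKE conditions.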

Answer 1: (score: 1)

Replicating @DavidArenburg's solution with sqldf:

#Using @DavidArenburg solution:
res1 <- df[!grepl("_R|RWA_|R-|RB|_PT_055", df$Category),]

#Using sqldf
library(sqldf)
res2 <- sqldf("SELECT * FROM df 
               WHERE Category NOT LIKE '%_R' AND
                     Category NOT LIKE 'RWA_%' AND
                     Category NOT LIKE 'R-%' AND
                     Category NOT LIKE 'RB%' AND
                     Category NOT LIKE '%_PT_055'")

# res1 == res2
#      ID Category
# 1  TRUE     TRUE
# 3  TRUE     TRUE
# 4  TRUE     TRUE
# 5  TRUE     TRUE
# 6  TRUE     TRUE
# 7  TRUE     TRUE
# 8  TRUE     TRUE
# 11 TRUE     TRUE
# 16 TRUE     TRUE
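
The same approach should work for df_R with the patterns mirrored. Two details are worth keeping in mind. First, every condition joined with AND has to hold for a row, which is why adding AND Category LIKE 'RWA_PT_035' in the question left a single row; excluding the one unwanted value is enough. Second, in a LIKE pattern _ matches any single character, so '%_R' really means "ends in R with at least one character before it". Below is a sketch, assuming the df from the question and sqldf's default SQLite backend; the ESCAPE '!' clause and the choice of ! as escape character are my own addition to make the underscore literal.

```r
library(sqldf)

# Rebuild the example data from the question
Category <- c('D_L','D_R','FA1','LBP0W','L-010','L-020','LW_-010','LWA_PT_035',
              'LWA_PT_055','RBP0W','RET_MAG','R-010','R-000','RWA_PT_035',
              'RWA_PT_055','TPH')
ID <- 111:126
df <- data.frame(ID, Category)

# Drop the left-hand categories, then exclude only the unwanted
# RWA_PT_055 row instead of adding a positive LIKE condition.
df_R <- sqldf("SELECT * FROM df
               WHERE Category NOT LIKE '%!_L' ESCAPE '!'
                 AND Category NOT LIKE 'LW%'
                 AND Category NOT LIKE 'L-%'
                 AND Category NOT LIKE 'LB%'
                 AND Category <> 'RWA_PT_055'")

df_R
```

A plain <> comparison is used for the last condition because the value is matched exactly, so no wildcard handling is needed there.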