Filtering data by joining 2 columns (strings containing commas) in R

Time: 2015-09-02 16:40:45

Tags: r dplyr reshape2

I have a df:

ID <- c('DX154','DX154','DX155','DX155','DX156','DX157','DX158','DX159')
Country <- rep('US', 8)   # same value for all 8 rows
Level <- c('Level_1A','Level_1A','Level_1B','Level_1B','Level_1A','Level_1B','Level_1B','Level_1A')
Type_A <- c('Iphone','Iphone','Android','Android','aaa','bbb','ccc','ddd')
Type_B <- c("Iphone,Ipad,Ipod,Mac","Gmail,Android,Drive,Maps","Iphone,Ipad,Ipod,Mac","Gmail,Android,Drive,Maps","ALL","ALL","ALL","ALL")
df <- data.frame(ID, Country, Level, Type_A, Type_B, stringsAsFactors = FALSE)

df

     ID Country    Level  Type_A                   Type_B
1 DX154      US Level_1A  Iphone     Iphone,Ipad,Ipod,Mac
2 DX154      US Level_1A  Iphone Gmail,Android,Drive,Maps
3 DX155      US Level_1B Android     Iphone,Ipad,Ipod,Mac
4 DX155      US Level_1B Android Gmail,Android,Drive,Maps
5 DX156      US Level_1A     aaa                      ALL
6 DX157      US Level_1B     bbb                      ALL
7 DX158      US Level_1B     ccc                      ALL
8 DX159      US Level_1A     ddd                      ALL

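I am trying to filter this data frame by joining the Type_A and Type_B columns (keeping a row only when its Type_A appears in its comma-separated Type_B list, or when Type_B is ALL), but I don't know how to parse the commas. Can someone help me with this?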

My desired output is:

     ID Country    Level  Type_A                   Type_B
1 DX154      US Level_1A  Iphone     Iphone,Ipad,Ipod,Mac
2 DX155      US Level_1B Android Gmail,Android,Drive,Maps
3 DX156      US Level_1A     aaa                      ALL
4 DX157      US Level_1B     bbb                      ALL
5 DX158      US Level_1B     ccc                      ALL
6 DX159      US Level_1A     ddd                      ALL
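One way to parse the commas explicitly is to split each Type_B string with strsplit() and check whether Type_A is one of the resulting tokens, keeping the "ALL" rows unconditionally. The following is a minimal base-R sketch, assuming the tokens are separated by a bare comma with no surrounding spaces. Unlike a substring match, it will not treat "Mac" as matching "Macbook".

```r
# Example data from the question (Country written out for all 8 rows)
df <- data.frame(
  ID      = c('DX154','DX154','DX155','DX155','DX156','DX157','DX158','DX159'),
  Country = rep('US', 8),
  Level   = c('Level_1A','Level_1A','Level_1B','Level_1B',
              'Level_1A','Level_1B','Level_1B','Level_1A'),
  Type_A  = c('Iphone','Iphone','Android','Android','aaa','bbb','ccc','ddd'),
  Type_B  = c("Iphone,Ipad,Ipod,Mac","Gmail,Android,Drive,Maps",
              "Iphone,Ipad,Ipod,Mac","Gmail,Android,Drive,Maps",
              "ALL","ALL","ALL","ALL"),
  stringsAsFactors = FALSE
)

# Split every Type_B string on commas: a list with one character vector per row
tokens <- strsplit(df$Type_B, ",", fixed = TRUE)

# For each row, is Type_A exactly one of that row's tokens?
in_list <- mapply(function(a, toks) a %in% toks, df$Type_A, tokens,
                  USE.NAMES = FALSE)

# Keep rows where Type_A is listed, plus the catch-all "ALL" rows
result <- df[in_list | df$Type_B == "ALL", ]
rownames(result) <- NULL
result
```

Here result holds the same six rows as the desired output.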

2 Answers:

Answer 0 (score: 3)

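Here's one solution. It's a bit of a hack, and someone will probably give you a super clever and fast version soon. This one works row by row, whereas Akrun's answer shows you how to do it by ID alone.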

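library(dplyr)
df <- df %>%
  mutate(row_id = 1:n()) %>%
  group_by(row_id) %>%             # one row per group, so grepl() gets a single pattern
  filter(grepl(Type_A, Type_B) | Type_B == "ALL") %>%
  ungroup() %>%
  select(-row_id)                  # drop the helper column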
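The synthetic row_id grouping is there because grepl() uses only the first element of a pattern vector longer than one. A sketch that avoids the extra column pairs each Type_A with its own Type_B via mapply(), written in base R here. The fixed = TRUE argument is an added assumption that Type_A values are literal names rather than regular expressions. Like the answer, this is a substring test.

```r
df <- data.frame(
  ID     = c('DX154','DX154','DX155','DX155','DX156','DX157','DX158','DX159'),
  Type_A = c('Iphone','Iphone','Android','Android','aaa','bbb','ccc','ddd'),
  Type_B = c("Iphone,Ipad,Ipod,Mac","Gmail,Android,Drive,Maps",
             "Iphone,Ipad,Ipod,Mac","Gmail,Android,Drive,Maps",
             "ALL","ALL","ALL","ALL"),
  stringsAsFactors = FALSE
)

# mapply() calls grepl(Type_A[i], Type_B[i], fixed = TRUE) for each row i,
# so every call sees exactly one pattern and no grouping is needed
hit <- mapply(grepl, df$Type_A, df$Type_B,
              MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)

# Keep the matching rows plus the catch-all "ALL" rows
kept <- df[hit | df$Type_B == "ALL", ]
kept
```

The same logical expression can also go inside dplyr's filter(), e.g. filter(df, mapply(grepl, Type_A, Type_B, MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE) | Type_B == "ALL").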

Answer 1 (score: 2)

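We group by 'ID' and use grepl with a pattern created by pasting the 'Type_A' values together with '|' (in this example, using Type_A[1L] should also work because the 'Type_A' elements are repeated within each ID; a better example would help), and use that to filter the rows. We also use grepl to keep the 'Type_B' values that have no comma (,) from the start (^) to the end ($) of the string.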

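library(dplyr)
df %>%
     group_by(ID) %>%
     # keep a row if its Type_B contains any of the group's Type_A values,
     # or if Type_B holds a single value with no comma (e.g. "ALL")
     filter(grepl(paste(Type_A, collapse='|'), Type_B) |
            grepl('^[^,]+$', Type_B))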

#     ID Country    Level  Type_A                   Type_B
#1 DX154      US Level_1A  Iphone     Iphone,Ipad,Ipod,Mac
#2 DX155      US Level_1B Android Gmail,Android,Drive,Maps
#3 DX156      US Level_1A     aaa                      ALL
#4 DX157      US Level_1B     bbb                      ALL
#5 DX158      US Level_1B     ccc                      ALL
#6 DX159      US Level_1A     ddd                      ALL
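The two regular expressions in this answer can be checked in isolation. This base-R snippet reuses the DX154 values from the question plus a few illustrative test strings ("Mac" and the empty string are made up for the demo). It also shows a caveat: the no-comma branch keeps any single-valued Type_B, not just "ALL".

```r
# Inside group DX154, paste(collapse = '|') turns the Type_A values into
# a regex alternation; the duplicate is harmless
pat <- paste(c('Iphone', 'Iphone'), collapse = '|')
pat       # "Iphone|Iphone"

m_type <- grepl(pat, c("Iphone,Ipad,Ipod,Mac", "Gmail,Android,Drive,Maps"))
m_type    # TRUE FALSE

# '^[^,]+$' = one or more non-comma characters from start (^) to end ($),
# i.e. a non-empty string containing no comma
samples <- c("ALL", "Iphone,Ipad", "", "Mac")
m_single <- grepl('^[^,]+$', samples)
m_single  # TRUE FALSE FALSE TRUE
```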