I have a df:
ID <- c('DX154','DX154','DX155','DX155','DX156','DX157','DX158','DX159')
Country <- c('US','US','US','US')
Level <- c('Level_1A','Level_1A','Level_1B','Level_1B','Level_1A','Level_1B','Level_1B','Level_1A')
Type_A <- c('Iphone','Iphone','Android','Android','aaa','bbb','ccc','ddd')
Type_B <- c("Iphone,Ipad,Ipod,Mac","Gmail,Android,Drive,Maps","Iphone,Ipad,Ipod,Mac","Gmail,Android,Drive,Maps","ALL","ALL","ALL","ALL")
df <- data.frame(ID ,Country ,Level ,Type_A,Type_B)
df
     ID Country    Level  Type_A                   Type_B
1 DX154      US Level_1A  Iphone     Iphone,Ipad,Ipod,Mac
2 DX154      US Level_1A  Iphone Gmail,Android,Drive,Maps
3 DX155      US Level_1B Android     Iphone,Ipad,Ipod,Mac
4 DX155      US Level_1B Android Gmail,Android,Drive,Maps
5 DX156      US Level_1A     aaa                      ALL
6 DX157      US Level_1B     bbb                      ALL
7 DX158      US Level_1B     ccc                      ALL
8 DX159      US Level_1A     ddd                      ALL
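I am trying to subset this data frame by matching the Type_A column against the Type_B column, but I am not sure how to parse the commas. Could someone help me with this?
My desired output is: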
     ID Country    Level  Type_A                   Type_B
1 DX154      US Level_1A  Iphone     Iphone,Ipad,Ipod,Mac
2 DX155      US Level_1B Android Gmail,Android,Drive,Maps
3 DX156      US Level_1A     aaa                      ALL
4 DX157      US Level_1B     bbb                      ALL
5 DX158      US Level_1B     ccc                      ALL
6 DX159      US Level_1A     ddd                      ALL
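Answer 0 (score: 3)
Here is one solution. It is a bit hacky, and someone will probably post a cleverer and faster version soon. This one works row by row; Akrun's answer shows how to do it by grouping on ID alone.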
library(dplyr)
df <- df %>%
  mutate(row_id = 1:n()) %>%   # give every row its own id
  group_by(row_id) %>%         # so grepl() gets a single pattern per group
  filter(grepl(Type_A, Type_B) | Type_B == "ALL")
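Answer 1 (score: 2)
We group by 'ID' and use grepl, with the pattern built by paste-ing the 'Type_A' column (in this example, Type_A[1L] should also work, because the 'Type_A' elements are duplicated within each ID; a better example would be nice), and use the result to filter the rows. We also use grepl to filter the 'Type_B' values that contain no , anywhere from the start (^) to the end ($) of the string.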
library(dplyr)
df %>%
  group_by(ID) %>%
  filter(grepl(paste(Type_A, collapse='|'), Type_B) |
           grepl('^[^,]+$', Type_B))
#     ID Country    Level  Type_A                   Type_B
#1 DX154      US Level_1A  Iphone     Iphone,Ipad,Ipod,Mac
#2 DX155      US Level_1B Android Gmail,Android,Drive,Maps
#3 DX156      US Level_1A     aaa                      ALL
#4 DX157      US Level_1B     bbb                      ALL
#5 DX158      US Level_1B     ccc                      ALL
#6 DX159      US Level_1A     ddd                      ALL
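Since the question asks specifically about parsing the commas, here is a minimal base R sketch of one possible alternative, not taken from either answer. It splits Type_B on commas with strsplit and checks for an exact token match instead of the substring match that grepl performs. The names tokens, keep and result are made up for illustration, and it assumes "ALL" is the only catch-all value, as in the example data.

```r
# Rebuild the example data from the question (strings kept as character).
df <- data.frame(
  ID      = c('DX154','DX154','DX155','DX155','DX156','DX157','DX158','DX159'),
  Country = 'US',
  Level   = c('Level_1A','Level_1A','Level_1B','Level_1B',
              'Level_1A','Level_1B','Level_1B','Level_1A'),
  Type_A  = c('Iphone','Iphone','Android','Android','aaa','bbb','ccc','ddd'),
  Type_B  = c("Iphone,Ipad,Ipod,Mac", "Gmail,Android,Drive,Maps",
              "Iphone,Ipad,Ipod,Mac", "Gmail,Android,Drive,Maps",
              "ALL", "ALL", "ALL", "ALL"),
  stringsAsFactors = FALSE
)

# Split every Type_B string on literal commas: one character vector per row.
tokens <- strsplit(df$Type_B, ",", fixed = TRUE)

# Keep a row when its Type_A is exactly one of its tokens,
# or when Type_B is the catch-all "ALL" (assumption from the example data).
keep <- unname(mapply(function(a, b) a %in% b, df$Type_A, tokens)) |
  df$Type_B == "ALL"

result <- df[keep, ]
rownames(result) <- NULL   # renumber rows 1..n
result
```

On the example data this keeps the same six rows as the desired output. The exact match would matter if one Type_A value were a substring of an unrelated token, which the grepl approaches would also accept.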