I have the following data:
ID INDUSTRY PRODUCT
625109 PersonalCare Neolone Preservatives
199672 PersonalCare Neolone Preservatives
227047 Pharma Optiphen
186117 Food Sasol BHT
625109 PersonalCare Optiphen
227047 Food Neolone Preservatives
I want to extract all rows for an ID if that ID has both of the products Neolone Preservatives and Optiphen.
Expected result:
ID INDUSTRY PRODUCT
625109 PersonalCare Neolone Preservatives
227047 Pharma Optiphen
625109 PersonalCare Optiphen
227047 Food Neolone Preservatives
IDs 625109 and 227047 each have both products, so their rows are extracted. How can I do this in R?
Answer 0 (score: 2)
There are several ways.
With dplyr:
library(dplyr)
df %>%
group_by(ID) %>%
filter(all(c("Neolone Preservatives", "Optiphen") %in% PRODUCT))
# ID INDUSTRY PRODUCT
# <int> <chr> <chr>
#1 625109 PersonalCare Neolone Preservatives
#2 227047 Pharma Optiphen
#3 625109 PersonalCare Optiphen
#4 227047 Food Neolone Preservatives
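In base R:
# PRODUCT is a character column, so ave() returns the per-ID result as
# the strings "TRUE"/"FALSE"; hence the comparison with "TRUE"
df[ave(df$PRODUCT, df$ID, FUN = function(x)
all(c("Neolone Preservatives", "Optiphen") %in% x)) == "TRUE", ]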
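To try this on the data from the question, the data frame has to be built first. The following is a minimal sketch, assuming the columns are plain character vectors; the name `df` matches the answer, while `wanted`, `has_all`, `ids`, and `result` are only illustrative names. It takes a slightly different base R route than `ave()`: split the products by ID, test each group, then subset by the IDs that passed.

```r
# Sample data copied from the question
df <- data.frame(
  ID = c(625109, 199672, 227047, 186117, 625109, 227047),
  INDUSTRY = c("PersonalCare", "PersonalCare", "Pharma",
               "Food", "PersonalCare", "Food"),
  PRODUCT = c("Neolone Preservatives", "Neolone Preservatives", "Optiphen",
              "Sasol BHT", "Optiphen", "Neolone Preservatives"),
  stringsAsFactors = FALSE
)

wanted <- c("Neolone Preservatives", "Optiphen")

# For each ID, check whether every wanted product occurs among its products
has_all <- sapply(split(df$PRODUCT, df$ID), function(x) all(wanted %in% x))

# The names of has_all are the IDs (as strings); keep the ones that passed
ids <- names(has_all)[has_all]

# Keep every row whose ID passed the check
result <- df[as.character(df$ID) %in% ids, ]
print(result)
```

Keeping the group test (`has_all`) separate from the subsetting step makes it easy to inspect which IDs qualified before extracting the rows.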
Answer 1 (score: 1)
This should work:
library(dplyr)
df <- data.frame(ID = c(62, 19, 22, 18, 62, 22),
INDUSTRY = c("PC", "PC", "P", "F", "PC", "F"),
PRODUCT = c("NP", "NP", "O", "SB", "O", "NP"))
df %>%
group_by(ID) %>%
filter(any(PRODUCT %in% c("NP")) & any(PRODUCT %in% c("O")))
# A tibble: 4 x 3
# Groups: ID [2]
ID INDUSTRY PRODUCT
<dbl> <fctr> <fctr>
1 62 PC NP
2 22 P O
3 62 PC O
4 22 F NP
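The two `any()` checks in this answer amount to the same thing as a single `all(... %in% PRODUCT)` test, which is easier to extend when more than two products are required. A small base R sketch of that equivalence, using made-up product vectors (`required`, `x`, `y`, `both_any`, and `both_all` are illustrative names, not taken from the answers):

```r
required <- c("NP", "O")

x <- c("NP", "O", "SB")  # a group containing both required products
y <- c("NP", "SB")       # a group that is missing "O"

# One any() per product, as in the answer above
both_any <- function(p) any(p %in% "NP") & any(p %in% "O")

# Single test: is every required product found in the group?
both_all <- function(p) all(required %in% p)

print(c(both_any(x), both_all(x)))
print(c(both_any(y), both_all(y)))
```

With the `all()` form, adding a third required product only means extending `required`.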
Answer 2 (score: 0)
You can do this with the dplyr library:
library(dplyr)
filteredData <- data %>%
filter(INDUSTRY == 'PersonalCare', PRODUCT == 'Optiphen')