R: select certain values from multiple columns using conditions

Time: 2018-05-04 03:07:30

Tags: r

I want to select certain values from multiple columns using conditions. (Row 1 should also be labeled ID #1, and so on through row 5 as ID #5.)

column1 <- c("rice 2", "apple 4", "melon 6", "blueberry 4", "orange 6")
column2 <- c("rice 8", "blueberry 8", "grape 10", "water 10", "mango 3")
column3 <- c("rice 6", "apple 8", "blueberry 12", "pineapple 8", "mango 3")

I want to use conditions to get a new column only for the IDs where rice > 5, blueberry > 7, or orange > 5.

First, I want to get ID #1, ID #2, ID #3, and ID #5.

Second, I want to count how many conditions each ID meets. The result I would like is:

ID#1 -> 2 conditions met
ID#2 -> 1 conditions met
ID#3 -> 1 conditions met
ID#4 -> 0 conditions met
ID#5 -> 1 conditions met

1 answer:

Answer 0: (score: 1)

If I understand the question correctly, one approach could be:

library(dplyr)

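# `df` is the data frame defined under "Sample data" below.
# Every column except ID holds "<item> <number>" strings.
cols <- names(df)[-1]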

df1 <- df %>%
  mutate_if(is.factor, as.character) %>%
  mutate(rice_gt_5 = (select(., one_of(cols)) %>% 
                        rowwise() %>%
                        mutate_all(funs(strsplit(., split=" ")[[1]][1] =='rice' & as.numeric(strsplit(., split=" ")[[1]][2]) > 5)) %>%
                        rowSums)) %>%
  mutate(blueberry_gt_7 = (select(., one_of(cols)) %>% 
                        rowwise() %>%
                        mutate_all(funs(strsplit(., split=" ")[[1]][1] =='blueberry' & as.numeric(strsplit(., split=" ")[[1]][2]) > 7)) %>%
                        rowSums)) %>%
  mutate(orange_gt_5 = (select(., one_of(cols)) %>% 
                        rowwise() %>%
                        mutate_all(funs(strsplit(., split=" ")[[1]][1] =='orange' & as.numeric(strsplit(., split=" ")[[1]][2]) > 5)) %>%
                        rowSums))

#IDs which satisfy at least one of your conditions i.e. rice > 5 OR blueberry > 7 OR orange > 5
df1$ID[which(df1 %>% select(rice_gt_5, blueberry_gt_7, orange_gt_5) %>% rowSums() >0)]
#[1] 1 2 3 5

#How many conditions are met per ID
df1 %>%
  mutate(no_of_cond_met = rowSums(select(., one_of(c("rice_gt_5", "blueberry_gt_7", "orange_gt_5"))))) %>%
  select(ID, no_of_cond_met)
#  ID no_of_cond_met
#1  1              2
#2  2              1
#3  3              1
#4  4              0
#5  5              1
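
As a cross-check, the same counts can probably be reproduced in base R without dplyr. The following is only a minimal sketch, not the method of the answer above: it assumes every cell has the form "<item> <number>", and the names `thresholds`, `item`, `value`, and `hit` are made up for illustration. Like the dplyr version, it counts qualifying cells, so a row with two qualifying rice cells contributes 2.

```r
# Sample data with plain character columns (same values as in the question).
df <- data.frame(
  ID = 1:5,
  column1 = c("rice 2", "apple 4", "melon 6", "blueberry 4", "orange 6"),
  column2 = c("rice 8", "blueberry 8", "grape 10", "water 10", "mango 3"),
  column3 = c("rice 6", "apple 8", "blueberry 12", "pineapple 8", "mango 3"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: item name -> value that must be exceeded.
thresholds <- c(rice = 5, blueberry = 7, orange = 5)

# Flatten the value columns column by column into one character vector.
cells <- unlist(df[-1], use.names = FALSE)
item  <- sub(" .*$", "", cells)              # text before the first space
value <- as.numeric(sub("^.* ", "", cells))  # text after the last space

# thresholds[item] is NA for items without a rule; treat those as "not met".
hit <- value > thresholds[item]
hit[is.na(hit)] <- FALSE

# Restore the row-by-column layout and count qualifying cells per row.
hit <- matrix(hit, nrow = nrow(df))
no_of_cond_met <- rowSums(hit)

data.frame(ID = df$ID, no_of_cond_met)

# IDs with at least one qualifying cell.
df$ID[no_of_cond_met > 0]
```

Keeping the rules in one named vector means a new rule (for example a threshold for mango) is a one-element change rather than another `mutate()` block.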

Sample data:

df <- structure(list(ID = 1:5, column1 = structure(c(5L, 1L, 3L, 2L, 
4L), .Label = c("apple 4", "blueberry 4", "melon 6", "orange 6", 
"rice 2"), class = "factor"), column2 = structure(c(4L, 1L, 2L, 
5L, 3L), .Label = c("blueberry 8", "grape 10", "mango 3", "rice 8", 
"water 10"), class = "factor"), column3 = structure(c(5L, 1L, 
2L, 4L, 3L), .Label = c("apple 8", "blueberry 12", "mango 3", 
"pineapple 8", "rice 6"), class = "factor")), .Names = c("ID", 
"column1", "column2", "column3"), row.names = c(NA, -5L), class = "data.frame")
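
The `structure()` call above looks like `dput()` output. A more readable way to build roughly the same data frame from the vectors in the question, including the ID column the question asks for, might be the sketch below. Setting `stringsAsFactors = TRUE` is an assumption made here to mirror the factor columns above; since R 4.0.0 the default is `FALSE`, in which case the columns would already be character and the `mutate_if(is.factor, as.character)` step would have nothing to convert.

```r
# Vectors copied from the question.
column1 <- c("rice 2", "apple 4", "melon 6", "blueberry 4", "orange 6")
column2 <- c("rice 8", "blueberry 8", "grape 10", "water 10", "mango 3")
column3 <- c("rice 6", "apple 8", "blueberry 12", "pineapple 8", "mango 3")

# ID is simply the row number: row 1 becomes ID 1, ..., row 5 becomes ID 5.
# stringsAsFactors = TRUE is an assumption, chosen to match the factor
# columns in the structure() call.
df <- data.frame(
  ID = seq_along(column1),
  column1, column2, column3,
  stringsAsFactors = TRUE
)

str(df)
```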