I want to select certain values from multiple columns based on conditions. (Row 1 should be treated as ID#1, and so on up to row 5 as ID#5.)
column1 <- c("rice 2", "apple 4", "melon 6", "blueberry 4", "orange 6")
column2 <- c("rice 8", "blueberry 8", "grape 10", "water 10", "mango 3")
column3 <- c("rice 6", "apple 8", "blueberry 12", "pineapple 8", "mango 3")
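I would like to get a new column only for the IDs that meet the conditions rice > 5, blueberry > 7, or orange > 5.
First, I want to get ID#1, ID#2, ID#3, and ID#5.
Second, I want to count how many conditions each ID satisfies. The result I want is: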
ID#1 -> 2 conditions met
ID#2 -> 1 conditions met
ID#3 -> 1 conditions met
ID#4 -> 0 conditions met
ID#5 -> 1 conditions met
Answer 0 (score: 1)
If I understand the question correctly, one approach could be:
library(dplyr)
cols <- names(df)[-1]
# Each mutate() below counts, per row, the cells whose item name matches
# and whose number exceeds the threshold.
# Note: funs() is deprecated as of dplyr 0.8.0; newer versions expect a
# lambda such as ~ ... in its place.
df1 <- df %>%
  mutate_if(is.factor, as.character) %>%
  mutate(rice_gt_5 = (select(., one_of(cols)) %>%
                        rowwise() %>%
                        mutate_all(funs(strsplit(., split=" ")[[1]][1] =='rice' & as.numeric(strsplit(., split=" ")[[1]][2]) > 5)) %>%
                        rowSums)) %>%
  mutate(blueberry_gt_7 = (select(., one_of(cols)) %>%
                             rowwise() %>%
                             mutate_all(funs(strsplit(., split=" ")[[1]][1] =='blueberry' & as.numeric(strsplit(., split=" ")[[1]][2]) > 7)) %>%
                             rowSums)) %>%
  mutate(orange_gt_5 = (select(., one_of(cols)) %>%
                          rowwise() %>%
                          mutate_all(funs(strsplit(., split=" ")[[1]][1] =='orange' & as.numeric(strsplit(., split=" ")[[1]][2]) > 5)) %>%
                          rowSums))
#IDs which satisfy at least one of your conditions i.e. rice > 5 OR blueberry > 7 OR orange > 5
df1$ID[which(df1 %>% select(rice_gt_5, blueberry_gt_7, orange_gt_5) %>% rowSums() >0)]
#[1] 1 2 3 5
#How many conditions are met per ID
df1 %>%
mutate(no_of_cond_met = rowSums(select(., one_of(c("rice_gt_5", "blueberry_gt_7", "orange_gt_5"))))) %>%
select(ID, no_of_cond_met)
# ID no_of_cond_met
#1 1 2
#2 2 1
#3 3 1
#4 4 0
#5 5 1
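For comparison, the same counts can be sketched in base R without dplyr. This is only an illustrative alternative, not part of the original answer: the names `thresholds`, `cells`, `item`, `value`, and `met` are made up here, and the data frame is rebuilt with plain character columns so the snippet runs on its own. It assumes every cell has the form "name number" separated by a single space.

```r
# Same values as in the question, stored as character columns.
df <- data.frame(
  ID = 1:5,
  column1 = c("rice 2", "apple 4", "melon 6", "blueberry 4", "orange 6"),
  column2 = c("rice 8", "blueberry 8", "grape 10", "water 10", "mango 3"),
  column3 = c("rice 6", "apple 8", "blueberry 12", "pineapple 8", "mango 3"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: item name -> value that must be exceeded.
thresholds <- c(rice = 5, blueberry = 7, orange = 5)

cells <- as.matrix(df[, -1])                  # 5 x 3 character matrix
item  <- sub(" .*$", "", cells)               # text before the space
value <- as.numeric(sub("^.* ", "", cells))   # text after the space
dim(value) <- dim(cells)                      # as.numeric() drops the matrix shape

# One column per condition: number of cells in each row that satisfy it.
met <- sapply(names(thresholds), function(nm) {
  rowSums(item == nm & value > thresholds[[nm]])
})

no_of_cond_met <- rowSums(met)
ids_any <- df$ID[no_of_cond_met > 0]

print(ids_any)          # 1 2 3 5
print(no_of_cond_met)   # 2 1 1 0 1
```

As in the dplyr version, the count is per matching cell, which is why ID 1 gets 2: both "rice 8" and "rice 6" exceed the rice threshold.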
Sample data:
df <- structure(list(ID = 1:5, column1 = structure(c(5L, 1L, 3L, 2L,
4L), .Label = c("apple 4", "blueberry 4", "melon 6", "orange 6",
"rice 2"), class = "factor"), column2 = structure(c(4L, 1L, 2L,
5L, 3L), .Label = c("blueberry 8", "grape 10", "mango 3", "rice 8",
"water 10"), class = "factor"), column3 = structure(c(5L, 1L,
2L, 4L, 3L), .Label = c("apple 8", "blueberry 12", "mango 3",
"pineapple 8", "rice 6"), class = "factor")), .Names = c("ID",
"column1", "column2", "column3"), row.names = c(NA, -5L), class = "data.frame")