Question

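I have a data frame of subjects with samples and the concentrations in those samples. What I want is a vector of the subj that have conc < 2 for any sample and, for that same subj, conc >= 2 for any other sample.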

ex <- data.frame(subj = rep(1:6, each = 2), 
           sampleID =  1:12, 
           conc = c(1.7, 1.4, 1.5, 3.2, 3.3, 1.6, 2.7, 2.8, 1.4, NA, NA, 3.9))

It is easy to find the subj that have a concentration < 2 in any sample:

ex %>%                  # conc < 2
   filter(conc < 2) %>% 
   print() %>% 
   distinct(subj) %>% 
   summarise( n())

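But what I need is to find the subj that have a concentration < 2 in any sample and a concentration >= 2 in any other sample from the same patient. This is what I have so far, but it does not work. The correct answer is that only two subjects (#2 and #3) have concentrations that are both < 2 and >= 2.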

ex %>%                  # concs < 2 and also > 2 for each subject
   mutate(lt = ifelse(.$conc < 2, TRUE, FALSE)) %>% 
   mutate(ge = ifelse(.$conc >= 2, TRUE, FALSE)) %>% 
   group_by(subj)  %>% 
   summarise( xor(any(.$lt), any(.$ge)))

I would strongly prefer a solution that uses dplyr and magrittr pipes. Thanks in advance.

Answer 1

Not dplyr, but using data.table:

library(data.table)
setDT(ex)[, .(select = any(conc < 2) & any(conc >= 2)), by = subj]

This returns:

   subj select
1:    1  FALSE
2:    2   TRUE
3:    3   TRUE
4:    4  FALSE
5:    5     NA
6:    6     NA

If you only want the values of subj for which select is TRUE:

setDT(ex)[, .(select = any(conc < 2) & any(conc >= 2)), by = subj][
  select == TRUE, subj]

If you want to ignore the NA values, you can add na.rm = TRUE to the any() calls.
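As a minimal sketch of that na.rm suggestion, assuming the same ex data frame as in the question, the grouped test could look like this. The name res is only an illustrative choice.

```r
library(data.table)

# Same example data as in the question
ex <- data.frame(subj = rep(1:6, each = 2),
                 sampleID = 1:12,
                 conc = c(1.7, 1.4, 1.5, 3.2, 3.3, 1.6, 2.7, 2.8, 1.4, NA, NA, 3.9))

# na.rm = TRUE makes any() skip missing concentrations,
# so each subject gets TRUE or FALSE instead of NA
res <- setDT(ex)[, .(select = any(conc < 2, na.rm = TRUE) &
                              any(conc >= 2, na.rm = TRUE)), by = subj]
res
```

With this change, subjects 5 and 6 should come out as FALSE rather than NA, because each of them has only one non-missing sample.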

Answer 2

Using dplyr:

ex %>%
  group_by(subj) %>%
  filter(any(conc < 2) & any(conc >=2))

Note: this returns the full set of rows for the matching subjects. If you only want the subjects, you can modify it to:

ex %>%
  group_by(subj) %>%
  filter(any(conc < 2) & any(conc >=2)) %>%
  distinct(subj) %>%
  select(subj)
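The question asks for a vector rather than a data frame. One possible variation, sketched here with dplyr's pull() and assuming the same ex data frame, extracts the column directly. The name subj_vec is only illustrative.

```r
library(dplyr)

# Same example data as in the question
ex <- data.frame(subj = rep(1:6, each = 2),
                 sampleID = 1:12,
                 conc = c(1.7, 1.4, 1.5, 3.2, 3.3, 1.6, 2.7, 2.8, 1.4, NA, NA, 3.9))

subj_vec <- ex %>%
  group_by(subj) %>%
  # keep only subjects with at least one sample on each side of the cutoff;
  # groups where the test evaluates to NA are dropped by filter()
  filter(any(conc < 2) & any(conc >= 2)) %>%
  ungroup() %>%
  distinct(subj) %>%
  pull(subj)   # return a plain vector instead of a one-column tibble

subj_vec
```

With the example data this should give subjects 2 and 3, which matches the answer expected in the question.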

If you want to be more verbose, you can do this:

less_than_2 <- ex %>%
  group_by(subj) %>%
  filter(conc < 2)

greater_than_or_2 <- ex %>%
  group_by(subj) %>%
  filter(conc >= 2)

intersect(less_than_2$subj, greater_than_or_2$subj)

Answer 3

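It is not clear to me whether you want to keep the original data frame and add a flag for the subjects that meet the condition, or filter down to the subjects that meet it. If it is the latter, Jason's answer has you covered. If it is the former, you could do this: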

ex %>%
  group_by(subj) %>%
  summarise(test = min(conc) < 2 & max(conc) >= 2) %>%
  left_join(ex, .)
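A possible alternative to the summarise-and-join approach, sketched here under the assumption that missing concentrations should simply be ignored, is to compute the flag with a grouped mutate(). This swaps min()/max() for any() with na.rm = TRUE, so it is not the same code as above; the name flagged is only illustrative.

```r
library(dplyr)

# Same example data as in the question
ex <- data.frame(subj = rep(1:6, each = 2),
                 sampleID = 1:12,
                 conc = c(1.7, 1.4, 1.5, 3.2, 3.3, 1.6, 2.7, 2.8, 1.4, NA, NA, 3.9))

flagged <- ex %>%
  group_by(subj) %>%
  # one flag per subject, repeated on every row of that subject;
  # na.rm = TRUE is an assumption about how NAs should be treated
  mutate(test = any(conc < 2, na.rm = TRUE) & any(conc >= 2, na.rm = TRUE)) %>%
  ungroup()

flagged
```

This keeps all 12 rows and avoids the join, at the cost of treating subjects with a missing sample as not meeting the condition rather than as unknown.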

Find subjects that have a value < 2 for any sample and >= 2 for any other sample