Clustering rows by group based on column value with conditions

Time: 2018-06-26 08:57:12

Tags: r cluster-computing lag

A few days ago, I opened this thread:

Clustering rows by group based on column value

where we arrived at the following result:

df <- data.frame(ID = c(1,1,1,1,1,1,1,1,1,1,1, 1, 1,1,1,1,1),
      Obs1 = c(1,1,0,1,0,1,1,0,1,0,0,0,1,1,1,1,1),
      Control = c(0,3,3,1,12,1,1,1,36,13,1,1,2,24,2,2,48),
      ClusterObs1 = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5))

using:

library(dplyr)

# Number the runs of Obs1 within each ID: the counter goes up at each run of 1s
df <- df %>% 
  group_by(ID) %>% 
  mutate_at(vars(Obs1), 
            funs(ClusterObs1 = with(rle(.), rep(cumsum(values == 1), lengths))))

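Now I have to make some modifications:

If the value of "Control" is greater than 12, and the current "Obs1" value equals 1 and also equals the previous "Obs1" value, then the "DesiredResultClusterObs1" value should increase by 1.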

df <- data.frame(ID = c(1,1,1,1,1,1,1,1,1,1,1, 1, 1,1,1,1,1),
      Obs1 = c(1,1,0,1,0,1,1,0,1,0,0,0,1,1,1,1,1),
      Control = c(0,3,3,1,12,1,1,1,36,13,1,1,2,24,2,2,48),
      ClusterObs1 = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5),
      DesiredResultClusterObs1 = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 6, 6, 6, 7))

I have thought about adding if_else conditions inside funs, but without success. Any ideas?

EDIT: What about the case of many columns?

1 Answer:

Answer 0: (score: 2)

This seems to work:

df %>% 
  mutate(DesiredResultClusterOrbs1 = with(rle(Control > 12 & Obs1 == 1 & lag(Obs1) == 1),
                                          rep(cumsum(values == 1), lengths)) + ClusterObs1)

   ID Obs1 Control ClusterObs1 DesiredResultClusterOrbs1
1   1    1       0           1                         1
2   1    1       3           1                         1
3   1    0       3           1                         1
4   1    1       1           2                         2
5   1    0      12           2                         2
6   1    1       1           3                         3
7   1    1       1           3                         3
8   1    0       1           3                         3
9   1    1      36           4                         4
10  1    0      13           4                         4
11  1    0       1           4                         4
12  1    0       1           4                         4
13  1    1       2           5                         5
14  1    1      24           5                         6
15  1    1       2           5                         6
16  1    1       2           5                         6
17  1    1      48           5                         7

Basically, we use the rle + rep mechanism from your previous thread to create a cumulative vector from the TRUE/FALSE result of your condition and add it to the existing ClusterObs1.
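To see what each piece contributes, the one-liner can be unpacked into separate steps. This is only a sketch: the names cond, runs and offset are made up here, and default = 0 in lag() is an assumption that is not in the code above (it keeps the first row from producing an NA when its Control value is above 12).

```r
library(dplyr)

df <- data.frame(ID = rep(1, 17),
                 Obs1 = c(1,1,0,1,0,1,1,0,1,0,0,0,1,1,1,1,1),
                 Control = c(0,3,3,1,12,1,1,1,36,13,1,1,2,24,2,2,48),
                 ClusterObs1 = c(1,1,1,2,2,3,3,3,4,4,4,4,5,5,5,5,5))

# Step 1: the condition as a logical vector, one value per row.
# default = 0 is an assumption: the first row is treated as having no preceding 1.
cond <- with(df, Control > 12 & Obs1 == 1 & dplyr::lag(Obs1, default = 0) == 1)

# Step 2: run-length encode the condition into runs of TRUE / FALSE.
runs <- rle(cond)

# Step 3: count the TRUE runs cumulatively, then expand back to one value per row.
offset <- rep(cumsum(runs$values), runs$lengths)

# Step 4: shift the existing cluster ids by that offset.
df$DesiredResultClusterOrbs1 <- df$ClusterObs1 + offset

which(cond)  # rows where the condition holds
```

Note that because the counter is built from runs, several consecutive rows that all satisfy the condition raise it only once. If every such row should raise it, cumsum(cond) would be the thing to try instead.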


If you want to create several DesiredResultClusterOrbs columns, you can use mapply. There may be a dplyr solution, but this one is base R.

Data:

df <- data.frame(ID = c(1,1,1,1,1,1,1,1,1,1,1, 1, 1,1,1,1,1),
      Obs1 = c(1,1,0,1,0,1,1,0,1,0,0,0,1,1,1,1,1),
      Obs2 = rbinom(17, 1, .5),
      Control = c(0,3,3,1,12,1,1,1,36,13,1,1,2,24,2,2,48),
      ClusterObs1 = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5))

df <- df %>% 
  mutate_at(vars(Obs2), 
            funs(ClusterObs2 = with(rle(.), rep(cumsum(values == 1), lengths))))

Loop:

# x runs over the Obs columns (2:3), y over the matching Cluster columns (5:6).
# lag() here is dplyr::lag, so dplyr must be loaded; stats::lag does not shift a plain vector.
newcols <- mapply(function(x, y){
  with(rle(df$Control > 12 & x == 1 & lag(x) == 1),
       rep(cumsum(values == 1), lengths)) + y
}, df[2:3], df[5:6])

This yields a matrix containing the new columns, which you can then rename and cbind to your data.
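The rename-and-bind step for the mapply result could look roughly like this. It is a sketch under a few assumptions: the Obs2 values are fixed, made-up numbers (instead of rbinom) so the result is reproducible, ClusterObs2 is built in base R rather than with mutate_at, the DesiredResultCluster prefix is just one possible naming scheme, and default = 0 in lag() is added here to avoid an NA on the first row.

```r
library(dplyr)

df <- data.frame(ID = rep(1, 17),
                 Obs1 = c(1,1,0,1,0,1,1,0,1,0,0,0,1,1,1,1,1),
                 # Obs2 values are made up for illustration only
                 Obs2 = c(0,1,1,0,1,1,1,0,0,1,1,1,1,1,1,0,1),
                 Control = c(0,3,3,1,12,1,1,1,36,13,1,1,2,24,2,2,48),
                 ClusterObs1 = c(1,1,1,2,2,3,3,3,4,4,4,4,5,5,5,5,5))

# Same run-numbering idea as for ClusterObs1, applied to Obs2
df$ClusterObs2 <- with(rle(df$Obs2), rep(cumsum(values == 1), lengths))

# One result column per (Obs, Cluster) pair
newcols <- mapply(function(x, y) {
  with(rle(df$Control > 12 & x == 1 & dplyr::lag(x, default = 0) == 1),
       rep(cumsum(values == 1), lengths)) + y
}, df[2:3], df[5:6])

# mapply names the matrix columns after its first list argument (Obs1, Obs2),
# so a prefix is enough to get distinct names for the new columns
colnames(newcols) <- paste0("DesiredResultCluster", colnames(newcols))

# Bind the renamed matrix columns onto the data frame
df <- cbind(df, newcols)
```

Selecting the columns by position (2:3 and 5:6) only works while the column order stays as above. Picking them by name would be sturdier if the real data has a different layout.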