Question

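I want to get the positions of the maxima within each split of a data frame by a factor, so that I can retrieve the values at those positions from another data frame. For example, let's say I have this DF: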

df1=data.frame(groups=c('a','a','b','b','b','c'), c1=c(1:6), c2=c(2:7), c3=c(4:9))

print(df1)

  groups c1 c2 c3
1      a  1  2  4
2      a  2  3  5
3      b  3  4  6
4      b  4  5  7
5      b  5  6  8
6      c  6  7  9


aggregate(df1[,2:4], by=list(df1$groups), FUN=max)



  Group.1 c1 c2 c3
1       a  2  3  5
2       b  5  6  8
3       c  6  7  9

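Finding the maximum of each column by group is easy. But now I want the positions of those aggregated maxima so that I can use them in another data frame. So if: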

df2=cbind(df1$groups,0-df1[,2:4])

print(df2)



  df1$groups c1 c2 c3
1          a -1 -2 -4
2          a -2 -3 -5
3          b -3 -4 -6
4          b -4 -5 -7
5          b -5 -6 -8
6          c -6 -7 -9

I want the values of df2 at the positions where df1 has its maxima, e.g.:

  Group.1 c1 c2 c3
1       a -2 -3 -5
2       b -5 -6 -8
3       c -6 -7 -9

(This is made up just for this example. My original data is more complex, but this is what I need.)

In the meantime I have done it with a double loop, but it is really not efficient.

I am using:

R version 3.4.2 (2017-09-28) -- "Short Summer"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)
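Since only one lookup per group and column is needed, the loop can run over groups and columns instead of over rows. A minimal base-R sketch, assuming the example df1 and df2 from the question and that, as with which.max(), the first maximum is the one wanted when there are ties; `cols`, `rows_by_group` and `res` are illustrative names, not part of the question.

```r
# Example data from the question
df1 <- data.frame(groups = c('a','a','b','b','b','c'),
                  c1 = 1:6, c2 = 2:7, c3 = 4:9)
df2 <- cbind(df1$groups, 0 - df1[, 2:4])

cols <- c("c1", "c2", "c3")

# Row numbers of df1, split by group
rows_by_group <- split(seq_len(nrow(df1)), df1$groups)

# For every group and column: find the df1 row holding the (first) maximum,
# then read the value from the same row and column of df2
res <- t(sapply(rows_by_group, function(rows) {
  sapply(cols, function(col) df2[[col]][rows[which.max(df1[[col]][rows])]])
}))
res
```

The result is a matrix with one row per group and one column per value column; wrap it in as.data.frame() if a data frame is needed.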

Answer 1

One idea is to convert the row names into a column and use them to build an index vector, then use that vector to subset df2, i.e.

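library(tidyverse)

# Row names of the rows where every value column equals its group maximum
ind <-df1 %>% 
        rownames_to_column('rn') %>% 
        group_by(groups) %>% 
        filter_at(names(.)[3:5], all_vars(. == max(.))) %>% 
        pull(rn)

#[1] "2" "5" "6"

# ind holds row names, so it indexes df2 by its row names
df2[ind,]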

gives

  df1$groups c1 c2 c3
2          a -2 -3 -5
5          b -5 -6 -8
6          c -6 -7 -9
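The same index vector can also be built without the tidyverse. A minimal base-R sketch, assuming the example df1 and df2 from the question; like all_vars() above, it keeps only rows in which every column sits at its group maximum, so a group whose column maxima fall in different rows contributes no row. `cols` and `at_max` are illustrative names.

```r
# Example data from the question
df1 <- data.frame(groups = c('a','a','b','b','b','c'),
                  c1 = 1:6, c2 = 2:7, c3 = 4:9)
df2 <- cbind(df1$groups, 0 - df1[, 2:4])

cols <- c("c1", "c2", "c3")

# One logical column per value column: TRUE where the value equals its group maximum
at_max <- sapply(cols, function(col)
  df1[[col]] == ave(df1[[col]], df1$groups, FUN = max))

# Rows in which every column is at its group maximum (same rule as all_vars())
ind <- which(rowSums(at_max) == length(cols))
df2[ind, ]
```

For this data ind is c(2, 5, 6), so the same three rows of df2 are selected as in the dplyr version.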

Answer 2

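If the row index of the maximum can differ from column to column, this is not entirely trivial (as I mentioned in my comment on the answer above). I don't think you can avoid a double iteration (once over the groups a/b/c and once over the columns) to get the maximum of each column. You can do it as follows: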

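library(dplyr)
# Within-group position of each column's maximum (one row per group)
idx_df <- df1 %>% group_by(groups) %>% summarise_all(which.max) %>% 
  as.data.frame() %>% select(-groups)
# Split the value columns of df2 by group
df2_split <- df2[,-1] %>% split(df2$"df1$groups")

# Outer loop over groups, inner loop over columns
sapply(seq_along(df2_split), function(df_idx) 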
  sapply(seq_along(df2_split[[df_idx]]), function(col_idx)
    df2_split[[df_idx]][idx_df[df_idx,col_idx], col_idx])
  ) %>% t %>% 
  as.data.frame() %>% 
  `rownames<-`(names(df2_split)) %>% 
  `colnames<-`(colnames(idx_df))

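So this first creates idx_df, which stores the within-group index of each column's maximum (you could also use aggregate for this), and then retrieves the corresponding values from df2 by first splitting df2 by group.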
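The aggregate() variant of idx_df can be sketched in base R as follows. This assumes the example data from the question and relies on aggregate() and split() ordering the groups the same way (both sort by the grouping values, which holds here), so row g of idx_df lines up with df2_split[[g]]; `vals` is an illustrative name.

```r
# Example data from the question
df1 <- data.frame(groups = c('a','a','b','b','b','c'),
                  c1 = 1:6, c2 = 2:7, c3 = 4:9)
df2 <- cbind(df1$groups, 0 - df1[, 2:4])

# Within-group position of each column's maximum, one row per group
idx_df <- aggregate(df1[, 2:4], by = list(groups = df1$groups), FUN = which.max)

# Split df2's value columns by the same grouping
df2_split <- split(df2[, -1], df2[[1]])

# For group g and value column j, take row idx_df[g, j + 1] of df2_split[[g]]
vals <- t(sapply(seq_len(nrow(idx_df)), function(g)
  sapply(seq_len(ncol(idx_df) - 1), function(j)
    df2_split[[g]][idx_df[g, j + 1], j])))
dimnames(vals) <- list(as.character(idx_df$groups), names(idx_df)[-1])
vals
```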

Does this improve speed? A more elegant all-in-one solution would be nicer, but I'm not sure that is possible.

Positions of maxima in an aggregated data frame
