Question

Suppose I have a data frame like this:

set.seed(4)
df <- data.frame(
  group = rep(1:10, each = 3),
  id = rep(sample(1:3), 10),
  x = sample(c(rep(0, 15), runif(15))),
  y = sample(c(rep(0, 15), runif(15))),
  z = sample(c(rep(0, 15), runif(15)))
)

As shown above, some elements of the x, y and z vectors are zero, and the rest are drawn from the uniform distribution between 0 and 1.

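For each group, determined by the first column, I want to find three ids from the second column, pointing to the rows with the highest values of the x, y and z variables within that group. Assume there are no ties, except when a variable is 0 in all observations of a given group. In that case I don't want to return any number as the id of the row with the maximum value.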

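The output would look like this:

group x y z
1     2 2 1
2     2 3 1
...   .........

My first idea was to select the rows with the maximum values separately for each variable and then use merge to put them in one table. However, I wonder whether it can be done without merge, for example with standard dplyr functions.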

Answer 1

Here is my suggested solution using plyr:

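library(plyr)
# For each group, return the id of the row holding each column's maximum (NA if the column is all zeros)
ddply(df, .variables = c("group"),
      .fun = function(t) apply(X = t[, c(-1, -2)], MARGIN = 2,
        function(z) ifelse(sum(abs(z)) == 0, yes = NA, no = t$id[which.max(z)])))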

#   group  x  y  z
#1      1  2  2  1
#2      2  2  3  1
#3      3  1  3  2
#4      4  3  3  1
#5      5  2  3 NA
#6      6  3  1  3
#7      7  1  1  2
#8      8 NA  2  3
#9      9  2  1  3
#10    10  2 NA  2
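For comparison, the same per-group logic can be sketched in base R without any packages. This is only a rough illustration on a small made-up data frame (`toy`, not the question's `df`); the helper name `id_of_max` is hypothetical, and it assumes ids are integers and values are non-negative with no ties apart from all-zero columns.

```r
# Toy data (hypothetical, not the question's df): two groups, three ids each.
toy <- data.frame(
  group = rep(1:2, each = 3),
  id    = rep(1:3, times = 2),
  x     = c(0.2, 0.9, 0.0,  0.0, 0.0, 0.0),
  y     = c(0.0, 0.1, 0.7,  0.4, 0.0, 0.3)
)

# Hypothetical helper: id of the row holding the maximum of `v`,
# or NA when every value in `v` is zero.
id_of_max <- function(v, id) {
  if (all(v == 0)) NA_integer_ else id[which.max(v)]
}

# Every column except the grouping and id columns is treated as a value column.
value_cols <- setdiff(names(toy), c("group", "id"))

# Split by group, build one output row per group, then bind the rows together.
rows <- lapply(split(toy, toy$group), function(g) {
  ids <- vapply(g[value_cols], id_of_max, integer(1), id = g$id)
  data.frame(group = g$group[1], as.list(ids))
})
result <- do.call(rbind, rows)
rownames(result) <- NULL
print(result)
```

On this toy input, group 1 yields id 2 for x and id 3 for y, while group 2 yields NA for x (all zeros) and id 1 for y.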

Answer 2

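A solution using dplyr and tidyr. Note that if all the numbers are the same, we cannot decide which id should be selected, so filter(n_distinct(Value) > 1) is added to remove those records. In the final output df2, NA marks the cases where all the numbers are the same. We can decide later whether to impute those NA values with an id, if needed. This solution should work for any number of ids or columns (x, y, z, ...).

library(dplyr)
library(tidyr)

df2 <- df %>%
  gather(Column, Value, -group, -id) %>%
  arrange(group, Column, desc(Value)) %>%
  group_by(group, Column) %>%
  # If all values from a group-Column are all the same, remove that group-Column
  filter(n_distinct(Value) > 1) %>%
  slice(1) %>%
  select(-Value) %>%
  spread(Column, id)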
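To see why the all-equal case needs a guard, here is a tiny base-R illustration on made-up vectors (`v` and `id` are hypothetical, not taken from `df`). `length(unique(v)) > 1` is used as a rough base-R analogue of `n_distinct(Value) > 1`.

```r
# A group-column whose values are all identical (here, all zeros).
v  <- c(0, 0, 0)
id <- c(3L, 1L, 2L)

# which.max() returns the first position on ties, so an id would still be picked.
picked <- id[which.max(v)]

# Base-R analogue of n_distinct(Value) > 1: is there more than one distinct value?
values_differ <- length(unique(v)) > 1

# Guarded version: only report an id when the values are not all the same.
guarded <- if (values_differ) id[which.max(v)] else NA_integer_

print(picked)
print(guarded)
```

Without the guard the first id (3) is returned even though no row is really the maximum. With the guard the result is NA.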

Answer 3

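If you want to stick to just dplyr, you can use the multi-column summarize / mutate functions. This should work regardless of the form of id; my initial attempt was slightly cleaner but assumed that an id of zero was invalid.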

library(dplyr)

df %>%
  group_by(group) %>%
  mutate_at(vars(-id), 
            # If the row is the max within the group, set the value
            # to the id and use NA otherwise
            funs(ifelse(max(.) != 0 & . == max(.),
                        id,
                        NA))) %>%
  select(-id) %>%
  summarize_all(funs(
    # There are zero or one non-NA values per group, so handle both cases
    if(any(!is.na(.)))
      na.omit(.) else NA))
## # A tibble: 10 x 4
##    group     x     y     z
##    <int> <int> <int> <int>
##  1     1     2     2     1
##  2     2     2     3     1
##  3     3     1     3     2
##  4     4     3     3     1
##  5     5     2     3    NA
##  6     6     3     1     3
##  7     7     1     1     2
##  8     8    NA     2     3
##  9     9     2     1     3
## 10    10     2    NA     2
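With dplyr 1.0.0 or later, where across() is available, the same idea can be written as a single summarise() call. The following is a sketch on a small made-up data frame (`toy` is hypothetical), assuming integer ids, non-negative values, and no ties other than all-zero columns.

```r
library(dplyr)

# Toy data (hypothetical, not the question's df): two groups, three ids each.
toy <- data.frame(
  group = rep(1:2, each = 3),
  id    = rep(1:3, times = 2),
  x     = c(0.2, 0.9, 0.0,  0.0, 0.0, 0.0),
  y     = c(0.0, 0.1, 0.7,  0.4, 0.0, 0.3)
)

result <- toy %>%
  group_by(group) %>%
  summarise(across(
    c(x, y),
    # Values are non-negative, so a maximum of 0 means the column is all zeros
    # within the group: return NA, otherwise the id of the maximal row.
    ~ if (max(.x) == 0) NA_integer_ else id[which.max(.x)]
  ))

print(result)
```

Here x gives id 2 for group 1 and NA for group 2, and y gives id 3 and id 1 respectively.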

For each group, find the observations with the maximum value of several columns
