Find the minimum and maximum values and create columns for them for each unique identifier (grouping element) in R

Time: 2016-12-14 14:12:56

Tags: r dplyr

I have the following dataset:

MC <- c(rep("OS000348",8), rep("OS000361",13), rep("OS000375",5))
ASN <- c(rep(2,8), rep(3,5), rep(2,8), rep(3,5))
Dia <- c(870,"NA", 867.3, "NA", "NA", 890.3,"NA","NA",871.2,"NA",868.7,"NA",866.2, "NA",
"NA",851,"NA","NA",842,"NA","NA",880,860,851.8,"NA",841)

df <- data.frame(MC,ASN,Dia)

df

For each MC, I want to find the minimum and maximum Dia values and arrange them in a result table like the one shown below:

MC          Dia     Min_Dia Max_Dia
OS000348    870     867.3   890.3
OS000361    871.2   842     871.2
OS000375    880     841     880

I am trying to do this with the dplyr package and the following code:

result1 <- 
  df %>% 
  group_by(MC) %>% 
  arrange(MC) %>%
  slice(c(1, n())) %>%
  mutate(minmax = c("Min", "Max")) %>%
  gather(var, val, Dia) %>%
  unite(key, minmax, var) %>%
  spread(key, val)

But I don't get the table in the form I want (the second table above).

Is there another way to do this?

1 Answer:

Answer 0 (score: 3)

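First, enter the missing values as NA rather than "NA". With the quoted string, c() coerces the whole vector to character (which data.frame() may then turn into a factor, depending on the R version), so min() and max() cannot compare the values numerically. This code produces the desired output: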

MC <- c(rep("OS000348",8), rep("OS000361",13), rep("OS000375",5))
ASN <- c(rep(2,8), rep(3,5), rep(2,8), rep(3,5))
Dia <- c(870,NA, 867.3, NA, NA, 890.3,NA,NA,871.2,NA,868.7,NA,866.2, NA,
         NA,851,NA,NA,842,NA,NA,880,860,851.8,NA,841)

df <- data.frame(MC,ASN,Dia)

library(dplyr)

# Add each group's minimum and maximum Dia to every row, ignoring NA values
df <- df %>%
  group_by(MC) %>%
  mutate(minDia = min(Dia, na.rm = TRUE), maxDia = max(Dia, na.rm = TRUE))
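
If the data already exists with "NA" stored as text, retyping it may not be practical. Here is a minimal sketch of converting such a column, assuming the only non-numeric entries are the literal string "NA" (the names Dia_chr and Dia_num are made up for illustration):

```r
# Mixing numbers with the string "NA" makes c() return a character vector
Dia_chr <- c(870, "NA", 867.3, "NA", 890.3)
class(Dia_chr)

# as.character() first also covers the case where the column became a factor;
# as.numeric() then turns the text "NA" into a real missing value
Dia_num <- as.numeric(as.character(Dia_chr))

# min() and max() now compare numerically and can skip the missing values
min(Dia_num, na.rm = TRUE)
max(Dia_num, na.rm = TRUE)
```

The same conversion could be applied to a data frame column, for example df$Dia <- as.numeric(as.character(df$Dia)), before grouping.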

If you only want one observation per MC, use this:

# Compute the per-group min and max, then keep one row per MC
df2 <- df %>%
  group_by(MC) %>%
  mutate(minDia = min(Dia, na.rm = TRUE), maxDia = max(Dia, na.rm = TRUE)) %>%
  ungroup() %>%
  distinct(MC, minDia, maxDia)
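
As an alternative sketch, summarise() collapses each group to one row directly, and first() can carry along the first Dia value so that the result resembles the table in the question. This assumes the Dia column in that table is meant to be each group's first value, which the question does not state explicitly. The column names Min_Dia and Max_Dia follow the question's table.

```r
library(dplyr)

# Same data as above, with real NA values (ASN is left out because it is not needed here)
MC <- c(rep("OS000348", 8), rep("OS000361", 13), rep("OS000375", 5))
Dia <- c(870, NA, 867.3, NA, NA, 890.3, NA, NA, 871.2, NA, 868.7, NA, 866.2, NA,
         NA, 851, NA, NA, 842, NA, NA, 880, 860, 851.8, NA, 841)
df <- data.frame(MC, Dia)

result <- df %>%
  group_by(MC) %>%
  summarise(
    Min_Dia = min(Dia, na.rm = TRUE),
    Max_Dia = max(Dia, na.rm = TRUE),
    # Computed last: once Dia is overwritten, later expressions would see the summarised value
    Dia = first(Dia)
  ) %>%
  select(MC, Dia, Min_Dia, Max_Dia)

result
```

The order inside summarise() matters here: if Dia = first(Dia) came first, min() and max() would operate on that single value instead of the whole group.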