I have the following dataset:
MC <- c(rep("OS000348",8), rep("OS000361",13), rep("OS000375",5))
ASN <- c(rep(2,8), rep(3,5), rep(2,8), rep(3,5))
Dia <- c(870,"NA", 867.3, "NA", "NA", 890.3,"NA","NA",871.2,"NA",868.7,"NA",866.2, "NA",
"NA",851,"NA","NA",842,"NA","NA",880,860,851.8,"NA",841)
df <- data.frame(MC,ASN,Dia)
df
For each MC, I want to find the min and max Dia values and arrange them in a result table like this:
MC Dia Min_Dia Max_Dia
OS000348 870 867.3 890.3
OS000361 871.2 841 871.2
OS000375 880 841 880
I am trying to do this with the dplyr package and the following code:
result1 <-
df %>%
group_by(MC) %>%
arrange(MC) %>%
slice(c(1, n())) %>%
mutate(minmax = c("Min", "Max")) %>%
gather(var, val, Dia) %>%
unite(key, minmax, var) %>%
spread(key, val)
But I am not getting the table the way I want it (the second table above).
Is there another option?
Answer 0 (score: 3)
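First, you need to enter the missing values as NA rather than "NA"; otherwise R reads the whole vector as character, and min() cannot compare the values as numbers. This code adds the per-MC minimum and maximum as new columns: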
MC <- c(rep("OS000348",8), rep("OS000361",13), rep("OS000375",5))
ASN <- c(rep(2,8), rep(3,5), rep(2,8), rep(3,5))
Dia <- c(870,NA, 867.3, NA, NA, 890.3,NA,NA,871.2,NA,868.7,NA,866.2, NA,
NA,851,NA,NA,842,NA,NA,880,860,851.8,NA,841)
df <- data.frame(MC,ASN,Dia)
library(dplyr)
df <- df %>%
  group_by(MC) %>%
  mutate(minDia = min(Dia, na.rm = TRUE), maxDia = max(Dia, na.rm = TRUE))
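To see why the quoted "NA" matters, here is a small base R check. It is only a sketch: the short vectors `Dia_chr` and `Dia_num` are made up for illustration and are not part of the original data.

```r
# Mixing numbers with the string "NA" coerces the whole vector to character
Dia_chr <- c(870, "NA", 867.3, "NA", 890.3)
class(Dia_chr)   # "character"

# On character data, min() compares strings rather than numbers
min(c("9", "10"))

# Converting back to numeric turns the "NA" strings into real missing values.
# as.character() first also covers a column that was stored as a factor;
# suppressWarnings() only guards against a possible coercion warning.
Dia_num <- suppressWarnings(as.numeric(as.character(Dia_chr)))
class(Dia_num)   # "numeric"
min(Dia_num, na.rm = TRUE)
```

If the data frame already exists with the quoted values, the same conversion applied to `df$Dia` should avoid retyping the vector.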
If you want only one row per MC, use this:
df2 <- df %>%
  group_by(MC) %>%
  mutate(minDia = min(Dia, na.rm = TRUE), maxDia = max(Dia, na.rm = TRUE)) %>%
  ungroup() %>%
  distinct(MC, minDia, maxDia)
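As a possible alternative, summarise() can collapse each group to a single row directly and also keep the first Dia value per MC, which the requested table appears to show in its Dia column. This is a sketch under that assumption. The column names Min_Dia and Max_Dia come from the question, and the ASN column is left out because it is not needed here.

```r
library(dplyr)

MC  <- c(rep("OS000348", 8), rep("OS000361", 13), rep("OS000375", 5))
Dia <- c(870, NA, 867.3, NA, NA, 890.3, NA, NA, 871.2, NA, 868.7, NA, 866.2, NA,
         NA, 851, NA, NA, 842, NA, NA, 880, 860, 851.8, NA, 841)
df  <- data.frame(MC, Dia)

result <- df %>%
  group_by(MC) %>%
  summarise(
    Min_Dia = min(Dia, na.rm = TRUE),
    Max_Dia = max(Dia, na.rm = TRUE),
    # Dia is overwritten last, so min() and max() above still see the full column
    Dia     = first(Dia)
  ) %>%
  select(MC, Dia, Min_Dia, Max_Dia)   # column order from the question

result
```

With the data as posted, the minimum for OS000361 comes out as 842 rather than 841, because 841 belongs to OS000375.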