library(dplyr)
mydat1 <- data.frame(ID = c(1, 1, 2, 2),
Gender = c("Male", "Female", "Male", "Male"),
Score = c(30, 40, 20, 60))
mydat1 %>%
group_by(ID, Gender) %>%
slice(which.min(Score))
# A tibble: 3 x 3
# Groups: ID, Gender [3]
ID Gender Score
<dbl> <fctr> <dbl>
1 1 Female 40
2 1 Male 30
3 2 Male 20
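I am trying to group the rows by ID and Gender, and then keep only the row with the lowest Score in each group. The code above works perfectly: for ID == 2, only the entry with the lower score is kept.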
mydat2 <- data.frame(ID = c(1, 1, 2, 2),
Gender = c("Male", "Female", "Male", "Male"),
Score = c(NA, NA, 20, 60))
mydat2 %>%
group_by(ID, Gender) %>%
slice(which.min(Score))
# A tibble: 1 x 3
# Groups: ID, Gender [1]
ID Gender Score
<dbl> <fctr> <dbl>
1 2 Male 20
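However, when the data contains NA, which.min does not behave the way I want, because it does not return a valid index for a group whose scores are all NA. As a result, all of my ID == 1 entries are dropped. The output I want in this case is: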
# A tibble: 3 x 3
# Groups: ID, Gender [3]
ID Gender Score
<dbl> <fctr> <dbl>
1 1 Female NA
2 1 Male NA
3 2 Male 20
How can I modify my code to fix this?
Edit:
df2 <- structure(list(pubmed_id = c(23091106L, 23091106L), Gender = structure(c(4L,
4L), .Label = c("", "Both", "female", "Female", "Male"), class = "factor"),
Total_Carrier = c(NA, 1107)), class = c("grouped_df", "tbl_df",
"tbl", "data.frame"), row.names = c(NA, -2L), vars = "pubmed_id", drop = TRUE, indices = list(
0:1), group_sizes = 2L, biggest_group_size = 2L, labels = structure(list(
pubmed_id = 23091106L), class = "data.frame", row.names = c(NA,
-1L), vars = "pubmed_id", drop = TRUE, .Names = "pubmed_id"), .Names = c("pubmed_id",
"Gender", "Total_Carrier"))
> df2
# A tibble: 2 x 3
# Groups: pubmed_id [1]
pubmed_id Gender Total_Carrier
<int> <fctr> <dbl>
1 23091106 Female NA
2 23091106 Female 1107
In this example, I want the output to contain only row 2 (the row where Total_Carrier is 1107). However, I get the following result:
> df2 %>%
group_by(pubmed_id, Gender) %>%
slice(which.min(Total_Carrier) || 1)
# A tibble: 1 x 3
# Groups: pubmed_id, Gender [1]
pubmed_id Gender Total_Carrier
<int> <fctr> <dbl>
1 23091106 Female NA
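Answer 0 (score: 3)
which.min ignores missing values and returns integer(0) when the input vector contains only NA. You can add a conditional check inside slice so that the first row is selected when every score in the group is NA: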
mydat2 %>%
group_by(ID, Gender) %>%
slice({idx <- which.min(Score); if(length(idx) > 0) idx else 1})
# A tibble: 3 x 3
# Groups: ID, Gender [3]
# ID Gender Score
# <dbl> <fctr> <dbl>
#1 1 Female NA
#2 1 Male NA
#3 2 Male 20
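To see why the check is needed, here is a small base-R sketch of how which.min treats missing values. The helper name min_index_or_first is made up for this illustration and is not part of dplyr or base R; it only wraps the same if/else used in the slice call above.

```r
# Base R only; no packages needed.
which.min(c(30, 40))          # index of the smallest value
which.min(c(NA, 1107))        # NA is skipped, so the index points at 1107
which.min(c(NA_real_, NA))    # integer(0): there is no non-missing value

# Hypothetical helper (name chosen for this sketch): fall back to
# row 1 when which.min() finds no non-missing value.
min_index_or_first <- function(x) {
  idx <- which.min(x)
  if (length(idx) > 0) idx else 1L
}

min_index_or_first(c(NA, 1107))       # index of the non-missing minimum
min_index_or_first(c(NA_real_, NA))   # falls back to the first element
```

Under these assumptions, slice(min_index_or_first(Score)) should behave like the inline version above, and the same idea applies to the Total_Carrier column in df2.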
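Answer 1 (score: 2)
You can also use arrange to sort the scores within each group and then slice to select the first row of each group. Because arrange places NA last, a group that contains only NA still yields its first row: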
mydat2 %>%
group_by(ID, Gender) %>%
arrange(ID,Gender,Score) %>%
slice(1)
ID Gender Score
<dbl> <fctr> <dbl>
1 1 Female NA
2 1 Male NA
3 2 Male 20
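The same approach can be tried on the data from the question's edit, where one group mixes NA with a real value. This is a minimal sketch: df2_plain is a name made up here, rebuilt as an ungrouped data frame with the same two rows as df2, and it assumes a dplyr version that supports the .by_group argument of arrange.

```r
library(dplyr)

# Plain data frame mirroring the two rows of df2 in the question's edit
# (df2_plain is a hypothetical name used only in this sketch).
df2_plain <- data.frame(
  pubmed_id     = c(23091106L, 23091106L),
  Gender        = c("Female", "Female"),
  Total_Carrier = c(NA, 1107)
)

res <- df2_plain %>%
  group_by(pubmed_id, Gender) %>%
  arrange(Total_Carrier, .by_group = TRUE) %>%  # NA values are sorted last
  slice(1) %>%                                  # first row per group after sorting
  ungroup()

res$Total_Carrier   # the non-missing value, since NA was sorted behind it
```

Sorting does more work than which.min, but it avoids the special case for all-NA groups entirely.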
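Answer 2 (score: 1)
Here is an option using which and pmax. pmax keeps the index of the minimum and falls back to the first row when the group contains only NA:
mydat2 %>%
group_by(ID, Gender) %>%
# for an all-NA group, min() warns and returns Inf, which(...)[1] is NA, and pmax() yields 1
slice(pmax(1, which(Score == min(Score, na.rm = TRUE))[1], na.rm = TRUE))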
# A tibble: 3 x 3
# Groups: ID, Gender [3]
# ID Gender Score
# <dbl> <fctr> <dbl>
#1 1 Female NA
#2 1 Male NA
#3 2 Male 20
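A quick base-R check of how clamping the index behaves, using the Total_Carrier values from the question's edit. A valid row index is never below 1, so pmin(1, idx) always yields 1, whereas pmax(1, idx) keeps the index and falls back to 1 only when it is NA. The variable names are made up for this sketch.

```r
# Index of the minimum in a group that mixes NA with a real value.
idx_mixed  <- which(c(NA, 1107) == 1107)[1]   # the minimum sits in row 2
# What which(...)[1] yields when nothing matches (all-NA group).
idx_all_na <- integer(0)[1]                   # NA

pmin(1, idx_mixed,  na.rm = TRUE)   # 1: the valid index 2 is lost
pmax(1, idx_mixed,  na.rm = TRUE)   # 2: the valid index is kept
pmax(1, idx_all_na, na.rm = TRUE)   # 1: falls back to the first row
```

On mydat2 both functions give the same rows, because the minimum of the only non-NA group happens to be in its first row.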
Answer 3 (score: 1)
Using data.table:
library(data.table)
setDT(mydat2)
mydat2[, .(Score = sort(Score)[1]), by = .(ID, Gender)]
# ID Gender Score
# 1: 1 Male NA
# 2: 1 Female NA
# 3: 2 Male 20
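This works because sort drops NA by default, and indexing the first element of an empty vector gives NA. A small base-R sketch of that behaviour:

```r
sort(c(60, 20))[1]          # smallest value in the vector
sort(c(NA, 1107))[1]        # NA is dropped by sort(), leaving 1107
sort(c(NA_real_, NA))[1]    # NA: the sorted vector is empty, so [1] is missing
```

Note that this form returns only the grouping columns and Score; any other columns of the original rows are not carried along.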