I have been stuck on this dplyr manipulation problem for a while.
Here is a small sample of my data, as produced by dput(test):
structure(list(anon_screen_name = c("40492fd6e817cc25cea942be9eae7c1c5795ffa1",
"862329793fdbcd666d660d9a9d2e3beceb07a0db", "862329793fdbcd666d660d9a9d2e3beceb07a0db",
"862329793fdbcd666d660d9a9d2e3beceb07a0db", "862329793fdbcd666d660d9a9d2e3beceb07a0db",
"862329793fdbcd666d660d9a9d2e3beceb07a0db", "862329793fdbcd666d660d9a9d2e3beceb07a0db",
"862329793fdbcd666d660d9a9d2e3beceb07a0db", "a9c8719499b9ef73c78e85bada231591d807a821",
"a9c8719499b9ef73c78e85bada231591d807a821"), resource_display_name = c("Quiz",
"Quiz", "Quiz", "Quiz", "Quiz", "homework", "homework", "final_exam",
"Quiz", "Quiz"), grade = c(0L, 0L, 0L, 3L, 1L, 0L, 1L, 1L, 1L,
2L), max_grade = c(2L, 1L, 0L, 3L, 1L, 10L, 11L, 1L, 1L, 2L),
percent_grade = c("0", "0", "\\N", "100", "100", "0", "9.09",
"100", "100", "100")), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -10L))
Basically, for each anon_screen_name, I want to drop the lowest percent_grade among the homework rows (resource_display_name == 'homework').
I started with this code:
test %>%
  mutate(percent_grade = as.numeric(percent_grade)) %>%
  group_by(resource_display_name) %>%
  summarise(min_percent_grade = min(percent_grade, na.rm = TRUE))
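But this only shows the lowest grade; it does not remove the row that holds the lowest homework grade.
Update:
Basically, borrowing from the comment below, I want to remove the row associated with the lowest value of percent_grade where resource_display_name == 'homework'.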
Answer 0 (score: 2)
Try the following code:
# homework rows strictly above the lowest homework grade
test %>%
  mutate(percent_grade = as.numeric(percent_grade)) %>%
  filter(resource_display_name == 'homework') %>%
  filter(percent_grade > min(percent_grade, na.rm = TRUE)) -> t1
# all non-homework rows, unchanged
test %>%
  mutate(percent_grade = as.numeric(percent_grade)) %>%
  filter(resource_display_name != 'homework') -> t2
rbind(t1, t2)
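Note that the code above takes the minimum over all homework rows, regardless of student. If the minimum should be taken per anon_screen_name, as the question asks, one possible sketch is a single grouped filter. The `toy` data below is made up for illustration (short ids instead of the real hashes), and, like the code above, this drops every homework row tied at the minimum as well as homework rows whose grade is NA.

```r
library(dplyr)

# Made-up toy data shaped like `test` (hypothetical ids u1..u3)
toy <- tibble(
  anon_screen_name      = c("u1", "u2", "u2", "u2", "u2", "u3", "u3"),
  resource_display_name = c("Quiz", "homework", "homework", "Quiz", "Quiz",
                            "homework", "homework"),
  percent_grade         = c("0", "0", "9.09", "\\N", "100", "50", "50")
)

result <- toy %>%
  # "\\N" cannot be parsed as a number and becomes NA
  mutate(percent_grade = suppressWarnings(as.numeric(percent_grade))) %>%
  # one group per student and resource, so min() sees only that student's homework
  group_by(anon_screen_name, resource_display_name) %>%
  # keep every non-homework row; keep homework rows strictly above the group minimum
  filter(
    resource_display_name != "homework" |
      percent_grade > min(percent_grade, na.rm = TRUE)
  ) %>%
  ungroup()

result
```

In this toy data, student u3 has two homework rows tied at 50, so both are removed.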
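Answer 1 (score: 0)
The following removes every row whose percent_grade equals the minimum within its resource_display_name group. Note that it is a base R solution and needs no external packages such as dplyr.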
inx <- with(test, ave(as.numeric(percent_grade), resource_display_name, FUN = function(x) x != min(x, na.rm = TRUE)))
inx <- which(as.logical(inx))
test[inx, ]
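The snippet above drops the minimum of every resource_display_name group, not only homework, and it also drops rows whose grade is NA. If only homework rows should be affected, with the minimum taken per anon_screen_name, a base R sketch along the same lines could look like this. The `toy` data frame and the helper names (`num`, `is_hw`, `hw_min`, `keep`) are made up for illustration.

```r
# Made-up toy data shaped like `test` (hypothetical ids u1..u3)
toy <- data.frame(
  anon_screen_name      = c("u1", "u2", "u2", "u2", "u2", "u3", "u3"),
  resource_display_name = c("Quiz", "homework", "homework", "Quiz",
                            "final_exam", "homework", "homework"),
  percent_grade         = c("0", "0", "9.09", "\\N", "100", "50", "50"),
  stringsAsFactors      = FALSE
)

# "\\N" becomes NA when converted
num   <- suppressWarnings(as.numeric(toy$percent_grade))
is_hw <- toy$resource_display_name == "homework"

# Per-student minimum over homework rows only (NA for students with no homework)
hw_min <- ave(
  ifelse(is_hw, num, NA),
  toy$anon_screen_name,
  FUN = function(x) if (all(is.na(x))) NA else min(x, na.rm = TRUE)
)

# Drop homework rows that sit at their student's homework minimum; keep all others
keep   <- !(is_hw & !is.na(num) & num == hw_min)
result <- toy[keep, ]
result
```

Non-homework rows and rows with an NA grade are kept here, which differs from the snippet above.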
Answer 2 (score: 0)
If you only want to remove a single record, rather than all records tied for the lowest grade, you can do the following:
test %>%
  mutate(percent_grade = as.numeric(percent_grade)) %>%
  group_by(anon_screen_name) %>%
  # flag homework rows that hold the student's lowest grade
  mutate(lowest_grade = 1 * ((percent_grade == min(percent_grade, na.rm = TRUE)) &
                               (resource_display_name == 'homework'))) %>%
  # flagged rows sort last, then the last row of each group is dropped
  arrange(lowest_grade) %>%
  filter(row_number() != n()) %>%
  ungroup()
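One caveat with the pipeline above: filter(row_number() != n()) removes the last row of every anon_screen_name group, so a student with no homework rows still loses a row, and min() is taken over all of the student's rows rather than only homework. A possible alternative sketch picks exactly one lowest homework row per student and removes only that row. It assumes dplyr 1.0.0 or later for slice_min; the `toy` data and the helper column `row_id` are made up for illustration.

```r
library(dplyr)

# Made-up toy data shaped like `test` (hypothetical ids u1..u3)
toy <- tibble(
  anon_screen_name      = c("u1", "u2", "u2", "u2", "u2", "u3", "u3"),
  resource_display_name = c("Quiz", "homework", "homework", "Quiz", "Quiz",
                            "homework", "homework"),
  percent_grade         = c("0", "0", "9.09", "\\N", "100", "50", "50")
)

toy_num <- toy %>%
  mutate(
    percent_grade = suppressWarnings(as.numeric(percent_grade)),  # "\\N" -> NA
    row_id        = row_number()                                  # hypothetical row key
  )

# Exactly one lowest homework row per student (ties broken arbitrarily)
to_drop <- toy_num %>%
  filter(resource_display_name == "homework", !is.na(percent_grade)) %>%
  group_by(anon_screen_name) %>%
  slice_min(percent_grade, n = 1, with_ties = FALSE) %>%
  ungroup()

# Remove only those rows; students without homework are left untouched
result <- toy_num %>%
  anti_join(to_drop, by = "row_id") %>%
  select(-row_id)

result
```

With with_ties = FALSE, student u3 keeps one of the two homework rows tied at 50, and student u1, who has no homework, keeps their only row.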