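I want to apply a function to every row of a data frame. I have a data frame of homework (hw) scores, and I want to apply a function that drops the lowest score and calculates the average of the rest. Here are the functions involved: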
# takes the lowest score, drops it, and then calculates the average
score_hw_d <- function(hw) {
  return(get_average(drop_lowest(hw)))
}
# drops the lowest score
drop_lowest <- function(x) {
  x <- sort(x, decreasing = TRUE)
  x <- c(x[1:(length(x) - 1)])
  x
}
# calculates the average
get_average <- function(x, na.rm = TRUE) {
  if (mode(x) != 'numeric') {
    stop("non-numeric argument")
  }
  if (na.rm == TRUE) {
    x <- remove_missing(x)
  }
  total <- 0
  for (n in 1:length(x)) {
    total <- total + x[n]
  }
  return(total / length(x))
}
Here is a snapshot of the dataset hws (there are more rows):
  new1 new2 new3 new4 new5 new6 new7 new8 new9
1 100.0 100.0 100.0 100.0 100.00 100.0 100.0 100.0 100.0
2 85.0 95.0 100.0 95.0 95.00 95.0 100.0 100.0 100.0
3 87.5 100.0 85.0 70.0 100.00 98.0 0.0 80.0 0.0
4 92.5 100.0 100.0 100.0 96.25 99.0 100.0 92.5 95.0
5 32.5 0.0 65.0 60.0 0.00 46.0 0.0 0.0 0.0
6 75.0 85.0 92.5 95.0 100.00 91.0 0.0 0.0 90.0
7 90.0 100.0 97.5 95.0 80.00 80.0 52.0 90.0 90.0
8 92.5 95.0 100.0 90.0 100.00 72.0 95.0 74.5 100.0
9 82.5 85.0 92.5 70.0 100.00 0.0 84.0 90.0 95.0
data$homework <- apply(hws,1,score_hw_d)
I get a new column containing blank values. Any help?
Answer 0 (score: 1)
You can get the same result without the two custom functions:
hws = read.table(text=" new1 new2 new3 new4 new5 new6 new7 new8 new9
1 100.0 100.0 100.0 100.0 100.00 100.0 100.0 100.0 100.0
2 85.0 95.0 100.0 95.0 95.00 95.0 100.0 100.0 100.0
3 87.5 100.0 85.0 70.0 100.00 98.0 0.0 80.0 0.0
4 92.5 100.0 100.0 100.0 96.25 99.0 100.0 92.5 95.0
5 32.5 0.0 65.0 60.0 0.00 46.0 0.0 0.0 0.0
6 75.0 85.0 92.5 95.0 100.00 91.0 0.0 0.0 90.0
7 90.0 100.0 97.5 95.0 80.00 80.0 52.0 90.0 90.0
8 92.5 95.0 100.0 90.0 100.00 72.0 95.0 74.5 100.0
9 82.5 85.0 92.5 70.0 100.00 0.0 84.0 90.0 95.0")
apply(hws, 1, FUN=function(x) mean(x[-which.min(x)], na.rm=TRUE))
# 1 2 3 4 5 6 7 8 9
#100.00000 97.50000 77.56250 97.84375 25.43750 78.56250 90.31250 93.37500 87.37500
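If you would rather keep a named helper, the one-liner above can be wrapped with a couple of guards. This is only a minimal sketch: the name drop_lowest_mean and the small scores data frame are made up for illustration, and returning NA for rows with fewer than two non-missing scores is an assumption about what you want in that case.

```r
# Hypothetical helper (not from the original post): drop the single lowest
# score from a numeric vector and return the mean of what remains.
drop_lowest_mean <- function(x, na.rm = TRUE) {
  if (!is.numeric(x)) stop("non-numeric argument")
  if (na.rm) x <- x[!is.na(x)]          # discard missing scores first
  if (length(x) < 2) return(NA_real_)   # assumption: nothing sensible to average
  mean(x[-which.min(x)])                # which.min() picks one lowest value
}

# Illustrative data only; the third row has a missing score.
scores <- data.frame(
  new1 = c(100, 85, NA),
  new2 = c(100, 95, 100),
  new3 = c(100, 100, 85),
  new4 = c(100, 95, 70)
)

# Compute the row-wise result first, then attach it as a new column.
hw <- apply(as.matrix(scores), 1, drop_lowest_mean)
scores$homework <- hw
```

Computing the result into a separate vector before assigning it keeps the new column out of the matrix that apply() iterates over.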
Answer 1 (score: 1)
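You don't need any custom functions for this; it can be done with the tidyverse.
Load the tidyverse with library(tidyverse), then:
1. Take hws and define a student column, which is just the row number because we have no names.
2. gather all the test scores to change the dataset from wide to long.
3. group_by student.
4. arrange by score so the lowest comes first, then slice off that first score.
5. spread the scores back to wide format.
6. ungroup the data frame.
7. Take the rowSums of the score columns and divide by the number of columns in the data frame minus 2. The 2 accounts for the student column and the lowest score that was removed.
You will now see an NA in each row where the lowest score was dropped. You can also keep the data in long format; mean with na.rm = TRUE and every other calculation still works there.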
hws2 <- hws %>%
  mutate(student = row_number()) %>%
  gather(test, score, contains("new")) %>%
  group_by(student) %>%
  arrange(student, score) %>%
  slice(-1) %>%
  spread(test, score) %>%
  ungroup() %>%
  mutate(average = rowSums(.[, 2:ncol(.)], na.rm = TRUE) / (ncol(.) - 2))
Result:
> hws2
# A tibble: 9 x 11
student new1 new2 new3 new4 new5 new6 new7 new8 new9 average
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 NA 100 100.0 100 100.00 100 100 100.0 100 100.00000
2 2 NA 95 100.0 95 95.00 95 100 100.0 100 97.50000
3 3 87.5 100 85.0 70 100.00 98 NA 80.0 0 77.56250
4 4 NA 100 100.0 100 96.25 99 100 92.5 95 97.84375
5 5 32.5 NA 65.0 60 0.00 46 0 0.0 0 25.43750
6 6 75.0 85 92.5 95 100.00 91 NA 0.0 90 78.56250
7 7 90.0 100 97.5 95 80.00 80 NA 90.0 90 90.31250
8 8 92.5 95 100.0 90 100.00 NA 95 74.5 100 93.37500
9 9 82.5 85 92.5 70 100.00 NA 84 90.0 95 87.37500
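The long-format alternative mentioned above could look roughly like the sketch below. It assumes dplyr and tidyr are installed, uses pivot_longer (tidyr's newer replacement for gather), and works on hws_demo, a made-up stand-in holding the first four columns of the first three rows of hws so that the example is self-contained.

```r
library(dplyr)
library(tidyr)

# Stand-in for hws (assumption: first three rows, first four columns only).
hws_demo <- data.frame(
  new1 = c(100, 85, 87.5),
  new2 = c(100, 95, 100),
  new3 = c(100, 100, 85),
  new4 = c(100, 95, 70)
)

hw_avg <- hws_demo %>%
  mutate(student = row_number()) %>%
  # wide -> long: one row per student/test pair
  pivot_longer(cols = starts_with("new"),
               names_to = "test", values_to = "score") %>%
  group_by(student) %>%
  # sort within each student so the lowest score is the first row
  arrange(score, .by_group = TRUE) %>%
  slice(-1) %>%
  # average the remaining scores without going back to wide format
  summarise(average = mean(score, na.rm = TRUE))
```

Staying in long format avoids the column-count arithmetic, because mean() simply averages whatever scores are left for each student.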
Hope this helps!