Struggling with the apply function

Question

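I want to apply a function to every row of a data frame. I have a data frame of homework scores, and I want to apply a function that drops the lowest score and computes the average. Here are the functions involved: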

    # drops the lowest score, then calculates the average
    score_hw_d <- function(hw) {
      return(get_average(drop_lowest(hw)))
    }

    # drops the lowest score
    drop_lowest <- function(x) {
      x <- sort(x, decreasing = TRUE)
      x <- c(x[1:(length(x) - 1)])
      x
    }

    # calculates the average
    get_average <- function(x, na.rm = TRUE) {
      if (mode(x) != 'numeric') {
        stop("non-numeric argument")
      }
      if (na.rm == TRUE) {
        x = remove_missing(x)
      }
      total <- 0
      for (n in 1:length(x)) {
        total = total + x[n]
      }
      return(total / length(x))
    }

Here is a snapshot of the dataset hws (there are more rows):

       new1  new2  new3  new4   new5  new6  new7  new8  new9
    1 100.0 100.0 100.0 100.0 100.00 100.0 100.0 100.0 100.0
    2  85.0  95.0 100.0  95.0  95.00  95.0 100.0 100.0 100.0
    3  87.5 100.0  85.0  70.0 100.00  98.0   0.0  80.0   0.0
    4  92.5 100.0 100.0 100.0  96.25  99.0 100.0  92.5  95.0
    5  32.5   0.0  65.0  60.0   0.00  46.0   0.0   0.0   0.0
    6  75.0  85.0  92.5  95.0 100.00  91.0   0.0   0.0  90.0
    7  90.0 100.0  97.5  95.0  80.00  80.0  52.0  90.0  90.0
    8  92.5  95.0 100.0  90.0 100.00  72.0  95.0  74.5 100.0
    9  82.5  85.0  92.5  70.0 100.00   0.0  84.0  90.0  95.0

When I use

    data$homework <- apply(hws, 1, score_hw_d)

I get a new column filled with blank values. Can anyone help?

Answer 1

You can get the same result without the two custom functions:

hws = read.table(text="     new1  new2  new3  new4   new5  new6  new7  new8  new9
1   100.0 100.0 100.0 100.0 100.00 100.0 100.0 100.0 100.0
2    85.0  95.0 100.0  95.0  95.00  95.0 100.0 100.0 100.0
3    87.5 100.0  85.0  70.0 100.00  98.0   0.0  80.0   0.0
4    92.5 100.0 100.0 100.0  96.25  99.0 100.0  92.5  95.0
5    32.5   0.0  65.0  60.0   0.00  46.0   0.0   0.0   0.0
6    75.0  85.0  92.5  95.0 100.00  91.0   0.0   0.0  90.0
7    90.0 100.0  97.5  95.0  80.00  80.0  52.0  90.0  90.0
8    92.5  95.0 100.0  90.0 100.00  72.0  95.0  74.5 100.0
9    82.5  85.0  92.5  70.0 100.00   0.0  84.0  90.0  95.0")
apply(hws, 1, FUN=function(x) mean(x[-which.min(x)], na.rm=TRUE))
#        1         2         3         4         5         6         7         8         9 
#100.00000  97.50000  77.56250  97.84375  25.43750  78.56250  90.31250  93.37500  87.37500
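
To store the result as a new column, as the question attempts, the same one-liner can be wrapped in a small helper. This is only a sketch: the name `drop_lowest_mean` and the three-row data frame `hws_small` are illustrative assumptions, not part of the original answer.

```r
# Hypothetical helper name, used here only for illustration.
drop_lowest_mean <- function(x) {
  # which.min() returns the index of the first minimum,
  # so exactly one score is dropped even when the minimum is tied
  mean(x[-which.min(x)], na.rm = TRUE)
}

# Assumed toy data: the first three columns of the first three rows of hws
hws_small <- data.frame(
  new1 = c(100, 85, 87.5),
  new2 = c(100, 95, 100),
  new3 = c(100, 100, 85)
)

# apply() over rows (MARGIN = 1) returns one average per row
hws_small$homework <- apply(hws_small, 1, drop_lowest_mean)
hws_small
```

Because apply() converts the data frame to a matrix first, this works cleanly only when every column passed in is numeric.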

Answer 2

You don't have to use any custom functions for this. It can be done with the tidyverse.

Load the tidyverse:

library(tidyverse)

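Take hws and define student as the row number, since we have no names.
gather all the test scores to reshape the dataset from wide to long.
group the data frame by student.
arrange the data frame by student, then by test score in ascending order (lowest first).
slice off the first score (the lowest).
spread the scores back to wide format.
ungroup the data frame.
Add a column holding the rowSums of every column except the first (student), then divide it by the number of columns in the data frame minus 2. We subtract 2 to account for the student column and the dropped lowest score.

You will now see NA where each student's lowest score was removed. You could also keep the data in long format and still do this and every other calculation by specifying na.rm = TRUE in mean.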

hws2 <- hws %>%
  mutate(student = row_number()) %>%
  gather(test, score, contains("new")) %>%
  group_by(student) %>%
  arrange(student, score) %>%
  slice(-1) %>%
  spread(test, score) %>%
  ungroup() %>%
  mutate(average = rowSums(.[,2:ncol(.)], na.rm = TRUE)/(ncol(.) - 2))

Result:

> hws2
# A tibble: 9 x 11
  student  new1  new2  new3  new4   new5  new6  new7  new8  new9   average
    <int> <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>     <dbl>
1       1    NA   100 100.0   100 100.00   100   100 100.0   100 100.00000
2       2    NA    95 100.0    95  95.00    95   100 100.0   100  97.50000
3       3  87.5   100  85.0    70 100.00    98    NA  80.0     0  77.56250
4       4    NA   100 100.0   100  96.25    99   100  92.5    95  97.84375
5       5  32.5    NA  65.0    60   0.00    46     0   0.0     0  25.43750
6       6  75.0    85  92.5    95 100.00    91    NA   0.0    90  78.56250
7       7  90.0   100  97.5    95  80.00    80    NA  90.0    90  90.31250
8       8  92.5    95 100.0    90 100.00    NA    95  74.5   100  93.37500
9       9  82.5    85  92.5    70 100.00    NA    84  90.0    95  87.37500
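
The long-format alternative mentioned above might look roughly like the sketch below. It is an assumption-laden illustration rather than the answer's own code: it uses `pivot_longer()`, the newer tidyr replacement for `gather()`, and a three-row toy data frame named `hws_small` in place of the full hws.

```r
library(dplyr)
library(tidyr)

# Assumed toy data: the first three columns of the first three rows of hws
hws_small <- data.frame(
  new1 = c(100, 85, 87.5),
  new2 = c(100, 95, 100),
  new3 = c(100, 100, 85)
)

averages <- hws_small %>%
  mutate(student = row_number()) %>%
  pivot_longer(starts_with("new"), names_to = "test", values_to = "score") %>%
  group_by(student) %>%
  arrange(score, .by_group = TRUE) %>%  # lowest score first within each student
  slice(-1) %>%                         # drop that lowest score
  summarise(average = mean(score, na.rm = TRUE))

averages
```

Staying in long format avoids the NA placeholders that appear after spreading back to wide, since the dropped row is simply absent.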

Hope this meets your needs!
