Function to find the first and last element of a vector

Asked: 2016-01-25 22:17:54

Tags: r time-series

I am looking for an R function that finds the first and last element of a vector, similar to min for the minimum value and max for the maximum. I know I could compute the length of the vector and go from there, but that does not work for what I need.

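I have the following dataset (the real one is larger). I was already able to calculate the maximum score for each student with max; the data are built like this: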

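# Pretest dates (used as column names), plus the actual exam date
a <- c("2013-02-25", "2013-03-13", "2013-04-24", "2013-05-12", "2013-07-12", "2013-08-11", "actual_exam_date")
# One vector per student: six pretest scores (NA = test not taken), then the exam date
b <- c(300, 230, 400, NA, NA, NA, "2013-04-30")
c <- c(NA, 260, 410, 420, NA, NA, "2013-05-30")
d <- c(300, 230, 400, NA, 370, 390, "2013-08-30")
# Keep the values as character so they convert cleanly later
df <- as.data.frame(rbind(b, c, d), stringsAsFactors = FALSE)
colnames(df) <- a
rownames(df) <- c("student 1", "student 2", "student 3")
df$student_id <- row.names(df)
# Reshape to long format: one row per student and pretest date
library(reshape2)
df2 <- melt(df, id.vars = c('student_id', 'actual_exam_date'),
            variable.name = 'pretest_date',
            value.name = 'pretest_score')
# Drop pretests that were not taken, then convert the column types
df2 <- df2[!is.na(df2$pretest_score), ]
df2$actual_exam_date <- as.Date(df2$actual_exam_date)
df2$pretest_date <- as.Date(df2$pretest_date)
df2$days_before_exam <- as.integer(df2$actual_exam_date - df2$pretest_date)
df2$pretest_score <- as.numeric(df2$pretest_score)
df2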

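Now I would like to determine the first and last pretest score for each student, in order to calculate the difference between them. Is there a way to do this with aggregate?

5 Answers:

Answer 0 (score: 3)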

First, order the data by pretest date using

df2 <- df2[order(df2$pretest_date), ]

and then

aggregate(pretest_score ~ student_id, df2, function(x) tail(x, 1) - x[1])
  student_id pretest_score
1  student 1           100
2  student 2           160
3  student 3            90
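To illustrate why the ordering step matters, here is a minimal self-contained sketch. The data frame `scores` and its values are hypothetical toy data (not the asker's `df2`), with rows deliberately out of date order; only base R is assumed.

```r
# Hypothetical toy data in long format, deliberately NOT sorted by date
scores <- data.frame(
  student_id    = c("s1", "s1", "s1", "s2", "s2"),
  pretest_date  = as.Date(c("2013-04-24", "2013-02-25", "2013-03-13",
                            "2013-05-12", "2013-03-13")),
  pretest_score = c(400, 300, 230, 420, 260)
)

# Sort by date so the first/last row of each student is the earliest/latest test
scores <- scores[order(scores$pretest_date), ]

# Last score minus first score, per student
gain <- aggregate(pretest_score ~ student_id, scores,
                  function(x) tail(x, 1) - head(x, 1))
gain
```

Without the `order()` line, "first" and "last" would simply follow the row order of the data frame, which need not be chronological.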

Answer 1 (score: 3)

First value for each student:

> aggregate(pretest_score ~ student_id, df2, head, 1)
  student_id pretest_score
1  student 1           300
2  student 2           260
3  student 3           300

Last value for each student:

> aggregate(pretest_score ~ student_id, df2, tail, 1)
  student_id pretest_score
1  student 1           400
2  student 2           420
3  student 3           390
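If both values are needed side by side, one possible variation is to let the aggregating function return a named vector. This is only a sketch on hypothetical toy data (`scores` is not the asker's `df2`) and assumes the rows are already in date order within each student.

```r
# Hypothetical toy data, already sorted by date within each student
scores <- data.frame(
  student_id    = c("s1", "s1", "s1", "s2", "s2", "s2"),
  pretest_score = c(300, 230, 400, 260, 410, 420)
)

# Return first and last together; aggregate() stores them as a matrix column
first_last <- aggregate(pretest_score ~ student_id, scores,
                        function(x) c(first = x[1], last = x[length(x)]))

# Flatten the matrix column into ordinary columns:
# pretest_score.first and pretest_score.last
first_last <- do.call(data.frame, first_last)
first_last$diff <- first_last$pretest_score.last - first_last$pretest_score.first
first_last
```

The `do.call(data.frame, ...)` step is there because the two values per group come back as a single matrix-valued column, which is awkward to index by name otherwise.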

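Answer 2 (score: 3)

Another suggestion, using data.table, which lets you order the data (if needed) and get the result in a single line, without specifying an anonymous function: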

library(data.table)
setDT(df2)[order(pretest_date), diff(pretest_score[c(1, .N)]), keyby = student_id]
#    student_id  V1
# 1:  student 1 100
# 2:  student 2 160
# 3:  student 3  90

Answer 3 (score: 2)

There are several ways to do what you are asking. Using dplyr, you can find the minimum and maximum:

library(dplyr)
df2 %>% group_by(student_id) %>%
  filter(pretest_score == max(pretest_score) | pretest_score == min(pretest_score)) %>%
  mutate(differ = max(pretest_score) - min(pretest_score))
Source: local data frame [6 x 6]
Groups: student_id [3]

  student_id actual_exam_date pretest_date pretest_score days_before_exam differ
       (chr)           (date)       (date)         (dbl)            (int)  (dbl)
1  student 1       2013-04-30   2013-03-13           230               48    170
2  student 2       2013-05-30   2013-03-13           260               78    160
3  student 3       2013-08-30   2013-03-13           230              170    170
4  student 1       2013-04-30   2013-04-24           400                6    170
5  student 3       2013-08-30   2013-04-24           400              128    170
6  student 2       2013-05-30   2013-05-12           420               18    160

With aggregate:

 aggregate(df2$pretest_score, list(df2$student_id), FUN = function(x) max(x) - min(x))
    Group.1   x
1 student 1 170
2 student 2 160
3 student 3 170
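Note that max minus min is not the same as last minus first (170 versus 100 for student 1 here). If the chronological first and last scores are what is wanted, a sketch along the following lines might work, using dplyr's `first()` and `last()` with `order_by`. The data frame `scores` and the output column names are hypothetical.

```r
library(dplyr)

# Hypothetical toy data, rows deliberately out of date order
scores <- data.frame(
  student_id    = c("s1", "s1", "s1", "s2", "s2"),
  pretest_date  = as.Date(c("2013-04-24", "2013-02-25", "2013-03-13",
                            "2013-05-12", "2013-03-13")),
  pretest_score = c(400, 300, 230, 420, 260)
)

# first()/last() pick the score at the earliest/latest date within each group
result <- scores %>%
  group_by(student_id) %>%
  summarise(
    first_score = first(pretest_score, order_by = pretest_date),
    last_score  = last(pretest_score, order_by = pretest_date),
    differ      = last_score - first_score
  )
result
```

Because `order_by` is given, the data frame does not have to be sorted beforehand.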

Answer 4 (score: 2)

I think this is easier with tapply:

tapply(df2$pretest_score, df2$student_id, function(x) tail(x,1)-head(x,1))

student 1 student 2 student 3
      100       160        90