对受试者重复时间序列功能并找到最佳间隔

时间:2015-09-05 07:15:25

标签: r

让我说我研究动物的反应时间。 在指定的时间点,给动物一个刺激,并记录反应时间。

# Registrations were done on the following occasions (minutes)
intervals <- c(1, 2, 3, 4, 5, 6, 7)

# Read in example data (this is how my data looks)
data <- read.table(header=T, text="
id value1 value2 value3 value4 value5 value6 value7
a 1.1 2.2 3.3 4.3 5.7 5.5 3.1
b 1.5 2.5 3.5 4.5 5.5 4.5 3.5
c 3.4 6.4 8.9 9.1 10.3 8.0 5.4
")

# a simple plot showing the first animal's results
plot(intervals, data[1,2:8]); lines(intervals, data[1,2:8])

# For each animal I need to find the X value for a reaction time of 3.0 (for instance)
# I do this with 'optimize'. For the first animal (id='a'):
approx <- approxfun(intervals, data[1,2:8])
optimize(function(t0) abs(approx(t0) - 3), interval = c(2, 5))[[1]]
[1] 2.727291

但是,一次为一个主题做这件事非常耗时...特别是我必须提供合适间隔的部分(上图:interval = c(0,7));如果间隔未指定,则该函数可能返回错误的结果。

我已尝试过dplyr来执行rowwise()和group_by(),但没有任何成功。

有什么建议吗?非常欢迎这些

1 个答案:

答案 0 :(得分:2)

忽略optimise方法的答案,而是选择要优化的特定区域(范围)。这将是通过线性模型连接两个点的线。然后它使用这个模型进行预测。

我强烈建议您逐步运行该过程,以了解初始数据集如何重新整形和更新。

library(dplyr)
library(broom)
library(tidyr)

# Registrations were done on the following occasions (minutes)
intervals <- c(1, 2, 3, 4, 5, 6, 7)

# Read in example data (this is how my data looks)
dt <- read.table(header=T, text="
                   id value1 value2 value3 value4 value5 value6 value7
                   a 1.1 2.2 3.3 4.3 5.7 5.5 3.1
                   b 1.5 2.5 3.5 4.5 5.5 4.5 3.5
                   c 3.4 6.4 8.9 9.1 10.3 8.0 5.4
                   ")



# input measurement to investigate
y_input = 3

# create process
gather(dt,x,y,-id) %>%                                # reshape dataset 
  mutate(x = as.numeric(gsub("value","",x))) %>%      # get the minutes as a number from your column
  arrange(id,x) %>%                                   # need to order on id and x to visualise results  
  group_by(id) %>%                                    # for each animal
  mutate(lagx = lag(x, default=x[1]),                 # create columns with previous x and y values
         lagy = lag(y, default=y[1])) %>%                                       
  rowwise() %>%
  mutate(flag = (between(y_input, y, lagy) | between(y_input,lagy,y))) %>%        # flag when measurements y belong to that range
  filter(flag == TRUE) %>%                                            # keep the rows with the appropriate ranges
  group_by(id) %>%
  slice(1) %>%                                  # keep only first range in case of multiple ranges for a specific animal
  do({x=c(.$x,.$lagx)                           # build a model (line to connect points) and predict
  y=c(.$y,.$lagy)
  model = lm(x~y)
  xpred = predict(model, newdata=data.frame(y=y_input))
  data.frame(xpred)})

#     id    xpred
#   1  a 2.727273
#   2  b 2.500000

注意,该过程识别出id = c的动物的第一个测量值等于3.4并将其排除,因为3不属于某个范围(两次测量之间)。如果你想调查测量= 4也将包括动物c。

尝试使用各种数字来试验该过程,以测试它是否正确。您可能能够发现任何错误。