让我说我研究动物的反应时间。 在指定的时间点,给动物一个刺激,并记录反应时间。
# Registrations were done on the following occasions (minutes)
intervals <- c(1, 2, 3, 4, 5, 6, 7)
# Read in example data (this is how my data looks)
data <- read.table(header=T, text="
id value1 value2 value3 value4 value5 value6 value7
a 1.1 2.2 3.3 4.3 5.7 5.5 3.1
b 1.5 2.5 3.5 4.5 5.5 4.5 3.5
c 3.4 6.4 8.9 9.1 10.3 8.0 5.4
")
# a simple plot showing the first animal's results
plot(intervals, data[1,2:8]); lines(intervals, data[1,2:8])
# For each animal I need to find the X value for a reaction time of 3.0 (for instance)
# I do this with 'optimize'. For the first animal (id='a'):
approx <- approxfun(intervals, data[1,2:8])
optimize(function(t0) abs(approx(t0) - 3), interval = c(2, 5))[[1]]
[1] 2.727291
但是,一次为一个主题做这件事非常耗时...特别是我必须提供合适间隔的部分(上图:interval = c(0,7));如果间隔未指定,则该函数可能返回错误的结果。
我已尝试过dplyr来执行rowwise()和group_by(),但没有任何成功。
有什么建议吗?非常欢迎这些
答案 0 :(得分:2)
忽略optimise
方法的答案,而是选择要优化的特定区域(范围)。这将是通过线性模型连接两个点的线。然后它使用这个模型进行预测。
我强烈建议您逐步运行该过程,以了解初始数据集如何重新整形和更新。
library(dplyr)
library(broom)
library(tidyr)
# Registrations were done on the following occasions (minutes)
intervals <- c(1, 2, 3, 4, 5, 6, 7)
# Read in example data (this is how my data looks)
dt <- read.table(header=T, text="
id value1 value2 value3 value4 value5 value6 value7
a 1.1 2.2 3.3 4.3 5.7 5.5 3.1
b 1.5 2.5 3.5 4.5 5.5 4.5 3.5
c 3.4 6.4 8.9 9.1 10.3 8.0 5.4
")
# input measurement to investigate
y_input = 3
# create process
gather(dt,x,y,-id) %>% # reshape dataset
mutate(x = as.numeric(gsub("value","",x))) %>% # get the minutes as a number from your column
arrange(id,x) %>% # need to order on id and x to visualise results
group_by(id) %>% # for each animal
mutate(lagx = lag(x, default=x[1]), # create columns with previous x and y values
lagy = lag(y, default=y[1])) %>%
rowwise() %>%
mutate(flag = (between(y_input, y, lagy) | between(y_input,lagy,y))) %>% # flag when measurements y belong to that range
filter(flag == TRUE) %>% # keep the rows with the appropriate ranges
group_by(id) %>%
slice(1) %>% # keep only first range in case of multiple ranges for a specific animal
do({x=c(.$x,.$lagx) # build a model (line to connect points) and predict
y=c(.$y,.$lagy)
model = lm(x~y)
xpred = predict(model, newdata=data.frame(y=y_input))
data.frame(xpred)})
# id xpred
# 1 a 2.727273
# 2 b 2.500000
注意,该过程识别出id = c的动物的第一个测量值等于3.4并将其排除,因为3不属于某个范围(两次测量之间)。如果你想调查测量= 4也将包括动物c。
尝试使用各种数字来试验该过程,以测试它是否正确。您可能能够发现任何错误。