nls regression: storing the output coefficients and plots

Asked: 2016-12-09 14:42:33

Tags: r dataframe ggplot2 plyr nls

I tried to make reproducible data following the advice in "How to make a great R reproducible example?":
structure(list(ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("ANK.26.1", 
"ANK.35.10"), class = "factor"), DAY = c(2L, 3L, 
4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 
18L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 
15L, 16L, 17L, 18L), carbon = c(1684.351094778, 3514.451339358, 
6635.877888654, 10301.700591252, 11361.360992769, 11891.934331254, 
12772.885869486, 13545.127224369, 14022.00520767, 14255.045990397, 
14479.813468278, 14611.749542181, 14746.382638335, 14942.733363567, 
14961.338739162, 15049.433738817, 15047.197961499, 1705.361701104, 
3293.593040601, 4788.872254899, 6025.622715999, 6670.80499518, 
7150.526272512, 7268.955557607, 7513.61998338, 7896.202773246, 
8017.953574608, 8146.09464786, 8286.148260324, 8251.229520243, 
8384.244997158, 8413.034235219, 8461.066691601, 8269.360979031
), g.rate.perc = c(NA, 1.08653133557123, 0.888168948119852,0.55242467750436, 
0.102862667394628, 0.0466998046116733, 0.0740797513417739, 0.060459426536321, 
0.0352066079115925, 0.0166196474238596, 0.0157675729725753, 0.00911172469120847, 
0.00921402983026387, 0.0133151790542558, 0.00124511193115184, 
0.00588817626489591, -0.000148562222127446, NA, 0.931316411333049, 
0.45399634862756, 0.258255053647507, 0.107073129133681, 0.0719135513148148, 
0.0165623173150578, 0.0336588143694119, 0.0509185706373581,0.0154189051191185, 
0.0159817679236518, 0.0171927308137518, -0.00421410998016991, 
0.0161206856006937, 0.00343373053515927, 0.00570929049366353, 
-0.0226573929218994), max.carb = c(15049.433738817, 15049.433738817, 
15049.433738817, 15049.433738817, 15049.433738817, 15049.433738817, 
15049.433738817, 15049.433738817, 15049.433738817, 15049.433738817, 
15049.433738817, 15049.433738817, 15049.433738817, 15049.433738817, 
15049.433738817, 15049.433738817, 15049.433738817, 8461.066691601, 
8461.066691601, 8461.066691601, 8461.066691601, 8461.066691601, 
8461.066691601, 8461.066691601, 8461.066691601, 8461.066691601, 
8461.066691601, 8461.066691601, 8461.066691601, 8461.066691601, 
8461.066691601, 8461.066691601, 8461.066691601, 8461.066691601
)), .Names = c("ID", "DAY", "carbon", "g.rate.perc", "max.carb"
), row.names = c(NA, 34L), class = "data.frame")

'data.frame':   34 obs. of  5 variables:
 $ ID         : Factor w/ 150 levels "ANK.26.1","ANK.35.10",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ DAY        : int  2 3 4 5 6 7 8 9 10 11 ...
 $ carbon     : num  1684 3514 6636 10302 11361 ...
 $ g.rate.perc: num  NA 1.087 0.888 0.552 0.103 ...
 $ max.carb   : num  15049 15049 15049 15049 15049 ...

In the sample data, ID has only two levels, not the 150 indicated in the str() output.

My nls call looks like this:

res.sample <- ddply (
  d, .(ID),
  function(x){
  mod <- nls(carbon~phi1/(1+exp(-(phi2 + phi3 * DAY))),
         start=list(
           phi1 = x$max.carb,
           phi2 = int[1], 
           phi3 = mean(x$g.rate.perc)),
         data=x,trace=TRUE)
 return(coef(mod))
}
)

phi2 is actually the intercept taken from
 int <- coef(lm(DAY~carbon,data=sample))

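Unfortunately it stopped working once I wrapped it in ddply, and I cannot go through all 150 levels of ID by hand.

Most importantly, I would like to store all three output values, phi1 to phi3, in a data frame or list together with the corresponding ID. I intended to do this via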
return(coef(mod))

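The cherry on top would be a plot of the actual data with the fitted curve drawn over it. I can do this by subsetting manually, but that takes far too long. My reduced ggplot code is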

ggplot(data=n, aes(x = DAY, y = carbon))+ 
 geom_point(stat="identity", size=2) +
 geom_line( aes(DAY,predict(logMod) ))+
 ggtitle("ID")

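In case the ID, which packs three pieces of information into one string, is not very useful as it is, here is how to split it into separate columns: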

sep_sample <- sample %>% separate(ID, c("algae", "id", "nutrient"))

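I feel this is a lot to ask, but I have really tried, and I have already spent many days on it.

Here is the summary:

I need to run the model for each level of ID, or for each combination of algae and nutrient if the ID is separated.

The output phi values should be stored in some kind of frame/list/table, each labelled with the group it belongs to.

Ideally there is a way to include ggplots in all of this, so that they are also generated and stored automatically.

As I said, the model itself already works, but when I put it into the ddply structure I get the following error message: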

Error in numericDeriv(form[[3L]], names(ind), env) : 
  Missing value or an infinity produced when evaluating the model 
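
A likely cause, judging only from the sample data above: `g.rate.perc` begins with `NA` in each ID, so `mean(x$g.rate.perc)` is `NA`, and `x$max.carb` is a whole column of 17 values rather than a single number. The check below uses made-up stand-in vectors (`g_rate`, `max_carb`, `phi1_start` and `phi3_start` are hypothetical names, not part of the original code) to show how scalar, finite starting values could be obtained:

```r
# Hypothetical stand-ins for one ID's g.rate.perc and max.carb columns
g_rate   <- c(NA, 1.09, 0.89, 0.55, 0.10)
max_carb <- rep(15049.43, 5)

# mean() propagates NA, so this start value would be NA
naive_phi3 <- mean(g_rate)

# Dropping the NA gives a finite start value
phi3_start <- mean(g_rate, na.rm = TRUE)

# A start value should be a single number, not a column
phi1_start <- max_carb[1]
```

Whether this alone makes the fit converge depends on the data; it only removes the `NA` and the length mismatch from the `start` list.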

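I hope this is something you can work with and that it is a reasonable question. If there are pages that already offer a similar solution and I did not find them, I would be glad to take a look.

Thanks a lot!

1 Answer:

Answer 0: (score: 0)

So I came up with this solution. It is not exactly what I wanted, but I think it is a lot closer, since it actually runs: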

coef_list <- list()
curve_list <- list()
for (i in levels(d$ALGAE)) {
  for (j in levels(d$NUTRIENT)) {
    dat = d[d$ALGAE == i & d$NUTRIENT == j, ]

    #int <- coef(lm(DAY~carbon,data=dat))

    mod <- nls(carbonlog ~ phi1 / (1 + exp(-(phi2 + phi3 * DAY))),
               start = list(
                 phi1 = 9.364,
                 phi2 = 0,
                 phi3 = 0.135113),
               data = dat, trace = TRUE)
    coef_list[[paste(i, j, sep = "_")]] = coef(mod)

    plt <- ggplot(data = dat, aes(x = DAY, y = carbonlog)) + geom_point() +
      geom_line(aes(DAY, predict(mod))) +
      ggtitle(paste(i, "RATIO", j, sep = " ")) +
      theme.plot
    curve_list[[paste(i, j, sep = "_")]] = plt
  }
}

Unfortunately, the starting parameters are static and do not depend on the respective factor combination. I suspect that group-specific starting values would be more helpful for finding a suitable fit.
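
One way to get group-specific starting values without hand-tuning is the self-starting logistic model `SSlogis` from the stats package, which estimates its own starting values for each data subset. The sketch below is only an illustration on synthetic data: `make_group`, `fit_one`, the group labels and all numbers are assumptions, not the original data set. `SSlogis` uses the parameterization `Asym / (1 + exp((xmid - DAY) / scal))`, so the coefficients are converted back to phi1, phi2 and phi3.

```r
# Synthetic two-group logistic data; names and numbers are illustrative only
set.seed(1)
make_group <- function(id, phi1, phi2, phi3) {
  day <- 2:18
  mu <- phi1 / (1 + exp(-(phi2 + phi3 * day)))
  data.frame(ID = id, DAY = day,
             carbon = mu * exp(rnorm(length(day), sd = 0.02)))
}
d <- rbind(make_group("A", 15000, -3, 0.6),
           make_group("B", 8400, -2.5, 0.55))

# Fit one group; SSlogis chooses its own starting values from the data
fit_one <- function(x) {
  mod <- nls(carbon ~ SSlogis(DAY, Asym, xmid, scal), data = x)
  p <- coef(mod)
  # Convert to the phi parameterization used in the question
  c(phi1 = p[["Asym"]],
    phi2 = -p[["xmid"]] / p[["scal"]],
    phi3 = 1 / p[["scal"]])
}

# One row of coefficients per ID
coef_mat <- t(sapply(split(d, d$ID), fit_one))
coefs <- data.frame(ID = rownames(coef_mat), coef_mat, row.names = NULL)
```

The result `coefs` has one row per ID with columns `phi1`, `phi2` and `phi3`. Splitting on `interaction(ALGAE, NUTRIENT)` instead of `ID` would give the per-combination version; self-starting fits can still fail on groups whose data do not look logistic, so wrapping `fit_one` in `tryCatch` may be worth considering.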

If I call

curve_list[["ANK_1"]]

I get an error message:

Error: Aesthetics must be either length 1 or the same as the data (17): x, y

I only get this message when I use the log-transformed carbon values. When I use carbon in its original form, everything is plotted.
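
One plausible explanation, which cannot be confirmed without the full data: if `carbonlog` contains `NA` values, `nls` drops those rows, so `predict(mod)` returns fewer values than `dat` has rows and ggplot2 refuses the mismatched aesthetic. The sketch below reproduces that situation on synthetic data (all numbers are made up) and stores the predictions in the data frame via `predict(mod, newdata = dat)`, which returns one value per row of `dat`.

```r
# Synthetic data with one missing response; numbers are illustrative only
set.seed(2)
dat <- data.frame(DAY = 2:18)
dat$carbonlog <- 9.4 / (1 + exp(-(-1 + 0.4 * dat$DAY))) + rnorm(17, sd = 0.02)
dat$carbonlog[1] <- NA

mod <- nls(carbonlog ~ phi1 / (1 + exp(-(phi2 + phi3 * DAY))),
           start = list(phi1 = 9.4, phi2 = -1, phi3 = 0.4),
           data = dat)

# Fitted values cover only the rows actually used in the fit
n_fitted <- length(predict(mod))

# Predictions for every row of dat, stored as a column
dat$fit <- predict(mod, newdata = dat)

# Build the plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  plt <- ggplot(dat, aes(x = DAY, y = carbonlog)) +
    geom_point() +
    geom_line(aes(y = fit))
}
```

Keeping the predictions as a column of `dat` also means each stored plot carries its own fitted values, instead of calling `predict(mod)` again when the plot is printed after the loop has moved on to another model.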