Question

attenuation = data.frame(km =      
c(0,0,0.4,0.4,0.8,0.8,1.2,1.2,1.6,1.6,2,2,2.4,2.4,2.8,2.8,3.2,3.2,3.6,3.6,4, 
4,4.4,4.4,4.8,4.8,5.2,5.2,5.6,5.6,6,6,6.4,6.4,6.8,6.8,7.2,7.2,7.6,7.6,8,8, 
11.7,11.7,13,13), edna = c(76000,20000,0,0,6000,0,0,6880,10700,0,6000,
0,0,0,0,0,0,6000,0,0,0,0,0,0,0,0,6310,0,6000,6000,0,0,0,0,0,
0,0,0,0,0,0,6000,0,0,0,0))

library(ggplot2)

# This worked great for a linear regression
ggplot(attenuation, aes(x = km, y = edna)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  xlab("Distance from Cage (km)") +
  ylab("eDNA concentration (gene sequence/Liter)")

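But a linear regression doesn't seem to be a good fit (R-squared = 0.09), so I want to try something else. I tried a few other regressions that didn't fit well either, so I'd like to try a nonlinear regression.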

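I've looked at this question on Stack Overflow and tried many different options, but nothing has worked. The option I've included below makes the most sense to me, but I'm wondering whether my formula is wrong, or whether the start list needs to be changed?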

For context, I'm trying to explore the relationship between distance along a river and concentration.

#This is not working for a nonlinear regression
ggplot(attenuation, aes(x = km, y = edna))+ 
geom_point() + 
stat_smooth(method = 'nls', formula = 'y~a*x^b', method.args=list (start = 
list(a = 1,b=1), se=FALSE))

When I run the nls code above, I get the following error from R: Computation failed in `stat_smooth()`: variable lengths differ (found for '(se)')

Answer 1

You have two problems. The first is a misplaced ")": se=FALSE is an argument to stat_smooth(), not to method.args=:

ggplot(attenuation, aes(x = km, y = edna))+ 
  geom_point() + 
  stat_smooth(method='nls', formula='y~a*x^b', method.args=list(start = 
     list(a=1, b=1)), se=FALSE)

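But this still doesn't work, because your model cannot possibly fit your data. Look at the equation. When x = 0, y will equal 0. For values of x greater than 0, y will increase unless b is negative, but then y is Inf at x = 0, so the algorithm cannot try negative values of b. Since the relationship is decreasing, you need to specify a function that is defined at x = 0, along with reasonable starting values. This one-parameter model fits your data better than the linear function (it could also be written as a*(x + 1)^-1, which is basically your function with 1 added to x so that it is defined at x = 0):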

ggplot(attenuation, aes(x = km, y = edna))+ 
   geom_point() + 
   stat_smooth(method = 'nls', formula = 'y~a/(x + 1)', 
      method.args=list(start=list(a=50000)), se=FALSE)

[Figure: One parameter]
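To see the problem with the original formula concretely, it can help to evaluate both forms at x = 0 in base R. This is only an illustrative check: the value of a is borrowed from the starting guess, and the variable names are made up for the example.

```r
# Evaluate the candidate formulas at x = 0 and at a small positive distance.
x <- c(0, 0.4)
a <- 50000  # illustrative value only (the starting guess used in the plot call)

original_pos <- a * x^1      # b > 0: exactly 0 at x = 0, then increasing
original_neg <- a * x^(-1)   # b < 0: Inf at x = 0
shifted      <- a / (x + 1)  # shifted form: finite at x = 0, where it equals a

print(original_pos)
print(original_neg)
print(shifted)
```

The Inf in the second result is what stops nls() from exploring negative exponents with the unshifted formula.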

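I chose 50000 by splitting the difference between 20,000 and 76,000. The final estimate is about 20,000. You can bend the curve more sharply by adding a second parameter, but you have so many 0 values that it may be too much, depending on what you are trying to communicate: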

ggplot(attenuation, aes(x = km, y = edna))+ 
   geom_point() + 
   stat_smooth(method='nls', formula='y~a*(1+x)^b', method.args=list(start = 
      list(a=50000, b=-1)), se=FALSE)
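If you want to see the fitted value for the one-parameter model rather than just the curve, one option is to call nls() directly, outside of ggplot. The sketch below rebuilds the question's data in a compact form (the rep()/seq() construction and the names fit_one and a_hat are my own, not from the question) and fits only the one-parameter model.

```r
# Data from the question, rebuilt compactly: each distance was sampled twice.
km <- rep(c(seq(0, 8, by = 0.4), 11.7, 13), each = 2)
edna <- c(76000, 20000, 0, 0, 6000, 0, 0, 6880, 10700, 0, 6000,
          0, 0, 0, 0, 0, 0, 6000, 0, 0, 0, 0, 0, 0, 0, 0, 6310, 0, 6000, 6000,
          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6000, 0, 0, 0, 0)
attenuation <- data.frame(km = km, edna = edna)

# Same formula and starting value as the one-parameter stat_smooth() call,
# written with the data frame's column names instead of x and y.
fit_one <- nls(edna ~ a / (km + 1), data = attenuation,
               start = list(a = 50000))

a_hat <- coef(fit_one)[["a"]]  # least-squares estimate of a
print(a_hat)
```

Calling summary(fit_one) on the same object also gives the standard error of the estimate, which is worth a look given how few non-zero points there are.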

Answer 2

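I agree with @dcarlson's answer. You have a very small data set here (11 non-zero data points in total, two of which carry most of the weight), so you probably shouldn't push too hard on drawing any conclusions. The first two points are definitely large, and after that there may be a gentle downward trend, but beyond that you can't say much.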

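If you want to do a power-law fit, you have to shift the zero-km data points away from the origin. I did it by adding 0.1 to the x values. This was an arbitrary choice on my part, and you should think carefully about how it affects your results (note that the results differ a lot depending on whether you add 0.1 or add 1, as @dcarlson did). I also had to supply more reasonable starting values, which I did by fitting a log-log linear regression (lm(log(edna) ~ log(km+0.1), data=attenuation)) and extracting the coefficients (approximately 4 and -1.5).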

ggplot(attenuation, aes(x = km, y = edna))+ 
  geom_point() + 
  stat_smooth(method = 'nls', formula = 'y~a*(x+0.1)^b',
              method.args=list (start = list(a = exp(4),b=-1.5)), se=FALSE)
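One practical detail about the log-log regression used for the starting values: log(edna) is -Inf wherever the concentration is 0, and lm() does not accept infinite responses, so the zeros need some handling. The answer does not say how this was done. The sketch below assumes log1p(edna), i.e. log(1 + edna), purely as one possible choice; the names loglog and start_vals are hypothetical, and the coefficients will depend on how the zeros are treated.

```r
# Data from the question, rebuilt compactly: each distance was sampled twice.
km <- rep(c(seq(0, 8, by = 0.4), 11.7, 13), each = 2)
edna <- c(76000, 20000, 0, 0, 6000, 0, 0, 6880, 10700, 0, 6000,
          0, 0, 0, 0, 0, 0, 6000, 0, 0, 0, 0, 0, 0, 0, 0, 6310, 0, 6000, 6000,
          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6000, 0, 0, 0, 0)
attenuation <- data.frame(km = km, edna = edna)

# ASSUMPTION: log1p() keeps the zero concentrations finite (log1p(0) is 0).
# Dropping the zeros with subset = edna > 0 would be another option.
loglog <- lm(log1p(edna) ~ log(km + 0.1), data = attenuation)

# Intercept approximates log(a); slope approximates b in a * (x + 0.1)^b.
start_vals <- coef(loglog)
print(start_vals)
```

The intercept is on the log scale, which is why the nls() call passes exp() of it as the starting value for a.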

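You can also do this more efficiently with a log-link Gaussian GLM, as follows (you still need to shift the x values away from zero). I've also added some code to disambiguate the overlapping points.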

ggplot(attenuation, aes(x = km, y = edna))+ 
   stat_sum() + 
   geom_smooth(method="glm", formula=y~log(x+0.1),
          method.args=list(family=gaussian(link="log"),
                           start=c(4,-1.5)))+
   scale_size(breaks=c(1,2),range=c(1,3))
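The GLM and the nls() power law describe the same mean curve, which is why start=c(4,-1.5) here plays the same role as start = list(a = exp(4), b = -1.5) in the nls() call. With a log link, the mean is exp(b0 + b1*log(x + 0.1)), which is algebraically exp(b0) * (x + 0.1)^b1. A small numerical check, using the starting values as example coefficients and a few arbitrary distances (the variable names are made up for the example):

```r
# Example coefficients: the starting values from the answer, not fitted values.
b0 <- 4
b1 <- -1.5
x <- c(0, 0.4, 2, 8)  # arbitrary distances in km

# Mean under the log-link GLM with linear predictor b0 + b1 * log(x + 0.1)
glm_mean <- exp(b0 + b1 * log(x + 0.1))

# Mean under the power-law form a * (x + 0.1)^b with a = exp(b0), b = b1
power_mean <- exp(b0) * (x + 0.1)^b1

print(all.equal(glm_mean, power_mean))
```

The two fits minimize the same sum of squares; the difference is in the fitting algorithm and in the parameterization (log(a) rather than a).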

Having trouble with nonlinear regression in ggplot
