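I'm having some trouble fitting a LOESS regression to a dataset. I've been able to create the fit properly, but I can't get it to plot correctly.
I went through the data like this: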
animals.lo <- loess(X15p5 ~ Period, animals, weights = n.15p5)
animals.lo
summary(animals.lo)
plot(X15p5~ Period, animals)
lines(animals$X15p5, animals.lo, col="red")
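At this point I received the error
"Error in xy.coords(x, y) : 'x' and 'y' lengths differ"
I searched around and read that this problem might be caused by the points needing to be ordered, so I went on with: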
a <- order(animals$Period)
lines(animals$X15p5[a], animals.lo$Period[a], col="red", lwd=3)
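At this point there is no error, but the LOESS line still does not appear in the plot. The points are displayed correctly, but the line is not.
This is similar to the dataset I'm working with...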
structure(list(Site = c("Cat", "Dog", "Bear", "Chicken", "Cow",
"Bird", "Tiger", "Lion", "Leopard", "Wolf", "Puppy", "Kitten",
"Emu", "Ostrich", "Elephant", "Sheep", "Goat", "Fish", "Iguana",
"Monkey", "Gorilla", "Baboon", "Lemming", "Mouse", "Rat", "Hamster",
"Eagle", "Parrot", "Crow", "Dove", "Falcon", "Hawk", "Sparrow",
"Kite", "Chimpanzee", "Giraffe", "Bear", "Donkey", "Mule", "Horse",
"Zebra", "Ox", "Snake", "Cobra", "Iguana", "Lizard", "Fly", "Mosquito",
"Llama", "Butterfly", "Moth", "Worm", "Centipede", "Unicorn",
"Pegasus", "Griffin", "Ogre", "Monster", "Demon", "Witch", "Vampire",
"Mummy", "Ghoul", "Zombie"), Region = c(6L, 4L, 4L, 5L, 7L, 6L,
2L, 4L, 6L, 7L, 7L, 4L, 6L, 4L, 4L, 4L, 4L, 3L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 8L, 4L, 6L, 6L,
4L, 2L, 7L, 4L, 2L, 2L, 7L, 3L, 4L, 7L, 4L, 4L, 4L, 7L, 7L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 2L, 8L), Period = c(-2715, -3500,
-3500, -4933.333333, -2715, -2715, -2715, -3500, -2715, -4350,
-3500, -3500, -2950, -4350, -3650, -3500, -3500, -2715, -3650,
-4350, -3500, -3500, -3400, -4350, -3500, -3500, -4350, -3900,
-3808.333333, -4233.333333, -3500, -3900, -3958.333333, -3900,
-3500, -3500, -3500, -2715, -3650, -2715, -2715, -2715, -2715,
-3500, -2715, -2715, -3500, -4350, -3650, -3650, -4350, -5400,
-3500, -3958.333333, -3400, -3400, -4350, -3600, -4350, -3650,
-3500, -2715, -5400, -3500), Value = c(0.132625995, 0.163120567,
0.228840125, 0.154931973, 0.110047847, 0.054347826, 0.188679245,
0.245014245, 0.128378378, 0.021428571, 0.226277372, 0.176923077,
0.104938272, 0.17659805, 0.143798024, 0.086956522, 0.0625, 0.160714286,
0, 0.235588972, 0, 0, 0.208333333, 0.202247191, 0.364705882,
0.174757282, 0, 0.4, 0.1, 0.184027778, 0.232876712, 0.160493827,
0.74702381, 0.126984127, 0.080645161, 0.06557377, 0, 0.057692308,
0.285714286, 0.489361702, 0.108695652, 0.377777778, 0, 0.522727273,
0.024390244, 0.097560976, 0.275, 0, 0.0625, 0.255319149, 0.135135135,
0.216216216, 0.222222222, 0.296296296, 0.222222222, 0.146341463,
0.09375, 0.125, 0.041666667, 0.078947368, 0.2, 0.137931034, 0.571428571,
0.142857143), Sample_size = c(188.5, 105.75, 79.75, 70, 52.25,
46, 39.75, 39, 37, 35, 34.25, 32.5, 32.4, 30.76666667, 30.36666667,
28.75, 28, 28, 28, 26.6, 25, 25, 24, 22.25, 21.25, 20.6, 20,
20, 20, 19.2, 18.25, 18, 18, 16.8, 15.5, 15.25, 15, 13, 12.6,
11.75, 11.5, 11.25, 11, 11, 10.25, 10.25, 10, 10, 9.6, 9.4, 9.25,
9.25, 9, 9, 9, 8.2, 8, 8, 8, 7.6, 7.5, 7.25, 7, 7), Sample_sub = c(25,
17.25, 18.25, 10.8452381, 5.75, 2.5, 7.5, 9.555555556, 4.75,
0.75, 7.75, 5.75, 3.4, 5.433333333, 4.366666667, 2.5, 1.75, 4.5,
0, 6.266666667, 0, 0, 5, 4.5, 7.75, 3.6, 0, 8, 2, 3.533333333,
4.25, 2.888888889, 13.44642857, 2.133333333, 1.25, 1, 0, 0.75,
3.6, 5.75, 1.25, 4.25, 0, 5.75, 0.25, 1, 2.75, 0, 0.6, 2.4, 1.25,
2, 2, 2.666666667, 2, 1.2, 0.75, 1, 0.333333333, 0.6, 1.5, 1,
4, 1)), .Names = c("Site", "Region", "Period", "Value", "Sample_size",
"Sample_sub"), class = "data.frame", row.names = c(NA, -64L))
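I have been working on this for a while and have read as much as I could, but I haven't been able to make any progress. Any advice or guidance would be much appreciated.
Follow-up on adding confidence intervals
I have been trying to add confidence intervals by following another example on this site, "How to get the confidence intervals for LOWESS fit using R?".
The example given on that page is: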
plot(cars)
plx<-predict(loess(cars$dist ~ cars$speed), se=T)
lines(cars$speed,plx$fit)
lines(cars$speed,plx$fit - qt(0.975,plx$df)*plx$se, lty=2)
lines(cars$speed,plx$fit + qt(0.975,plx$df)*plx$se, lty=2)
I adapted it to this:
plot(X15p5 ~ Period, animals)
animals.lo2<-predict(loess(animals$X15p5 ~ animals$Period), se=T)
a <- order(animals$Period)
lines(animals$Period[a],animals.lo2$fit, col="red", lwd=3)
lines(animals$Period[a],animals.lo2$fit - qt(0.975,animals.lo2$df)*animals.lo2$se, lty=2)
lines(animals$Period[a],animals.lo2$fit + qt(0.975,animals.lo2$df)*animals.lo2$se, lty=2)
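While this does produce confidence intervals, the regression lines are all wrong. I'm not sure whether this is a problem with the predict function or something else. Thanks again!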
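Answer 0 (score: 3)
Correct code
"I searched around and read that this problem might be caused by the points needing to be ordered, so I went on with:"
No, no. The ordering issue has nothing to do with the error you saw. To get past the error, you need to replace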
lines(animals$X15p5, animals.lo, col="red")
with
lines(animals$Period, animals.lo$fitted, col="red")
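Here is why:
1. loess returns a list of objects, not a vector. See str(animals.lo) or names(animals.lo).
2. Why use animals$X15p5 as the x-axis? You fitted the model X15p5 ~ Period, so the x-axis should be Period.
About reordering
You do need to sort, because lines() joins points in the order in which they are given. Take this as an example: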
set.seed(0); x <- runif(100, 0, 10) ## x is not in order
y <- sqrt(x) ## curve y = sqrt(x), evaluated at the unsorted x
par(mfrow = c(1,2))
plot(x, y, type = "l") ## this is a mess!!
reorder <- order(x)
plot(x[reorder], y[reorder], type = "l") ## this is nice
Similarly, do:
a <- order(animals$Period)
lines(animals$Period[a], animals.lo$fitted[a], col="red", lwd=3)
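Both points can be checked on a small self-contained sketch. It uses simulated data rather than the animals data frame, and the names d, x, y, fit and o are made up for illustration: the loess object is a list whose fitted component is a plain numeric vector with one value per row, and lines() needs x and the fitted values reordered with the same index.

```r
## Simulated, deliberately unsorted data (hypothetical, not the animals data)
set.seed(42)
d <- data.frame(x = runif(80, 0, 10))
d$y <- sin(d$x) + rnorm(80, sd = 0.3)

fit <- loess(y ~ x, data = d)

is.list(fit)         ## a loess object is a list, so it cannot serve as y in lines()
names(fit)           ## components include "fitted", "residuals", "x", "y"
length(fit$fitted)   ## one fitted value per row of d

o <- order(d$x)      ## permutation that sorts x
plot(y ~ x, data = d)
lines(d$x[o], fit$fitted[o], col = "red", lwd = 3)
```

Dropping the [o] on both arguments gives the zig-zag mess from the sqrt(x) example; dropping it on only one of them pairs each x with the wrong fitted value.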
Follow-up on confidence intervals
Try this:
plot(X15p5 ~ Period, animals)
animals.lo <- loess(X15p5 ~ Period, animals)
pred <- predict(animals.lo, se = TRUE)
a <- order(animals$Period)
lines(animals$Period[a], pred$fit[a], col="red", lwd=3)
lines(animals$Period[a], pred$fit[a] - qt(0.975, pred$df)*pred$se.fit[a], lty=2) ## lower bound
lines(animals$Period[a], pred$fit[a] + qt(0.975, pred$df)*pred$se.fit[a], lty=2) ## upper bound
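You forgot to reorder: in your version Period is reordered but the fit is not. The fitted values and the standard errors must be reordered with the same index as the x values.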
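An alternative that sidesteps reordering altogether is to predict on a sorted grid of x values. This is not what the code above does; it is a sketch under the assumption that a smooth band over the observed range is what is wanted. The data frame d below is simulated stand-in data with tied, unsorted Period values, and grid, pred and crit are illustrative names.

```r
## Hypothetical stand-in data with tied, unsorted x values
set.seed(1)
d <- data.frame(Period = sample(seq(-5400, -2700, by = 150), 64, replace = TRUE))
d$Value <- 0.2 + 0.00003 * (d$Period + 4000) + rnorm(64, sd = 0.1)

fit <- loess(Value ~ Period, data = d)

## Predict on a sorted grid spanning the observed range
grid <- data.frame(Period = seq(min(d$Period), max(d$Period), length.out = 100))
pred <- predict(fit, newdata = grid, se = TRUE)
crit <- qt(0.975, pred$df)   ## t quantile for an approximate 95% band

plot(Value ~ Period, data = d)
lines(grid$Period, pred$fit, col = "red", lwd = 3)
lines(grid$Period, pred$fit - crit * pred$se.fit, lty = 2)
lines(grid$Period, pred$fit + crit * pred$se.fit, lty = 2)
```

Because the grid is sorted by construction, no order() call is needed, and tied x values in the data no longer produce repeated points along the line. This only works when the model was fitted with a data argument, so that predict() can find Period in newdata.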
Now, the dist ~ speed model for the cars data needs no reordering. This is because:
is.unsorted(cars$speed) ## FALSE
So the data are already sorted there.
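The same check shows when reordering becomes necessary. In this small sketch, reversed and o are illustrative names, and the reversed copy of cars stands in for any data set whose rows are not sorted by x.

```r
is.unsorted(cars$speed)          ## FALSE: speed is already non-decreasing

reversed <- cars[nrow(cars):1, ] ## same data, rows in reverse order
is.unsorted(reversed$speed)      ## TRUE: lines() would need reordered input

o <- order(reversed$speed)
is.unsorted(reversed$speed[o])   ## FALSE again after applying order()
```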
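Note that I have made two other changes to your code:
1. I separated the loess call from the predict call. Maybe you don't need to do this, but it is generally good practice to keep model fitting and model prediction separate, and to keep a copy of both objects.
2. I changed loess(animals$X15p5 ~ animals$Period) to loess(X15p5 ~ Period, animals). Using $ when specifying a model formula is a bad habit. I have another answer at https://stackoverflow.com/a/37307270/4891738 that illustrates the problem with this style; you can read the "Update" section there. I used glm as the example there, but things are the same for lm, glm and loess.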
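A minimal sketch of that problem, using lm on the built-in cars data because its behaviour here is easy to verify (bad, good and new are illustrative names): with $ in the formula, the variables are tied to the original data frame, so predict() cannot substitute the values supplied in newdata.

```r
## Same model, two ways of writing the formula
bad  <- lm(cars$dist ~ cars$speed)       ## variables hard-wired to `cars`
good <- lm(dist ~ speed, data = cars)    ## variables looked up in `data`

new <- data.frame(speed = c(10, 20))     ## hypothetical new speeds

p_good <- predict(good, newdata = new)   ## predictions for the 2 new speeds
## predict() on `bad` warns about a row mismatch and ignores `new`
p_bad <- suppressWarnings(predict(bad, newdata = new))

length(p_good)   ## 2
length(p_bad)    ## 50, one value per row of cars
```

The second call returns the fitted values for the original data rather than predictions at the new speeds, which is easy to miss if the warning is overlooked.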