Question

Here is my code:

library(ggplot2)

#data
sites <- 
  structure(list(site = c(928L, 928L, 928L, 928L, 928L, 928L, 928L,
                          928L, 928L, 928L, 928L, 928L, 928L, 928L,
                          928L, 928L, 928L, 928L, 928L, 928L, 928L,
                          928L, 928L, 928L, 928L, 928L), 
                 date = c(13493L, 13534L, 13566L, 13611L, 13723L,
                          13752L, 13804L, 13837L, 13927L, 14028L,
                          14082L, 14122L, 14150L, 14182L, 14199L,
                          16198L, 16279L, 16607L, 16945L, 17545L,
                          17650L, 17743L, 17868L, 17941L, 18017L, 18092L),
                 y = c(7L, 7L, 17L, 18L, 17L, 17L, 10L, 3L, 17L, 24L, 
                       11L, 5L, 5L, 3L, 5L, 14L, 2L, 9L, 9L, 4L, 7L,
                       6L, 1L, 0L, 5L, 0L)), 
            .Names = c("site", "date", "y"),
            class = "data.frame", row.names = c(NA, -26L))

#convert to date
x<-as.Date(sites$date, origin="1960-01-01") 

#plot smooth, line goes below zero!
qplot(data=sites, x, y, main="Site 349") 
(p <- qplot(data = sites, x, y, xlab = "", ylab = ""))
(p1 <- p + geom_smooth(method = "loess",span=0.5, size = 1.5))

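Parts of the LOESS line and its confidence band go below zero. I would like to restrict the plot to zero and positive values, because negative values make no sense here.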

How can I do this?

Answer 1

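I second Matt Parker's suggestion that you have to change the fitting procedure. One option that usually works for positive-only data is to fit on the log scale and then exponentiate to get results on the original scale. This guarantees that only positive values come out.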

Generate some random data that shows this problem:

 library(ggplot2)
 d <- data.frame(x=0:100)
 d$y <- exp(rnorm(nrow(d), mean=-d$x/40, sd=0.8))
 qplot(x,y,data=d) + stat_smooth()

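Now we can use ggplot's transformation features to log-transform the y values for fitting, but display the result on an exponential scale (which corresponds to the original values):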

library(scales)  # exp_trans(); the old "pow10" transformer is not available in current scales
qplot(x,y,data=d) + stat_smooth() + scale_y_log10() + coord_trans(y = exp_trans(10))

You can see a similar example on the coord_trans help page. If you don't like the y-axis, you can adjust the breaks and labels.
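To make the idea concrete outside ggplot, here is a minimal base-R sketch of what the log-scale approach amounts to: fit `loess()` to `log(y)` and exponentiate the predictions. It re-simulates data like the example above; the seed, the prediction grid and the names `fit_log` and `pred` are illustrative choices, not part of the original answer.

```r
# Minimal sketch: loess on the log scale, back-transformed with exp().
# Simulated data in the style of the example above (seed chosen arbitrarily).
set.seed(42)
d <- data.frame(x = 0:100)
d$y <- exp(rnorm(nrow(d), mean = -d$x / 40, sd = 0.8))

# Smooth log(y), which can take any real value, so the fit is unconstrained.
fit_log <- loess(log(y) ~ x, data = d)

# Predict on a grid inside the observed x range (loess returns NA outside it).
grid <- data.frame(x = seq(0, 100, by = 0.5))
pred <- exp(predict(fit_log, newdata = grid))

# exp() maps every finite prediction to a strictly positive value.
summary(pred)
```

Note that exponentiating a smooth of log(y) estimates something closer to a geometric mean (the median, if the errors are lognormal) than the arithmetic mean of y, so the curve will typically sit somewhat below a smooth of the raw values.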

Edit in response to the question update

ggplot2 has changed in a few ways since the question was first asked, and the original answer did not deal with zeros.

Option 1

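The main idea of the solution is the same: find a transformation that maps the range of possible values onto -Inf to Inf, do the loess smoothing there, and then back-transform the result. Without zeros, the log transformation would work well. With zeros included, I don't think a function with exactly the required property exists, but something that usually works is the log(1+x) transformation. That one is built in, but we also need its inverse, exp(x)-1.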

library(ggplot2)
library(scales)
# create the exp(x)-1 transformation, the inverse of log(1+x)
expm1_trans <- function() trans_new("expm1", "expm1", "log1p")

qplot(x, y, data=sites) + stat_smooth(method="loess") +
  scale_y_continuous(trans=log1p_trans()) +
  coord_trans(y=expm1_trans())

(Figure: loess fit on log(1+x)-transformed data)
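The same idea can be written out in base R on the question's data, which makes one caveat visible: loess can still predict slightly negative values on the log(1+y) scale, and `expm1()` maps those to values between -1 and 0. So the back-transformed curve is guaranteed to stay above -1, not above 0, which is why this transform is a practical compromise rather than a strict constraint. The sketch assumes the question's date origin; `fit_l1p`, `grid` and `pred` are illustrative names.

```r
# Base-R sketch of Option 1: loess on log1p(y), back-transformed with expm1().
# Data copied from the question (site 928).
sites <- data.frame(
  date = c(13493, 13534, 13566, 13611, 13723, 13752, 13804, 13837, 13927,
           14028, 14082, 14122, 14150, 14182, 14199, 16198, 16279, 16607,
           16945, 17545, 17650, 17743, 17868, 17941, 18017, 18092),
  y = c(7, 7, 17, 18, 17, 17, 10, 3, 17, 24, 11, 5, 5, 3, 5, 14, 2, 9, 9,
        4, 7, 6, 1, 0, 5, 0)
)

# Smooth on the log(1 + y) scale; span matches the question's plot.
fit_l1p <- loess(log1p(y) ~ date, data = sites, span = 0.5)

# Predict over the observed date range and undo the transform.
grid <- data.frame(date = seq(min(sites$date), max(sites$date), length.out = 200))
pred <- expm1(predict(fit_l1p, newdata = grid))

# expm1() of any finite value is > -1, so the curve cannot drop below -1;
# it only dips below 0 where the log1p-scale fit itself goes negative.
range(pred)

# Dates for plotting, using the question's origin.
grid$when <- as.Date(grid$date, origin = "1960-01-01")
```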

Option 2

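The second option extends a suggestion from the comments on Matt Parker's answer: use a regression method that accounts for the integer nature of the outcome. For counts, that means Poisson regression, allowing for overdispersion just in case (quasi-Poisson). You can't do loess this way, but you can do a spline fit, and you can use the degrees of freedom to control the smoothness.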

library(ggplot2)
library(splines)
# current ggplot2 passes the GLM family through method.args
qplot(x, y, data=sites) + stat_smooth(method="glm",
                                      method.args=list(family="quasipoisson"),
                                      formula = y ~ ns(x, 3))

(Figure: spline fit using overdispersed Poisson regression)
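In base R, the spline fit behind Option 2 is an ordinary `glm()` with a quasi-Poisson family and a natural-spline basis from `splines::ns()`. Because the log link is inverted with `exp()`, both the fitted curve and interval bounds computed on the link scale are strictly positive. The approximate 95% normal interval and the names `fit_qp`, `fitted_mean`, `lower` and `upper` are illustrative assumptions; this is not necessarily exactly what `stat_smooth()` computes internally.

```r
library(splines)  # ns(): natural cubic spline basis (ships with R)

# Question's data (site 928); date is days since 1960-01-01.
sites <- data.frame(
  date = c(13493, 13534, 13566, 13611, 13723, 13752, 13804, 13837, 13927,
           14028, 14082, 14122, 14150, 14182, 14199, 16198, 16279, 16607,
           16945, 17545, 17650, 17743, 17868, 17941, 18017, 18092),
  y = c(7, 7, 17, 18, 17, 17, 10, 3, 17, 24, 11, 5, 5, 3, 5, 14, 2, 9, 9,
        4, 7, 6, 1, 0, 5, 0)
)

# Overdispersed Poisson regression with 3 spline degrees of freedom.
fit_qp <- glm(y ~ ns(date, 3), family = quasipoisson, data = sites)

grid <- data.frame(date = seq(min(sites$date), max(sites$date), length.out = 200))

# Predict on the link (log) scale with standard errors, then back-transform.
lp <- predict(fit_qp, newdata = grid, type = "link", se.fit = TRUE)
z <- qnorm(0.975)  # approximate 95% normal interval
fitted_mean <- exp(lp$fit)
lower <- exp(lp$fit - z * lp$se.fit)
upper <- exp(lp$fit + z * lp$se.fit)
```

Raising or lowering the df in `ns()` makes the curve wigglier or smoother, which is the knob the answer refers to.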

The two options give very similar results, which is reassuring.

Answer 2

I can't test this without some example data, but

library(ggplot2)
qplot(data=sites, x, y, main="Site 349")  
(p <- qplot(data = sites, x, y, xlab = "", ylab = "")) 
(p1 <- p + geom_smooth(method = "loess",span=0.5, size = 1.5)) 
# opts() has been removed from ggplot2; ggtitle() sets the title instead
p1 + theme_bw() + ggtitle("Site 349") + ylim(0, foo)

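(where foo is a suitable upper limit for your plot) might do the trick. Unlike in base graphics, the xlim() and ylim() commands in ggplot actually restrict the data used to build the plot, not just the plotting window. That may also constrain geom_smooth() (though I'm not sure).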
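A sketch of the difference, assuming a recent ggplot2 (3.x) and the question's data: limits on the scale (which is what `ylim()` sets) censor out-of-range values to `NA`, including the smooth's ribbon bounds, while `coord_cartesian()` only zooms the view and leaves the fitted smooth untouched. `NA` as the upper scale limit tells ggplot2 to use the data range; the zoom range `c(0, 30)` is an arbitrary choice, and `layer_data()` is used only to inspect what each smooth layer will draw.

```r
library(ggplot2)

# Question's data (site 928), with dates converted as in the question.
sites <- data.frame(
  date = c(13493, 13534, 13566, 13611, 13723, 13752, 13804, 13837, 13927,
           14028, 14082, 14122, 14150, 14182, 14199, 16198, 16279, 16607,
           16945, 17545, 17650, 17743, 17868, 17941, 18017, 18092),
  y = c(7, 7, 17, 18, 17, 17, 10, 3, 17, 24, 11, 5, 5, 3, 5, 14, 2, 9, 9,
        4, 7, 6, 1, 0, 5, 0)
)
sites$when <- as.Date(sites$date, origin = "1960-01-01")

base <- ggplot(sites, aes(when, y)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x, span = 0.5)

# Scale limits: anything outside [0, max] becomes NA and is not drawn.
p_censor <- base + scale_y_continuous(limits = c(0, NA))

# Coordinate limits: the smooth is computed and kept as-is; only the view is clipped.
p_zoom <- base + coord_cartesian(ylim = c(0, 30))

smooth_censor <- layer_data(p_censor, 2)  # layer 2 = the smooth
smooth_zoom <- layer_data(p_zoom, 2)
```

So scale limits hide the negative part of the band rather than changing the fit, and `coord_cartesian()` changes neither; only changing the model or the transformation keeps the fit itself positive.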

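Edit: After reading a bit more, you might also want to consider switching the model that geom_smooth uses. Again, not being able to see your data is a problem. But if, for example, it is binary, you could add stat_smooth(method="glm", method.args=list(family="binomial")) to get a logit-smoothed line. See ?stat_smooth for more.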
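For the binary case mentioned in the edit, here is a base-R sketch of what a logistic `glm()` smooth produces, on simulated 0/1 data (the data, seed and the names `fit_logit` and `p_hat` are made up for illustration). Predictions on the response scale pass through the inverse logit, so they always lie strictly between 0 and 1.

```r
set.seed(1)
x <- seq(0, 10, length.out = 200)
# Simulated 0/1 outcome whose probability rises with x (illustrative only).
y <- rbinom(length(x), size = 1, prob = plogis(-3 + 0.6 * x))

fit_logit <- glm(y ~ x, family = binomial)

# type = "response" applies the inverse logit, giving fitted probabilities.
p_hat <- predict(fit_logit, type = "response")
range(p_hat)
```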

Restricting y to > 0 for a LOESS smooth in ggplot
