Question

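I am visiting a bird sanctuary that has many different species of birds. Some species are more numerous and others less so. I have returned to the sanctuary 9 times, and after each visit I calculate the total number of species observed so far. Unsurprisingly, the visits show diminishing returns: on every visit I mostly see the most abundant species again, which does not increase the count of observed species. What is the best function in R for predicting how many bird species I will have observed by my 20th visit?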

Here is the data.frame:

d <- structure(list(visit = 1:9, 
                    totalNumSpeciesObserved = c(200.903, 296.329, 370.018, 431.59, 485.14, 533.233, 576.595, 616.536, 654)), 
               class = "data.frame", row.names = c(NA, 9L))

I would like to see a model that fits the data well, behaves in a "log-like" way, and predicts diminishing returns.

Answer 1

Stack Overflow has some good guidance on how to ask a good question: https://stackoverflow.com/help/how-to-ask

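If I were modeling this, I would probably regress on the square root of the independent variable, based on the data. It is a bit odd to think of this as a function of the number of visits, though... perhaps evenly spaced time periods would make more sense.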

d <- structure(list(visit = 1:9, 
                    totalNumSpeciesObserved = c(200.903, 296.329, 370.018, 431.59, 485.14, 533.233, 576.595, 616.536, 654)), 
               class = "data.frame", row.names = c(NA, 9L))

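# Linear regression of the species count on sqrt(visit)
mod <- lm(totalNumSpeciesObserved ~ I(sqrt(visit)), data = d)
# Predict for visits 1 to 13, i.e. four visits beyond the observed data
new.df <- data.frame(visit = 1:13)
out <- predict(mod, newdata = new.df)
# Observed counts (filled points) with fitted and extrapolated values (open blue points)
plot(d, type = 'o', pch = 16, xlim = c(1, 13), ylim = c(200, 800), lwd = 2, cex = 2)
points(new.df$visit, out, type = 'o', pch = 21, col = "blue", cex = 2)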

The I() wrapper lets you transform the independent variable on the fly, so you can use sqrt() without saving a new variable.
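Since the question asks about the 20th visit, the same square-root model can be extrapolated that far and checked against a plain log(visit) fit. The following is only a sketch under assumptions: the names mod_sqrt, mod_log and pred20 are made up for illustration, the log model is an alternative added here for comparison, and sqrt(visit) is written without I() because the wrapper is optional for a plain function call in a formula.

```r
# Data from the question
d <- data.frame(
  visit = 1:9,
  totalNumSpeciesObserved = c(200.903, 296.329, 370.018, 431.59, 485.14,
                              533.233, 576.595, 616.536, 654)
)

# Two candidate curves with diminishing returns (names are illustrative)
mod_sqrt <- lm(totalNumSpeciesObserved ~ sqrt(visit), data = d)
mod_log  <- lm(totalNumSpeciesObserved ~ log(visit),  data = d)

# Residual standard error of each fit (smaller means a closer fit)
c(sqrt = sigma(mod_sqrt), log = sigma(mod_log))

# Extrapolate the square-root model to visit 20, with a 95% prediction interval
pred20 <- predict(mod_sqrt, newdata = data.frame(visit = 20),
                  interval = "prediction")
pred20
```

Going from 9 observed visits to visit 20 is a long extrapolation, and neither curve levels off at a finite number of species, so the result is probably best read as a rough guide rather than a firm forecast.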

Answer 2

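I am not sure whether this helps either, but you could build a simulator to test the asymptotic behavior. For example, you could build a population: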

population <- sample(size = 1e6, LETTERS[1:20], 
                     replace = TRUE, prob = 1/(2:21)^2)

This says there are 20 species in your population, each with a decreasing probability of occurring (expand this as you wish).
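To see what "decreasing probability" means in numbers, the weights passed to prob can be normalized by hand; sample() does this rescaling internally, since prob does not need to sum to one. A small sketch, where the names weights and probs are chosen only for illustration:

```r
# Unnormalized weights used in the sample() call above
weights <- 1 / (2:21)^2

# Rescale so they sum to 1, giving the expected share of each species
probs <- weights / sum(weights)
names(probs) <- LETTERS[1:20]

# Expected proportion of each species, most common (A) to rarest (T)
round(probs, 4)
```

The rarest species are far less common than the most common one, which is why they tend to show up only after several visits, if at all.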

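You can then simulate the visits and the details of each visit. For example, how large is the sample you see on a visit? Perhaps you only see 1% of the rainforest during a visit, and so on.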

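# Simulate repeated visits to the population.
# percent_obs is the fraction of the population seen per visit (0.001 = 0.1%).
# Requires the dplyr package for lag().
sim_visits <- function(visits, percent_obs, population){
  species_viewed <- vector()
  unique_views <- vector()

  for(i in 1:visits){
    # Individuals seen on this visit, drawn without replacement
    my_samp <- sample(x = population, size = round(percent_obs*length(population),0),
                      replace = FALSE)

    # Pool this visit's sightings with all earlier ones
    species_viewed <- c(species_viewed, my_samp)

    # Distinct species seen so far
    unique_views[i] <- length(unique(species_viewed))
  }

  # Species seen for the first time on each visit
  new_observed <- unique_views - dplyr::lag(unique_views, 1, 0)
  df <- data.frame(unique_views = unique_views, new_observed)

  # Running sum of unique_views (column name spelled as in the output below)
  df$cummulative <- cumsum(unique_views)

  df
}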

You can then draw from the simulation many times and look at the distribution of the values you get.

sim_visits(9, percent_obs = .001, population = population)
  unique_views new_observed cummulative
1           13           13          13
2           15            2          28
3           15            0          43
4           17            2          60
5           17            0          77
6           17            0          94
7           17            0         111
8           17            0         128
9           17            0         145
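Repeating the simulation many times, as suggested above, could look roughly like the following. This is a sketch under assumptions: species_after_visits is a hypothetical, dplyr-free helper that returns only the cumulative species counts, and the population is cut to 1e5 individuals and the number of runs set to 200 purely to keep it quick.

```r
set.seed(42)  # for reproducibility

# Smaller population than above, same decreasing species probabilities
population <- sample(LETTERS[1:20], size = 1e5, replace = TRUE,
                     prob = 1 / (2:21)^2)

# Hypothetical helper: distinct species seen after each of `visits` visits
species_after_visits <- function(visits, frac_obs, population) {
  n_obs <- round(frac_obs * length(population))
  seen <- character(0)
  counts <- integer(visits)
  for (i in seq_len(visits)) {
    # Add this visit's sightings to the set of species already seen
    seen <- union(seen, sample(population, size = n_obs, replace = FALSE))
    counts[i] <- length(seen)
  }
  counts
}

# 200 simulated series of 9 visits; one column per run, one row per visit
runs <- replicate(200, species_after_visits(9, 0.001, population))

# Distribution of the number of species seen after the 9th visit
final_counts <- runs[9, ]
summary(final_counts)

# Average accumulation curve across runs
rowMeans(runs)
```

Plotting rowMeans(runs) against the visit number gives a smoothed accumulation curve, which can then be compared with whichever model is fitted to the real counts.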

I am not sure whether this helps, but I find simulation a good way to conceptualize problems like this.

Best function for modeling diminishing returns