Question

I have the following data and I need to determine its distribution. Please help.


Answer 1

A neat approach is to use the fitdistrplus package, which provides tools for distribution fitting. Taking your data as an example:

library(fitdistrplus)
descdist(x, discrete = FALSE)

Now you can try fitting different distributions. For example:

normal_dist <- fitdist(x, "norm")

Then inspect the fit:

plot(normal_dist)
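If it helps, here is a rough sketch of how several candidates could be compared side by side rather than one at a time. The sample x below is simulated (the original data is not shown here), and the choice of normal, gamma and lognormal candidates is only an assumption for illustration. It uses fitdist and gofstat from fitdistrplus.

```r
library(fitdistrplus)

# Simulated stand-in for the asker's data (assumption: positive, continuous values).
set.seed(42)
x <- rgamma(200, shape = 2, rate = 0.5)

# Fit several candidate distributions by maximum likelihood.
fits <- list(
  norm  = fitdist(x, "norm"),
  gamma = fitdist(x, "gamma"),
  lnorm = fitdist(x, "lnorm")
)

# AIC of each fit; lower values indicate a better balance of fit and complexity.
aic_values <- sapply(fits, function(f) f$aic)
print(sort(aic_values))

# Goodness-of-fit statistics and criteria for all candidates in one table.
print(gofstat(fits, fitnames = names(fits)))
```

The AIC ranking and the goodness-of-fit table are a complement to the diagnostic plots, not a replacement for them.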

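As a general point, I would suggest having a look at the related discussion on Cross Validated, where this subject is discussed at length. You may also want to read the paper by Delignette-Muller and Dutang, "fitdistrplus: An R Package for Fitting Distributions", if you are interested in a more detailed explanation of how to use the Cullen and Frey graph.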

Answer 2

The first thing you can do is plot a histogram and overlay the density:

hist(x, freq = FALSE)
lines(density(x))

You will then see that the distribution is bimodal; it could be a mixture of two distributions, or some other distribution.
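To illustrate the mixture idea, here is a minimal sketch in base R that fits a two-component normal mixture with a hand-written EM loop. Everything in it is an assumption for illustration: the sample is simulated, fit_two_normals is a hypothetical helper, and normal components are just one possible choice. A dedicated mixture-modelling package would be more robust for real data.

```r
# Simulated bimodal sample (assumption: the real data is not shown).
set.seed(1)
x <- c(rnorm(150, mean = 0, sd = 1), rnorm(100, mean = 5, sd = 1))

# Minimal EM for a two-component normal mixture (hypothetical helper).
fit_two_normals <- function(x, iterations = 200) {
  # Crude starting values: split the sample at its median.
  lower <- x[x <= median(x)]
  upper <- x[x > median(x)]
  p <- 0.5
  mu <- c(mean(lower), mean(upper))
  sigma <- c(sd(lower), sd(upper))
  for (i in seq_len(iterations)) {
    # E-step: posterior probability that each point belongs to component 1.
    d1 <- p * dnorm(x, mu[1], sigma[1])
    d2 <- (1 - p) * dnorm(x, mu[2], sigma[2])
    w <- d1 / (d1 + d2)
    # M-step: weighted updates of the mixing proportion, means and sds.
    p <- mean(w)
    mu <- c(sum(w * x) / sum(w), sum((1 - w) * x) / sum(1 - w))
    sigma <- c(sqrt(sum(w * (x - mu[1])^2) / sum(w)),
               sqrt(sum((1 - w) * (x - mu[2])^2) / sum(1 - w)))
  }
  list(p = p, mu = mu, sigma = sigma)
}

fit <- fit_two_normals(x)
print(fit)

# Overlay the fitted mixture density on the histogram.
hist(x, freq = FALSE, breaks = 30)
curve(fit$p * dnorm(x, fit$mu[1], fit$sigma[1]) +
        (1 - fit$p) * dnorm(x, fit$mu[2], fit$sigma[2]),
      add = TRUE, lwd = 2)
```

If the overlaid curve follows both peaks of the histogram, a two-component mixture is at least a plausible description of the data.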

Once you have identified a candidate distribution, a 'qqplot' can help you compare the quantiles visually.
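A quick sketch of what that could look like in base R, using qqplot from the stats package. The data is simulated and the gamma candidate with shape 2 and rate 0.5 is assumed purely for illustration; in practice the parameters would come from a fit to your own data.

```r
# Simulated stand-in for the data (assumption: the original sample is not shown).
set.seed(123)
x <- rgamma(200, shape = 2, rate = 0.5)

# Theoretical quantiles of the candidate distribution at evenly spaced
# probability points. The parameters are assumed known here for illustration.
n <- length(x)
theoretical <- qgamma(ppoints(n), shape = 2, rate = 0.5)

# Points lying close to the y = x line suggest the candidate describes the data well.
qqplot(theoretical, x,
       xlab = "Theoretical gamma quantiles", ylab = "Sample quantiles")
abline(0, 1, col = "red")

# For a normal candidate, base R offers a shortcut.
qqnorm(x)
qqline(x)
```

Systematic curvature away from the reference line, especially in the tails, is a sign that the candidate does not match the data.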

How to identify the distribution of the given data using R

2 answers: