How to fit a power-law distribution in R from a histogram

Time: 2016-05-11 15:03:47

Tags: r package data-fitting power-law

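I have implemented a method for fitting data to a power-law distribution, as explained in the paper "Power-law distributions in empirical data" by Clauset et al.

My code below works well, using the bundled example data set "moby" as input. The input data must be a vector of numbers. All of this code is based on the poweRlaw package, which is extensively documented.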

library("poweRlaw")

##########################################################
# Data 
data("moby")
data <- moby

# Parameters
xmax <- 1000
simulations <- 100

##########################################################
# Setting data xmax
data_xmax = subset(data, data<xmax) 

# Fitting data to a power law
fit_object = displ$new(data_xmax)


# 1. Estimate parameters xmin and alpha
est = estimate_xmin(fit_object)
fit_object$setXmin(est)

xmin <- est$xmin
alpha <- est$pars

# Bootstrap procedure for estimating parameter uncertainty
bs = bootstrap(fit_object, no_of_sims = simulations, threads=6)
xmin_sd <- sd(bs$bootstraps[, 2])
alpha_sd <- sd(bs$bootstraps[, 3])


# 2. Goodness-of-fit: get the p-value. If the p-value is greater than 0.1, the power law is a plausible hypothesis
bs_p = bootstrap_p(fit_object, no_of_sims = simulations, threads = 6)
p_value <- bs_p$p


# 3. Compare the power law with alternative hypotheses via a likelihood-ratio test
# Not implemented

# 4. Print results
cat("xmin =", xmin, "+/-", xmin_sd, "\n", file="outfile.txt")
cat("alpha =", alpha, "+/-", alpha_sd, "\n", file="outfile.txt", append=TRUE)
cat("p_value =", p_value, "\n", file="outfile.txt", append=TRUE)
cat("p_value > 0.1 indicates that the power-law hypothesis is plausible", "\n", file="outfile.txt", append=TRUE)

# This last step just fits a linear regression to the log-binned histogram (not in the paper)
# 5. Histogram and linear correlation
nbreaks <- 40
breaks <- exp(seq(log(min(data_xmax)), log(max(data_xmax)), len=nbreaks))
hh <- hist(data_xmax, breaks, plot=FALSE)
density <- subset(hh$density, hh$density>0)
mids <- subset(hh$mids, hh$density>0)
# Regress log-density on log-value so that the slope estimates -alpha
lm.out = lm(log(density) ~ log(mids))
s <- summary(lm.out)
capture.output(s, file = "outfile.txt", append=TRUE)
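Step 3 is left unimplemented above. A minimal sketch of how it might look with poweRlaw's compare_distributions, assuming a discrete log-normal as the alternative hypothesis and the untruncated moby data (both choices are only illustrative and are not part of my code above):

```r
library("poweRlaw")
data("moby")

# Power-law fit, as in step 1
m_pl <- displ$new(moby)
m_pl$setXmin(estimate_xmin(m_pl))

# Alternative model: discrete log-normal (an assumption made for this sketch),
# fitted to the same tail, i.e. with the same xmin as the power law
m_ln <- dislnorm$new(moby)
m_ln$setXmin(m_pl$getXmin())
m_ln$setPars(estimate_pars(m_ln))

# Likelihood-ratio (Vuong) test between the two fitted models
comp <- compare_distributions(m_pl, m_ln)
comp$test_statistic  # the sign indicates which model is favoured
comp$p_two_sided     # a small value suggests the two fits really differ
```

Both objects need to share the same data and the same xmin, otherwise the two likelihoods are not comparable.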

So, as I said, this code works well when the input data is a list such as [1,1,1,2,2,1,1,2,1,1,5,4,1,6,8 ...].

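But in my case, the input data comes as two vectors taken from a histogram of the data. This histogram follows a power law, and I want to use the previous method to test that hypothesis. Sampling data from the histogram and then using the previous code is not an option, because my vectors are very large.

How can I fit a power-law distribution, as in the previous code, but with a histogram as input?

A sample of my input data, where the first column is the value and the second column is the frequency: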

0.000000000000031 38439456739397591040
0.000000000000062 14767296825218224128
0.000000000000093 6079734003177269248
0.000000000000124 2971231445610211328
0.000000000000155 1738434366193109248
0.000000000000186 1154218311877587456
0.000000000000217 818920970321569920
0.000000000000248 622764128957547776
0.000000000000279 480207160611660480
0.000000000000310 379614735061691200
0.000000000000341 310479601192840192
0.000000000000372 256106188215333536
0.000000000000403 221361351706885312
0.000000000000434 184811588626569664
0.000000000000465 158736845829413248
0.000000000000496 145812282341576768
0.000000000000527 126248167535799328
0.000000000000558 113871528185851168
0.000000000000589 98884191473023328
0.000000000000621 89182711149235824
0.000000000000652 79287845835605408
0.000000000000683 74130912773127008
0.000000000000714 70811137114156528
0.000000000000745 58595651922410816
0.000000000000776 55179183768518880
0.000000000000807 53470949691572912
0.000000000000838 49603249894714112
0.000000000000869 45542165108012368
0.000000000000900 42963698576773168
0.000000000000931 39708384581083680
0.000000000000962 38741459631868976
0.000000000000993 35002683161572140
0.000000000001024 35905146447505860
0.000000000001055 32005215819006568
0.000000000001086 30941598374870400
0.000000000001117 28717670991676592
0.000000000001148 25978050302234940
0.000000000001179 24978894521379752
0.000000000001210 24205354562007992
0.000000000001241 22690505474904960
0.000000000001272 21820273020611728
0.000000000001303 21691349694049768
0.000000000001334 19886423122182328
0.000000000001365 18049265718674400
0.000000000001396 17243494927662150
0.000000000001427 16953417442897740
0.000000000001458 17791419065550480
0.000000000001489 16179877483525980
0.000000000001520 15406337524154220
0.000000000001551 14955105881187360
0.000000000001582 15309645029232750
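One possible direction, shown only as a rough sketch: instead of expanding the histogram back into individual samples, the exponent could be estimated directly from the bins by maximising a binned (multinomial) likelihood. The code below uses base R only, not poweRlaw. The helper name fit_binned_powerlaw is hypothetical, and it assumes that the first column holds the centres of equal-width bins, that the second column is proportional to the count in each bin, and that xmin is fixed at the lowest bin edge rather than estimated.

```r
# Hypothetical helper: binned maximum-likelihood estimate of alpha for a
# continuous power law p(x) ~ x^(-alpha), truncated to the range of the bins.
#   edges  : bin boundaries, strictly increasing and > 0 (length k + 1)
#   counts : count, or a weight proportional to it, per bin (length k)
fit_binned_powerlaw <- function(edges, counts) {
  stopifnot(length(edges) == length(counts) + 1, all(edges > 0),
            all(diff(edges) > 0), all(counts >= 0))
  scaled <- edges / edges[1]          # rescale for numerical stability
  lo <- head(scaled, -1)
  hi <- tail(scaled, -1)
  neg_loglik <- function(alpha) {
    mass <- lo^(1 - alpha) - hi^(1 - alpha)  # unnormalised bin probabilities
    -sum(counts * log(mass / sum(mass)))
  }
  # The search interval for alpha is an assumption of this sketch
  optimize(neg_loglik, interval = c(1.01, 10))$minimum
}

# First ten rows of the sample above (values assumed to be bin centres)
values <- c(3.1e-14, 6.2e-14, 9.3e-14, 1.24e-13, 1.55e-13,
            1.86e-13, 2.17e-13, 2.48e-13, 2.79e-13, 3.10e-13)
freq <- c(38439456739397591040, 14767296825218224128, 6079734003177269248,
          2971231445610211328, 1738434366193109248, 1154218311877587456,
          818920970321569920, 622764128957547776, 480207160611660480,
          379614735061691200)

width <- 3.1e-14                                   # assumed constant bin width
edges <- c(values - width / 2, values[length(values)] + width / 2)
alpha_hat <- fit_binned_powerlaw(edges, freq / sum(freq))
alpha_hat
```

Rescaling the frequencies does not change the point estimate, but any uncertainty estimate or goodness-of-fit p-value would need the real counts per bin. Choosing xmin as in the Clauset et al. procedure would mean repeating the fit for each candidate lower edge and comparing the fits, which this sketch does not do.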

0 answers:

No answers yet