R: Fitting a chi-squared distribution with a large x range

Date: 2015-03-08 03:31:13

Tags: r distribution curve-fitting chi-squared

Fitting a chi-squared distribution is easy when the data cover a limited range:

library(MASS)
nnn <- 1000
set.seed(101)
chii <- rchisq(nnn,4, ncp = 0) ## Generate chi-squared samples with df = 4
chi_df <- fitdistr(chii,"chi-squared",start=list(df=3),method="BFGS") ## Fit df by maximum likelihood
chi_k <- chi_df$estimate[["df"]] ## Estimated degrees of freedom
chi_hist <- hist(chii,breaks=50,freq=FALSE) ## Plot the histogram
curve(dchisq(x,df=chi_k),add=TRUE,col="green",lwd=3) ## Overlay the fitted density

But suppose I have a dataset whose distribution is stretched out along the x axis, with values given by:
chii <- 5*rchisq(nnn,4, ncp = 0)

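For a real dataset, where this multiplier of 5 is unknown, how can I normalize the rchisq() output (or other, more complex data) so that fitdistr() gives an appropriate fit?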
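To see why the unknown factor matters, here is a quick check (a sketch, reusing the `nnn` and seed from the first snippet) that fits the plain chi-squared model directly to the scaled data. A chi-squared distribution with df k has mean k and variance 2k, so it cannot match data with mean 5k and variance 50k; the estimated df therefore lands well away from the true value of 4.

```r
library(MASS)

nnn <- 1000
set.seed(101)
chii <- 5 * rchisq(nnn, 4, ncp = 0)  # scaled data: true df = 4, multiplier = 5

# Naive fit: treat the scaled data as if it were an unscaled chi-squared sample
naive_fit <- suppressWarnings(
  fitdistr(chii, "chi-squared", start = list(df = 3), method = "BFGS")
)
naive_fit$estimate  # far from 4, because the scale factor is ignored
```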

Thanks in advance for your help!

1 Answer:

Answer 0 (score: 1)

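You have to loop over the candidate degrees of freedom to find the one that best fits your data. First, as you probably know, the mean of a chi-squared distribution equals its degrees of freedom. Let's use that: rescale the data so its mean equals each candidate df, then check whether the fitted df agrees with it.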
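The mean property can be pushed one step further. If the data are c * X with X ~ chi-squared(k), then E[cX] = c*k and Var(cX) = 2*c^2*k, so both unknowns follow from the sample mean and variance: k = 2*mean^2/var and c = var/(2*mean). A minimal method-of-moments sketch (the helper name `chisq_moments` is hypothetical, not part of MASS):

```r
# Method-of-moments estimates for data assumed to be c * chi-squared(k)
# Uses E[cX] = c * k and Var(cX) = 2 * c^2 * k
chisq_moments <- function(y) {
  m <- mean(y)
  v <- var(y)
  c(df = 2 * m^2 / v, multiplier = v / (2 * m))
}

set.seed(101)
y <- 5 * rchisq(1000, df = 4)
chisq_moments(y)  # should land near df = 4 and multiplier = 5
```

Moment estimates are quick but noisier than maximum likelihood, so they are best used as a sanity check or as starting values.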

In short, you step through the possible degrees of freedom and keep the one that best fits your rescaled data.
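An alternative that avoids the loop (a swapped-in technique, not the one this answer uses): a scaled chi-squared c * chi-squared(k) is exactly a gamma distribution with shape k/2 and scale 2c, so a single gamma fit recovers both parameters. The sketch below fits the data divided by its mean (the shape is unaffected by rescaling, and the optimizer stays well conditioned), then converts the rate back to the original scale; the variable names are illustrative.

```r
library(MASS)

set.seed(101)
chii <- 5 * rchisq(1000, 4, ncp = 0)  # pretend the multiplier 5 is unknown

m <- mean(chii)
gfit <- suppressWarnings(fitdistr(chii / m, "gamma"))  # MLE of shape and rate

k_hat <- 2 * gfit$estimate[["shape"]]       # shape = k / 2
# Rate on the original scale is rate / m, and scale 2c = 1 / rate, so c = m / (2 * rate)
c_hat <- m / (2 * gfit$estimate[["rate"]])
c(df = k_hat, multiplier = c_hat)
```

Note that this estimates df as a continuous value, whereas the loop only tests integers.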

library(MASS)
nnn <- 1000
set.seed(101)

mult <- round(runif(1,1,100)) # generate a random multiplier (unknown in practice)
chii <- mult*rchisq(nnn,4, ncp = 0) ## Generate a scaled chi-squared sample

max_df <- 100 # max degree of freedom to test (here from 1 to 100)
chi_df_disp <- rep(NA,max_df)

# loop across degree of freedom
for (i in 1:max_df) {
  chii_adjusted <- (chii/mean(chii))*i # Rescale the data so its mean equals the tested df
  chi_fit <- fitdistr(chii_adjusted,"chi-squared",start=list(df=i),method="BFGS") ## Fit
  chi_df_disp[i] <- chi_fit$estimate/i # Ratio of fitted df to tested df (1 = perfect agreement)
}

# Find the tested df whose ratio is closest to 1 (i.e. the best match between the fitted df and the tested df)
real_df <- which.min(abs(chi_df_disp-1))
print(real_df) # estimated degrees of freedom after rescaling

Now you can use the "real" degrees of freedom to rescale the data and plot the theoretical chi-squared density.

chii_adjusted <- (chii/mean(chii))*real_df
chi_hist <- hist(chii_adjusted,breaks=50,freq=FALSE) ## Plot the histogram
curve(dchisq(x,df=real_df),add=TRUE,col="green",lwd=3) ## Overlay the theoretical density
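Because the mean of c * chi-squared(k) is c*k, the fitted df also yields an estimate of the unknown multiplier, and the density of the scaled variable is dchisq(x / c, k) / c, so the curve can be drawn on the original x range too. To keep this sketch runnable on its own, the data and `real_df` are re-created with stand-in values (`mult = 37` and `real_df = 4` are assumptions); after running the loop above, reuse its `chii` and `real_df` instead.

```r
set.seed(101)
mult <- 37                       # stand-in for the unknown multiplier
chii <- mult * rchisq(1000, 4)   # scaled chi-squared data
real_df <- 4                     # stand-in for the value found by the loop

# mean(chii) is approximately mult * real_df, so the multiplier follows directly
mult_hat <- mean(chii) / real_df

# Density of mult_hat * X with X ~ chi-squared(real_df), drawn on the original scale
hist(chii, breaks = 50, freq = FALSE, main = "Original scale")
curve(dchisq(x / mult_hat, df = real_df) / mult_hat, add = TRUE, col = "blue", lwd = 3)
```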