It is easy to fit a chi-squared distribution when the data cover a limited range:
library(MASS)
nnn <- 1000
set.seed(101)
chii <- rchisq(nnn,4, ncp = 0) ## Generating a chi-sq distribution
chi_df <- fitdistr(chii,"chi-squared",start=list(df=3),method="BFGS") ## Fitting
chi_k <- chi_df[[1]][1] ## Degrees of freedom
chi_hist <- hist(chii,breaks=50,freq=FALSE) ## Plotting the histogram
curve(dchisq(x,df=chi_k),add=TRUE,col="green",lwd=3) ## Plotting the line
However, suppose I have a data set whose distribution is stretched along the X axis, with the new values given by:
chii <- 5*rchisq(nnn,4, ncp = 0)
For a real data set, where this multiplication factor 5 is not known, how can I normalize the rchisq()/complex data so that fitdistr() gives a good fit?
Thanks in advance for your help!
Answer 0 (score: 1)
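You will have to loop over candidate degrees of freedom to find the one that best fits your data. First, as you probably know, the mean of a chi-squared distribution equals its degrees of freedom. An unknown multiplier scales the mean by the same factor, so dividing the data by their sample mean and multiplying by a candidate degrees of freedom removes the multiplier. Let's use that to adjust the data and solve the problem.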
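To make the rescaling step concrete, here is a minimal sketch. It assumes the simulated data from the question (multiplier 5, 4 degrees of freedom); `candidate_df` is a hypothetical name for the value being tested, not something from the original code.

```r
set.seed(101)
nnn <- 1000
chii <- 5 * rchisq(nnn, df = 4)   # the multiplier 5 is treated as unknown below

candidate_df <- 4                 # hypothetical degrees of freedom being tested

# Divide by the sample mean and multiply by the candidate df,
# so the adjusted sample has mean candidate_df regardless of the multiplier
chii_adjusted <- chii / mean(chii) * candidate_df
adjusted_mean <- mean(chii_adjusted)

# A chi-squared variable has variance 2 * df, so this ratio is close to 1
# only when candidate_df matches the true degrees of freedom
var_ratio <- var(chii_adjusted) / (2 * candidate_df)

print(c(mean = adjusted_mean, var_ratio = var_ratio))
```

Because the sample is divided by its own mean, the unknown multiplier cancels. The variance ratio is roughly `candidate_df` divided by the true degrees of freedom, which is why only the right candidate makes the adjusted data look chi-squared.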
In short, you loop over the possible degrees of freedom and keep the one that best fits your adjusted data.
library(MASS)
nnn <- 1000
set.seed(101)
mult <- round(runif(1,1,100)) # generate a random multiplier
chii <- mult*rchisq(nnn,4, ncp = 0) ## Generating a scaled chi-sq sample
max_df <- 100 # max degree of freedom to test (here from 1 to 100)
chi_df_disp <- rep(NA,max_df)
# loop across degree of freedom
for (i in 1:max_df) {
chii_adjusted <- (chii/mean(chii))*i # Adjust the chi-sq distribution so that the mean matches the tested degree of freedom
chi_fit <- fitdistr(chii_adjusted,"chi-squared",start=list(df=i),method="BFGS") ## Fitting
chi_df_disp[i] <- chi_fit$estimate/i # This is going to give you the dispersion between the fitted df and the tested df
}
# Find the value with the smallest dispersion (i.e. the best match between the estimated df and the tested df)
real_df <- which.min(abs(chi_df_disp-1))
print(real_df) # print the real degree of freedom after correction
Now you can use the "real" degrees of freedom to adjust your chi-squared data and plot the theoretical distribution line.
chii_adjusted <- (chii/mean(chii))*real_df
chi_hist <- hist(chii_adjusted,breaks=50,freq=FALSE) ## Plotting the histogram
curve(dchisq(x,df=real_df),add=TRUE,col="green",lwd=3) ## Plotting the line
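As a possible cross-check on the loop, note that a chi-squared variable with k degrees of freedom multiplied by a constant c follows a gamma distribution with shape k/2 and rate 1/(2c). The sketch below uses that fact with fitdistr(); it is a different technique from the loop in this answer, and the names `true_mult`, `df_hat` and `mult_hat` are illustrative assumptions.

```r
library(MASS)

set.seed(101)
nnn <- 1000
true_mult <- 5                                # used only to simulate the data
chii <- true_mult * rchisq(nnn, df = 4)

# Fit a gamma distribution; fitdistr() chooses its own starting values here
gamma_fit <- fitdistr(chii, "gamma")

# Convert the gamma parameters back to the scaled chi-squared form
df_hat   <- unname(2 * gamma_fit$estimate["shape"])       # estimated degrees of freedom
mult_hat <- unname(1 / (2 * gamma_fit$estimate["rate"]))  # estimated multiplier

print(c(df = df_hat, multiplier = mult_hat))
```

The fitted curve can then be drawn directly over the unadjusted data, for example with hist(chii, breaks=50, freq=FALSE) followed by curve(dgamma(x, shape=df_hat/2, rate=1/(2*mult_hat)), add=TRUE). fitdistr() may emit warnings about NaNs while optimizing; those are usually harmless as long as the estimates look sensible.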