Question

I'm fairly new to stats and R, and I'd like to find the R equivalent of SAS's proc univariate with a lognormal fit. The SAS code looks like this:

proc univariate data=dat;
  histogram kilo / lognormal(theta=est zeta=est sigma=est noprint)
                   midpoints=1 to 55477 by 20
                   outhistogram=this;
run;

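Here the input dataset is dat, and the variable the distribution is fitted to is kilo. The value 55477 is the maximum of the kilo variable.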

Setting the theta, zeta and sigma options to est requests maximum likelihood estimates of those parameters.

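After running the code I get a table with one row per midpoint from 1 to 55477 in steps of 20 (2774 records) and the following columns (column descriptions taken from the SAS website):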

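EXPPCT - estimated percent of the population in the histogram interval, determined from the optional fitted distribution (here, lognormal)

OBSPCT - percent of variable values in the histogram interval

VAR - variable name (here, kilo)

MIDPT - midpoint of the histogram interval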

I use the EXPPCT and MIDPT values for further analysis.

Answer 1

You could try something like this.

## Sample data
set.seed(0)
dat <- rlnorm(1000, 7)

## MLE estimates
library(fitdistrplus)
pars <- coef(fitdist(dat, "lnorm"))

## table variables
breaks <- seq(1, max(dat)+100, 100)                  # histogram breaks
mids <- diff(breaks)/2 + head(breaks, -1)            # midpoints
probs <- diff(plnorm(breaks, pars[[1]], pars[[2]]))  # expected probs for each bin
obs <- table(cut(dat, breaks)) / length(dat)         # observed 

res <- data.frame(MIDPT=mids,
                  OBSPCT=as.numeric(obs)*100,
                  EXPPCT=probs*100,
                  INTERVAL=names(obs))
head(res)
#   MIDPT OBSPCT    EXPPCT  INTERVAL
# 1    51    0.5 0.8775098   (1,101]
# 2   151    3.5 3.7212573 (101,201]
# 3   251    5.9 5.4240329 (201,301]
# 4   351    6.4 6.0203732 (301,401]
# 5   451    6.8 6.0371393 (401,501]
# 6   551    5.5 5.7785383 (501,601]

## Plot
hist(dat, breaks=breaks, freq=FALSE, col="steelblue")
ps <- seq(1, max(dat)+100, length.out=1000)
lines(ps, dlnorm(ps, pars[[1]], pars[[2]]), col="salmon", lwd=3)
legend("topright", "Expected", col="salmon", lwd=3)
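
Note that the SAS call specifies midpoints, whereas the code above starts from break points, so MIDPT comes out as 51, 151, ... rather than 1, 101, .... If you want bins centred on a given sequence of midpoints, a minimal sketch along the following lines may help. It assumes the bin edges sit half a bin width either side of each midpoint, that the threshold theta is 0 (the two-parameter lognormal, for which the MLE has a closed form, so no extra package is needed), and that a width of 100 suits the sample data. The names width and res2 are just illustrative, and I have not checked how SAS treats values falling exactly on a bin edge.

```r
## Sketch: bins centred on SAS-style midpoints (assumes threshold theta = 0)
set.seed(0)
dat <- rlnorm(1000, 7)

## Closed-form MLE for the two-parameter lognormal
meanlog <- mean(log(dat))
sdlog   <- sqrt(mean((log(dat) - meanlog)^2))

## Midpoints in the spirit of "midpoints=1 to <max> by <width>"; width is illustrative
width  <- 100
mids   <- seq(1, max(dat) + width, by = width)
## Bin edges half a width either side of each midpoint
breaks <- c(mids - width / 2, mids[length(mids)] + width / 2)

## Expected and observed percent per bin
exppct <- diff(plnorm(breaks, meanlog, sdlog)) * 100
obspct <- as.numeric(table(cut(dat, breaks))) / length(dat) * 100

res2 <- data.frame(MIDPT = mids, OBSPCT = obspct, EXPPCT = exppct)
head(res2)
```

For your data you would replace the simulated dat with the kilo column and set width to 20, so that mids runs 1, 21, 41, ... as in the SAS output.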

SAS univariate in R
