I am having a hard time getting a function to work across multiple years. The function looks like this:
minusgompakll1=function(s,ll,n,z)
{
# Computes minus log-likelihood function under the Gompertz-Makeham model.
# Historical data are defined by three equally long vectors "ll", "n" and "z"
# where "ll" defines the age of the individuals, "n" the number of them per
# age and "z" the number of deaths.
#
# Output is minus the log-likelihood function under "s=log(theta)"
# where "theta" is the vector of Gompertz-Makeham parameters.
t=exp(s)
p=exp(-t[1]-t[2]*exp(t[3]*ll))
-sum((n-z)*log(p)+z*log(1-p))
}
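The "s" parameter is constant, "ll" is a vector of ages from 18 to 100 (years), "n" is the number of individuals exposed to risk, and "z" is the number of deaths.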
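Written out, the function above computes the following. This is read directly off the code; the subscript $x$ for age and the name $p_x$ are notation introduced here, and interpreting $p_x$ as a one-year survival probability is an assumption based on the fact that it is paired with the $n_x - z_x$ survivors.

```latex
\theta = \exp(s), \qquad
p_x = \exp\!\left(-\theta_1 - \theta_2\, e^{\theta_3 x}\right), \qquad
-\ell(s) = -\sum_{x} \Big[ (n_x - z_x)\,\log p_x + z_x \,\log(1 - p_x) \Big]
```

The sum runs over all ages passed in "ll", so one call returns a single number for the whole age range.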
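Computing this function for a single year is no problem. However, my data set runs from 1940 to 1979, with an estimate for every age (18-100). So within each year, e.g. 1940, there are estimates for ages 18-100. After the estimate for age 100, the data set continues with 1941, starting again at age 18. A preview of "n":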
Year Age Female Male Total
18889 1940 18 22096.83 23138.00 45234.83
18890 1940 19 22031.17 23127.33 45158.50
18891 1940 20 21978.83 23120.00 45098.83
18892 1940 21 21967.50 23119.83 45087.33
18893 1940 22 21896.17 23058.83 44955.00
18894 1940 23 21876.17 23011.00 44887.17
18895 1940 24 21774.67 22933.33 44708.00
18896 1940 25 21676.00 22804.33 44480.33
18897 1940 26 21355.83 22550.33 43906.17
18898 1940 27 21346.50 22519.00 43865.50
18899 1940 28 21367.83 22481.50 43849.33
18900 1940 29 21368.33 22490.50 43858.83
18901 1940 30 21390.50 22435.67 43826.17
18902 1940 31 21335.00 22433.00 43768.00
18903 1940 32 21378.00 22476.00 43854.00
18904 1940 33 21401.00 22516.83 43917.83
.
.
23228 1979 28 29775.00 30259.33 60034.33
23229 1979 29 30518.33 31344.00 61862.33
.
.
23299 1979 99 NA NA NA
23300 1979 100 NA NA NA
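So it is a very large data set. The "z" data set has the same layout for males and females, only with the number of deaths. My task is to apply the function to all of these estimates and then, for example, plot the results. I tried a for loop like this: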
X = vector()
for(i in exposure_RISK$Year){
s=-c(8,9,2)
ll = c(18:100)[i]
n = exposure_RISK$Male[i]
z = deaths$Male[i]
X[i] = minusgompakll1(s,ll,n,z)
}
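This only gives me a result containing NAs for 1940-1979. I am fairly sure the problem is in "ll", which only goes up to 100 and then starts again at 18. Maybe there are several problems? Can anyone point out where the problem is, or how I can get the program to run?
Thanks for every answer!
Answer 0 (score: 0)
I think this could be the solution:
Setup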
set.seed(1)
# Make a sample data set with risks. For simplicity, we assume there are no NAs
risk <- expand.grid(Age=18:100,Year=1940:1979)
risk$Female <- runif(nrow(risk),min=20000,max=30000)
risk$Male <- runif(nrow(risk),min=20000,max=30000)
risk$Total <- with(risk,Female+Male)
# Make a sample data set with number of deaths. For simplicity, we assume there are no NAs
deaths <- expand.grid(Age=18:100,Year=1940:1979)
deaths$Female <- runif(nrow(deaths),min=1000,max=2000)
deaths$Male <- runif(nrow(deaths),min=1000,max=2000)
deaths$Total <- with(deaths,Female+Male)
# Merge risks and deaths
d <- merge(risk,deaths,by=c('Year','Age'),suffixes=c('.risk','.death'))
s <- -c(8,9,2)
Actual code
library(dplyr)
# Apply the function for the data set grouped by year
d %>%
group_by(Year) %>%
summarise(likelihood=minusgompakll1(s,Age,Male.risk,Male.death))
Output
# Year likelihood
# 1 1940 17165333
# 2 1941 18402227
# 3 1942 16660317
# 4 1943 18288280
# 5 1944 18004813
# 6 1945 16669699
# 7 1946 16531964
# 8 1947 18000839
# 9 1948 16945135
# ...
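For completeness, here is a minimal base R sketch of why the loop in the question returns NA and how the same per-year result can be obtained without dplyr. In the original loop, `i` takes the values of the `Year` column (1940, 1940, ...), so `c(18:100)[i]` indexes far beyond the 83 available ages. The toy data below are hypothetical and only mimic the layout from the question; the sketch also assumes that the exposure and deaths tables have their rows in the same order.

```r
# Same function as in the question.
minusgompakll1 <- function(s, ll, n, z) {
  t <- exp(s)
  p <- exp(-t[1] - t[2] * exp(t[3] * ll))
  -sum((n - z) * log(p) + z * log(1 - p))
}

# Why the original loop yields NA: a year value is used as an index.
i <- 1940
na_demo <- c(18:100)[i]   # NA, because the vector has only 83 elements

# Hypothetical toy data in the same long layout (Year, Age, Male).
set.seed(1)
exposure_RISK <- expand.grid(Age = 18:100, Year = 1940:1979)
exposure_RISK$Male <- runif(nrow(exposure_RISK), min = 20000, max = 30000)
deaths <- exposure_RISK[, c("Age", "Year")]
deaths$Male <- runif(nrow(deaths), min = 1000, max = 2000)

s <- -c(8, 9, 2)
years <- sort(unique(exposure_RISK$Year))

# One likelihood value per year: select all rows of that year at once
# and pass the whole age vector to the function.
X <- sapply(years, function(y) {
  rows <- exposure_RISK$Year == y
  minusgompakll1(s,
                 ll = exposure_RISK$Age[rows],
                 n  = exposure_RISK$Male[rows],
                 z  = deaths$Male[rows])
})
names(X) <- years
```

`X` can then be plotted with `plot(years, X, type = "l")`. The real data contain NA rows (for example ages 99 and 100 in 1979); those would have to be dropped before the call, otherwise the sum for that year is NA again.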