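Hi, I'm new to R and have a more general question: how do I simulate or create an example dataset that is suitable for posting here and is also reproducible? For instance, I would like to create a numerical example that properly abstracts my real dataset. One requirement is that there should be some correlation between my dependent variable and my independent variables.
For example, how can I introduce some correlation between my count and in.var1 and in.var2?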
set.seed(1122)
count<-rpois(1000,30)
in.var1<- rnorm(1000, mean = 25, sd = 3)
in.var1<- rnorm(1000, mean = 12, sd = 2)
data<-cbind(count,in.var1,in.var2)
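Answer 0 (score: 3)
You can introduce dependence by adding some portion of the "information" in the two variables into the construction of the count variable: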
set.seed(1222)
in.var1<- rnorm(1000, mean = 25, sd = 3)
#Corrected spelling of in.var2
in.var2<- rnorm(1000, mean = 12, sd = 2)
count<-rpois(1000,30) + 0.15*in.var1 + 0.3*in.var2
# Avoid using `data` as an object name
dat<-data.frame(count,in.var1,in.var2)
> spearman(count, in.var1)
rho
0.06859676
> spearman(count, in.var2)
rho
0.1276568
> spearman(in.var1, in.var2)
rho
-0.02175273
> summary( glm(count ~ in.var1 + in.var2, data=dat) )
Call:
glm(formula = count ~ in.var1 + in.var2, data = dat)
Deviance Residuals:
     Min        1Q    Median        3Q       Max
-16.6816   -3.6910   -0.4238    3.4435   15.5326
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 29.05034    1.74084  16.688  < 2e-16 ***
in.var1      0.14701    0.05613   2.619  0.00895 **
in.var2      0.35512    0.08228   4.316 1.74e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
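The `spearman()` call above is not part of base R; it presumably comes from an add-on package (an assumption, since the answer does not say which one). As a minimal sketch, the same check can be done with `cor()` from the stats package, which returns all pairwise Spearman correlations at once. The variable `n` and the object `rho` are names introduced here for illustration only.

```r
# Same simulation as in the answer above, checked with base R only.
set.seed(1222)
n <- 1000
in.var1 <- rnorm(n, mean = 25, sd = 3)
in.var2 <- rnorm(n, mean = 12, sd = 2)

# Adding non-integer terms means `count` is no longer integer-valued.
count <- rpois(n, 30) + 0.15 * in.var1 + 0.3 * in.var2
dat <- data.frame(count, in.var1, in.var2)

# Pairwise Spearman rank correlations as a 3 x 3 matrix
rho <- cor(dat, method = "spearman")
print(round(rho, 3))
```

Increasing the multipliers (0.15 and 0.3) relative to the Poisson noise should strengthen the correlation between count and each predictor, while the two predictors stay roughly uncorrelated because they are drawn independently.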
Answer 1 (score: 1)
If you want count to be a function of in.var1 and in.var2, try this. Note that count is already the name of a function, so I changed it to Count:
set.seed(1122)
in.var1<- rnorm(1000, mean = 4, sd = 3)
in.var2<- rnorm(1000, mean = 6, sd = 2)
Count<-rpois(1000, exp(3+ 0.5*in.var1 - 0.25*in.var2))
Data<-data.frame(Count=Count, Var1=in.var1, Var2=in.var2)
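You now have Poisson counts that depend on in.var1 and in.var2. A Poisson regression will recover an intercept of about 3, a coefficient of about 0.5 for Var1, and a coefficient of about -0.25 for Var2: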
summary(glm(Count~Var1+Var2,data=Data, family=poisson))
Call:
glm(formula = Count ~ Var1 + Var2, family = poisson, data = Data)
Deviance Residuals:
     Min        1Q    Median        3Q       Max
-2.84702  -0.76292  -0.04463   0.67525   2.79537
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  3.001390   0.011782   254.7   <2e-16 ***
Var1         0.499789   0.001004   498.0   <2e-16 ***
Var2        -0.250949   0.001443  -173.9   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 308190.7 on 999 degrees of freedom
Residual deviance: 1063.3 on 997 degrees of freedom
AIC: 6319.2
Number of Fisher Scoring iterations: 4
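To make the "recovers the true coefficients" claim easier to check at a glance, here is a minimal sketch that puts the simulated-from values next to the fitted estimates and their Wald confidence intervals. The names `true_beta`, `fit`, and `comparison` are introduced here for illustration and are not part of the answer above.

```r
set.seed(1122)
n <- 1000
in.var1 <- rnorm(n, mean = 4, sd = 3)
in.var2 <- rnorm(n, mean = 6, sd = 2)

# Coefficients on the log scale that the counts are simulated from
true_beta <- c("(Intercept)" = 3, Var1 = 0.5, Var2 = -0.25)
lambda <- exp(true_beta[["(Intercept)"]] +
              true_beta[["Var1"]] * in.var1 +
              true_beta[["Var2"]] * in.var2)
Count <- rpois(n, lambda)
Data <- data.frame(Count = Count, Var1 = in.var1, Var2 = in.var2)

fit <- glm(Count ~ Var1 + Var2, data = Data, family = poisson)

# True values, estimates, and 95% Wald intervals side by side
comparison <- cbind(true = true_beta,
                    estimate = coef(fit),
                    confint.default(fit))
print(round(comparison, 4))
```

Because the data are generated from exactly the model being fitted, the estimates should land close to the true values; with real data the agreement would depend on how well the Poisson log-linear model fits.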
Answer 2 (score: 0)
As I understand it, you want to add some pattern to your data.
# Basic info taken from Data Science Exploratory Analysis Course
# http://datasciencespecialization.github.io/courses/04_ExploratoryAnalysis/
set.seed(1122)
rowNumber = 1000
count<-rpois(rowNumber,30)
in.var1<- rnorm(rowNumber, mean = 25, sd = 3)
in.var2<- rnorm(rowNumber, mean = 12, sd = 2)
data<-cbind(count,in.var1,in.var2)
dataNew <- data
for (i in 1:rowNumber) {
# flip a coin
coinFlip <- rbinom(1, size = 1, prob = 0.5)
# if coin is heads add a common pattern to that row
if (coinFlip) {
dataNew[i,"count"] <- 2 * data[i,"in.var1"] + 10* data[i,"in.var2"]
}
}
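Basically, I add the pattern count = 2 * in.var1 + 10 * in.var2 to some randomly chosen rows, selected here by the coinFlip variable. Of course, you should vectorize this if you have more rows.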
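A vectorized version of the loop above could look like the following sketch. It draws all coin flips in one `rbinom()` call and uses logical indexing instead of a `for` loop. The names `heads`, `dat`, and `datNew` are introduced here for illustration, and a data frame is used instead of `cbind()` purely for convenience. The rows selected are not guaranteed to match those of the loop version exactly.

```r
set.seed(1122)
rowNumber <- 1000
count <- rpois(rowNumber, 30)
in.var1 <- rnorm(rowNumber, mean = 25, sd = 3)
in.var2 <- rnorm(rowNumber, mean = 12, sd = 2)
dat <- data.frame(count, in.var1, in.var2)

# One coin flip per row, drawn all at once (TRUE = heads)
heads <- rbinom(rowNumber, size = 1, prob = 0.5) == 1

# Overwrite count with the common pattern only in the "heads" rows
datNew <- dat
datNew$count[heads] <- 2 * dat$in.var1[heads] + 10 * dat$in.var2[heads]

# Roughly half of the rows should now follow the pattern
print(table(heads))
```

The untouched rows keep their original Poisson counts, so the resulting dataset is a mixture of patterned and unpatterned rows.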