I am new to R and have created a data frame in which I assigned each product a made-up probability:
set.seed(10)
data <- data.frame(orderId=sample(c(1:10000), 100000, replace=TRUE),
product=sample(c('P1','P2','P3','P4','P5','P6','P7','P8','P9','P10', 'P11','P12','P13','P14','P15',
'P16','P17','P18','P19','P20',
'z1','z2','z3','z4','z5','z6','z7','z8','z9','z10','z11','z12','z13','z14','z15',
'z16','z17','z18','z19','z20','z21','z22','z23','z24','z25','z26','z27','z28',
'z29','z30','z31','z32','z33','z34','z35','z36','z37','z38','z39','z40')
,100000, replace=TRUE,
prob=c(0.02, 0.03, 0.01, 0.015, 0.023, 0.027, 0.009, 0.013, 0.04, 0.006,
0.018, 0.013, 0.025, 0.011, 0.003, 0.007, 0.02, 0.014, 0.01, 0.03,
0.02, 0.03, 0.01, 0.015, 0.023, 0.027, 0.009, 0.013, 0.04, 0.006,
0.018, 0.013, 0.025, 0.011, 0.003, 0.007, 0.02, 0.014, 0.01, 0.03,
0.02, 0.03, 0.01, 0.015, 0.023, 0.027, 0.009, 0.013, 0.04, 0.006,
0.018, 0.013, 0.025, 0.011, 0.003, 0.007, 0.02, 0.014, 0.01, 0.03)))
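Is it possible to simulate the data so that some variables are correlated with one another (for example, P1, P4, P8, z1 and z3 are highly correlated)? I need this to run a factor analysis in R. Thanks.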
Answer 0 (score: 0)
How about this:
# Number of simulated observations
amountofsample <- 100
# Create a linearly increasing variable
a <- 101:(100 + amountofsample)
# Randomly sample a multiplication factor from a narrow range
# (the tighter the range, the closer the correlation will be to 1)
b <- sample(seq(1, 1.1, 0.01), amountofsample, replace = TRUE)
# Multiply the original values by the random factors
f <- a * b
# Create a data.frame with both simulated columns
a <- data.frame(a, f)
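
The snippet above only produces one correlated pair. For the factor analysis mentioned in the question, one possible approach is to let the chosen products share a latent factor. The following is a minimal sketch under stated assumptions, not a definitive recipe: the names `n_orders`, `linked`, `loading` and `scores`, the loading of 0.8, and the use of continuous scores per order (rather than the order/product rows from the question) are all illustrative choices that do not come from the original post. It uses only base R (`rnorm`, `outer`, `sweep`, `cor`, `factanal`).

```r
set.seed(10)

n_orders <- 5000                                   # assumed number of orders
products <- c(paste0("P", 1:20), paste0("z", 1:40))
linked   <- c("P1", "P4", "P8", "z1", "z3")        # products meant to be correlated

# One latent score per order; only the linked products load on it.
latent  <- rnorm(n_orders)
loading <- setNames(rep(0, length(products)), products)
loading[linked] <- 0.8                             # assumed loading strength

# Score = shared part + independent noise, scaled so each column has unit variance.
noise  <- matrix(rnorm(n_orders * length(products)), nrow = n_orders)
scores <- outer(latent, loading) + sweep(noise, 2, sqrt(1 - loading^2), "*")
colnames(scores) <- products

# Pairwise correlations among the linked products (theoretical value 0.8^2 = 0.64).
round(cor(scores[, linked]), 2)

# One-factor analysis on the linked products plus a few unrelated ones.
others <- c("P2", "P3", "z2", "z4", "z5")
fa <- factanal(scores[, c(linked, others)], factors = 1)
print(fa$loadings)
```

If binary "bought / not bought" indicators are needed, the columns of `scores` could be thresholded (for example `scores > 1`), although the correlations between the resulting indicators will be weaker than those between the continuous scores.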