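I created a dataset of 10,000 observations on 2 randomly generated variables. Now I want to split these 10,000 observations into 100 groups and generate variables for group_number and group_id.
What I have done so far: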
csize = 100 # number of clusters
n = 10000 # total number of observations
p = 2 # number of variables
# Generate an n x p matrix: 10,000 standard normal values for each of the p variables
set.seed(1)
mydata = matrix(rnorm(n*p, mean=0, sd = 1), n, p)
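Now I want to divide these observations into 100 clusters (100 observations each) and add two variables: cluster_name and group_id. The cluster_name variable should hold the labels cluster_1, ..., cluster_100, and within each cluster I want to generate a group_id for the observations.
Thanks in advance for your help.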
Answer 0 (score: 1)
This can be done in one step, like this:
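set.seed(1)
df <- data.frame(
  # label each block of 100 consecutive rows cluster_1, ..., cluster_100
  cluster_name = rep(paste0("cluster_", 1:100), each = 100),
  # numeric counterpart of cluster_name: the same value for all 100 rows of a cluster
  group_id = rep(1:100, each = 100),
  var1 = rnorm(10000),
  var2 = rnorm(10000),
  stringsAsFactors = FALSE
)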
Then we can look at the first and last rows of the data frame:
head(df)
# cluster_name group_id var1 var2
#1 cluster_1 1 -0.6264538 -0.8043316
#2 cluster_1 1 0.1836433 -1.0565257
#3 cluster_1 1 -0.8356286 -1.0353958
#4 cluster_1 1 1.5952808 -1.1855604
#5 cluster_1 1 0.3295078 -0.5004395
#6 cluster_1 1 -0.8204684 -0.5249887
tail(df)
# cluster_name group_id var1 var2
#9995 cluster_100 100 0.2096655 -0.1536432
#9996 cluster_100 100 0.9595076 1.5789764
#9997 cluster_100 100 0.4366036 -0.8131629
#9998 cluster_100 100 0.4993666 0.2795815
#9999 cluster_100 100 0.8939798 -1.2650635
#10000 cluster_100 100 0.2573871 0.5041590
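In the data frame above, group_id takes the same value for every row of a cluster. If the question instead means an ID that numbers the observations inside each cluster (1 to 100, restarting in every cluster), a minimal sketch could look like the following. It starts from the matrix mydata built in the question; the names df2 and per_cluster are introduced here only for illustration, and the within-cluster reading of group_id is an assumption about what was asked.

```r
csize <- 100    # number of clusters
n     <- 10000  # total number of observations
p     <- 2      # number of variables

set.seed(1)
mydata <- matrix(rnorm(n * p, mean = 0, sd = 1), n, p)

per_cluster <- n / csize  # observations per cluster (100 here)

df2 <- data.frame(
  # cluster_1 for rows 1-100, cluster_2 for rows 101-200, and so on
  cluster_name = rep(paste0("cluster_", seq_len(csize)), each = per_cluster),
  # assumed meaning: running ID 1..per_cluster that restarts in every cluster
  group_id = rep(seq_len(per_cluster), times = csize),
  var1 = mydata[, 1],
  var2 = mydata[, 2],
  stringsAsFactors = FALSE
)

# quick check: every cluster should contain per_cluster rows
table(df2$cluster_name)
```

The only difference from the answer's version is times = instead of each = in the rep() call for group_id: each repeats every value in place, while times repeats the whole sequence.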