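I am trying to create a data frame (df1) with n columns (3 in this example). Column 1 should be a random column from the data frame df0. Column 2 should be the average of that same random column plus four other random columns from df0. Column 3 should be the average of those first five plus another five random columns.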
Answer 0 (score: 1)
I'll try to answer your questions one at a time. Let's start with the first one.
total <- 15      # Total number of columns in df0
n_sample <- 10   # Number of columns extracted from df0 (renamed so it does not mask sample())
values <- 4      # Number of rows
df0 <- data.frame(matrix(data = rexp(values * total, rate = total), nrow = values, ncol = total))
# First, select 10 random columns from df0
df1 <- df0[, sample(ncol(df0), n_sample)]
# Create an empty data frame with 3 columns (X1, X2, X3)
df2 <- data.frame(matrix(NA_real_, nrow = values, ncol = 3))
# Assign the first column of df1 to the first output column
df2$X1 <- df1[, 1]
# The second column is the row-wise average of the first five randomly selected columns
df2$X2 <- rowMeans(df1[, 1:5])
# The third column is the row-wise average of all ten selected columns
df2$X3 <- rowMeans(df1[, 1:10])
> df2
# X1 X2 X3
#1 0.18816542 0.12617238 0.08728368
#2 0.09855574 0.07592763 0.06069351
#3 0.12022571 0.06045562 0.07964574
#4 0.00260806 0.06172300 0.06225859
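Since the question asks about n columns in general, here is a rough sketch of how the same idea could be wrapped in a function. The helper name `cumulative_means` and its `sizes` argument are my own invention for illustration, not something from the original code; `sizes` says how many of the shuffled columns feed each output column (1, 5 and 10 in this example). The seed is only there to make the run repeatable.

```r
# Hypothetical helper: each output column k is the row-wise mean of the
# first sizes[k] columns drawn at random (without replacement) from df.
cumulative_means <- function(df, sizes) {
  stopifnot(max(sizes) <= ncol(df))
  # Draw as many random columns as the largest group needs
  picked <- df[, sample(ncol(df), max(sizes)), drop = FALSE]
  # drop = FALSE keeps a data frame even when only one column is selected
  out <- lapply(sizes, function(k) rowMeans(picked[, seq_len(k), drop = FALSE]))
  names(out) <- paste0("X", seq_along(sizes))
  as.data.frame(out)
}

set.seed(1)  # assumption: fixed seed, only for reproducibility
df0 <- data.frame(matrix(rexp(4 * 15, rate = 15), nrow = 4, ncol = 15))
res <- cumulative_means(df0, sizes = c(1, 5, 10))
```

With `sizes = c(1, 5, 10)` this should give the same three columns as the step-by-step version, and a longer `sizes` vector gives more output columns as long as df0 has enough columns to draw from.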
To drop all the unwanted columns, I personally use something like the following, though I'm sure there are other ways to do it.
# For example, if you only want to keep columns X3 and X5:
col_list <- c("X3", "X5")
dfm <- df0[, col_list]
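As for the "other ways", a few base R alternatives are sketched below. The six-column df0 here is a made-up toy example, not the data from the question, and the column names X1..X6 are simply what `data.frame(matrix(...))` produces by default.

```r
# Toy data frame with columns X1..X6 (assumption: illustration only)
df0 <- data.frame(matrix(seq_len(4 * 6), nrow = 4, ncol = 6))

# Keep columns by name; drop = FALSE preserves the data frame
# even if only a single column is selected
keep <- c("X3", "X5")
kept <- df0[, keep, drop = FALSE]

# Drop columns by name instead of listing the ones to keep
dropped <- df0[, setdiff(names(df0), c("X1", "X2")), drop = FALSE]

# Remove a single column in place
df0$X6 <- NULL
```

Which one reads best probably depends on whether the list of columns to keep or the list to drop is shorter.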