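I have a species column that holds 10 names. I need to randomly distribute the species across four columns so that each column receives a fixed proportion of them.
Say the first column gets 20%, the second 30%, the third 40%, and the last 10%. The four columns represent four different environments, namely:
Restricted, Tidalflat, Beach, Estuary
So the size of each column is fixed in advance, but which species end up in it is random.
My input data looks like this: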
species <- c('Natica','Tellina','Mactra','Natica','Arca','Arca','Tellina',
'Nassarius','Cardium','Cardium')
The result should look like this:
Answer 0 (score: 3)
Some simple setup:
species <- c('Natica','Tellina','Mactra','Natica','Arca','Arca','Tellina',
'Nassarius','Cardium','Cardium')
rspecies <- sample(species)
envirs <- c('Restricted', 'Tidalflat', 'Beach', 'Estuary')
probs <- c(.2, .3, .4, .1)
nrs <- round(length(species) * probs)
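Now, a data.frame with a separate column per environment is not a good way to represent this, because the data is not rectangular: the columns do not all hold the same number of observations.
You can present the data in long format instead: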
df <- data.frame(species = rspecies, envir = rep(envirs, nrs), stringsAsFactors = FALSE)
     species      envir
1    Tellina Restricted
2     Natica Restricted
3       Arca  Tidalflat
4     Mactra  Tidalflat
5    Tellina  Tidalflat
6       Arca      Beach
7  Nassarius      Beach
8    Cardium      Beach
9    Cardium      Beach
10    Natica    Estuary
Or as a list:
split(rspecies, df$envir)
$Beach
[1] "Mactra" "Natica" "Arca"   "Arca"

$Estuary
[1] "Tellina"

$Restricted
[1] "Nassarius" "Cardium"

$Tidalflat
[1] "Cardium" "Natica"  "Tellina"
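Because sample() shuffles differently on every call, two runs of the code above will not produce the same table. The sketch below is one way to make a run repeatable and to check that the group sizes match the target proportions. The seed value is arbitrary, and turning envir into a factor is my own addition so that the environments keep their original order.

```r
species <- c('Natica','Tellina','Mactra','Natica','Arca','Arca','Tellina',
             'Nassarius','Cardium','Cardium')
envirs <- c('Restricted', 'Tidalflat', 'Beach', 'Estuary')
probs <- c(.2, .3, .4, .1)

set.seed(42)  # arbitrary seed, used only to make the shuffle repeatable

nrs <- round(length(species) * probs)
# rep(envirs, nrs) only lines up with the species if the sizes add up
stopifnot(sum(nrs) == length(species))

df <- data.frame(species = sample(species),
                 envir = factor(rep(envirs, nrs), levels = envirs),
                 stringsAsFactors = FALSE)

# Observed share of species per environment, in the order given by envirs
shares <- prop.table(table(df$envir))
print(shares)
```

The stopifnot() call is a guard: round() is not guaranteed to return sizes that sum to the number of species for every input length.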
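One way to cope with a number of species that does not split cleanly is to make the assignment probabilistic: draw an environment for each species using the given probabilities. The larger the real data set, the closer the resulting proportions will be to the targets.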
species2 <- c('Natica','Tellina','Mactra','Natica','Arca','Arca','Tellina',
'Nassarius','Cardium','Cardium', 'Cardium')
length(species2)
[1] 11
grps <- sample(envirs, size = length(species2), prob = probs, replace = TRUE)
df2 <- data.frame(species = species2, envir = grps, stringsAsFactors = FALSE)
df2 <- df2[order(df2$envir), ]
     species      envir
5       Arca      Beach
10   Cardium      Beach
1     Natica    Estuary
11   Cardium    Estuary
3     Mactra Restricted
7    Tellina Restricted
2    Tellina  Tidalflat
4     Natica  Tidalflat
6       Arca  Tidalflat
8  Nassarius  Tidalflat
9    Cardium  Tidalflat
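The probabilistic draw only matches the target proportions on average, as the 11-species output shows. If the group sizes need to stay as close to the proportions as possible while still adding up to the total, one alternative is a largest-remainder allocation. The sketch below is not from either answer; alloc_counts is a hypothetical helper name introduced here for illustration.

```r
# Hypothetical helper: split n items into groups whose sizes follow `probs`
# as closely as possible and always add up to n.
alloc_counts <- function(n, probs) {
  raw <- n * probs / sum(probs)   # ideal, possibly fractional, group sizes
  counts <- floor(raw)            # start by rounding every group down
  leftover <- n - sum(counts)     # items that still need a group
  if (leftover > 0) {
    # hand one extra item to the groups with the largest fractional parts
    extra <- order(raw - counts, decreasing = TRUE)[seq_len(leftover)]
    counts[extra] <- counts[extra] + 1
  }
  counts
}

species2 <- c('Natica','Tellina','Mactra','Natica','Arca','Arca','Tellina',
              'Nassarius','Cardium','Cardium', 'Cardium')
envirs <- c('Restricted', 'Tidalflat', 'Beach', 'Estuary')
probs <- c(.2, .3, .4, .1)

nrs2 <- alloc_counts(length(species2), probs)
df3 <- data.frame(species = sample(species2),
                  envir = rep(envirs, nrs2),
                  stringsAsFactors = FALSE)
print(table(df3$envir))
```

With 11 species the extra one goes to Beach, the group with the largest fractional share, so the sizes are 2, 3, 5 and 1.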
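Answer 1 (score: 1)
Probably not in a single line of code. I don't fully understand the column part, but you can use something like the code below to build a data frame; note that your columns will have unequal lengths.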
species <- 1:1000
ranspecies <- sample(species)
first20 <- ranspecies[1:(floor(length(species)*.20))]
second30 <- ranspecies[(floor(length(species)*.20)+1):(floor(length(species)*.50))]
third40 <- ranspecies[(floor(length(species)*.50)+1):(floor(length(species)*.90))]
forth10 <- ranspecies[(floor(length(species)*.90)+1):length(species)]
Or, to match your example:
species <- c('Natica'
,'Tellina'
,'Mactra'
,'Natica'
,'Arca'
,'Arca'
,'Tellina'
,'Nassarius'
,'Cardium'
,'Cardium')
ranspecies <- sample(species)
first20 <- ranspecies[1:(floor(length(species)*.20))]
second30 <- ranspecies[(floor(length(species)*.20)+1):(floor(length(species)*.50))]
third40 <- ranspecies[(floor(length(species)*.50)+1):(floor(length(species)*.90))]
forth10 <- ranspecies[(floor(length(species)*.90)+1):length(species)]
# pad every group with NA up to the longest group so the columns line up
dflength <- max(length(first20), length(second30), length(third40),length(forth10))
data.frame(f = c(first20,rep(NA,dflength-length(first20)))
,s = c(second30,rep(NA,dflength-length(second30)))
,t = c(third40,rep(NA,dflength-length(third40)))
,f4 = c(forth10,rep(NA,dflength-length(forth10)))  # was a second "f", which duplicated the first column name
)
I suspect some of these steps could be made more compact, but I'll leave that for you to tinker with.
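As one possible compact version of the steps above, the cut points can be computed for all groups at once with cumsum(), and the NA padding can be done by extending each vector's length. This is only a sketch: it keeps the same floor()-based boundaries, and using the environment names from the question as column names is my assumption.

```r
species <- c('Natica','Tellina','Mactra','Natica','Arca','Arca','Tellina',
             'Nassarius','Cardium','Cardium')
envirs <- c('Restricted', 'Tidalflat', 'Beach', 'Estuary')  # assumed column names
probs <- c(.2, .3, .4, .1)

ranspecies <- sample(species)
n <- length(ranspecies)

# End index of each group, using the same floor() rule as the longer version
ends <- floor(n * cumsum(probs))
ends[length(ends)] <- n          # make sure the last group runs to the end
sizes <- diff(c(0, ends))        # group sizes derived from the end indices

# Split the shuffled species into groups, keeping the environment order
parts <- split(ranspecies, factor(rep(envirs, sizes), levels = envirs))

# Extending a vector's length pads it with NA
longest <- max(lengths(parts))
padded <- lapply(parts, function(x) { length(x) <- longest; x })

wide <- as.data.frame(padded, stringsAsFactors = FALSE)
print(wide)
```

The intermediate list parts is the same kind of object that split() returns in the first answer, so the wide table is only needed if the column layout really is required.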