R: Randomly sampling a minimum number of observations from a category

Time: 2015-09-14 12:39:11

Tags: r random gps categories

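I have location data from GPS collars, and I am trying to simulate different scenarios in R based on how the collars function. One of the simulations is that the collar misses GPS points throughout the day (for various reasons). My data consist of 14 GPS points per day, and I would like to randomly select (without replacement) a minimum of 5 points per day, up to a possible maximum of 14.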

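In another simulation I used the script below, taken from another thread (R: Random sampling an even number of observations from a range of categories), to extract exactly 5 random points per day. However, I don't fully understand all the parts of the script, so I can't work out how to change it to extract at least 5 points. Any suggestions would be much appreciated.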

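library(data.table)
dat2 <- data.table(dat.r)
# for each DayNo, keep 5 randomly chosen rows (or all rows if the day has fewer than 5)
dat2.ss <- dat2[ , .SD[sample(1:.N,min(5,.N))], by=DayNo]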

Output of the data frame (dat.r):

dput(head(dat.r, 20))
structure(list(Latitude = c(5.4118432, 5.4118815, 5.4115713, 
5.4111541, 5.4087853, 5.4083702, 5.4082527, 5.4078161, 5.4075528, 
5.407321, 5.4070598, 5.4064237, 5.4070621, 5.4070251, 5.4070555, 
5.4065127, 5.4065134, 5.4064872, 5.4056724, 5.4038751), Longitude = c(118.0225467, 
118.0222841, 118.0211875, 118.0208637, 118.0205413, 118.0206064, 
118.0204101, 118.0209272, 118.0213827, 118.0214189, 118.0217748, 
118.0223343, 118.0227079, 118.0226511, 118.0226916, 118.0220733, 
118.02218, 118.0221843, 118.0223316, 118.0198153), DayNo = c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L)), .Names = c("Latitude", "Longitude", "DayNo"), row.names = c(NA, 
20L), class = "data.frame")

1 answer:

Answer 0: (score: 1)

This should work:

library(data.table)
set.seed(1)    # for reproducible example
setDT(dat.r)[,.SD[sample(.N, sample(min(5,.N):min(.N,14),1))], by=DayNo]
#     DayNo Latitude Longitude
#  1:     1 5.411881  118.0223
#  2:     1 5.411154  118.0209
#  3:     1 5.407553  118.0214
#  4:     1 5.411843  118.0225
#  5:     1 5.411571  118.0212
#  6:     1 5.407062  118.0227
#  7:     1 5.408785  118.0205
#  8:     1 5.408370  118.0206
#  9:     2 5.406513  118.0221
# 10:     2 5.407025  118.0227
# 11:     2 5.406513  118.0222
# 12:     2 5.405672  118.0223
# 13:     2 5.403875  118.0198

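The idea is that sample(x, n) takes a sample of size n from the vector 1:x (where x is a single number, not a vector). So you want n itself to be sampled from 5:min(.N,14). I also allowed for the possibility that a given day has fewer than five points, which is why the lower bound is min(5,.N) rather than 5.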
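
One caveat may be worth checking: when a day has five or fewer points, min(5,.N):min(.N,14) collapses to a single number k, and sample(k, 1) then draws from 1:k rather than returning k, so such a day could end up with fewer rows than it actually has. The sketch below is one possible way to avoid that by drawing the sample size with sample.int(). The helper name sample_rows, its arguments min_n and max_n, and the toy table are all made up for illustration and are not part of the original answer.

```r
library(data.table)

# Demonstration of the pitfall: sample(3:3, 1) behaves like sample(3, 1),
# so the result can be 1, 2 or 3 rather than always 3.
set.seed(42)
draws <- replicate(1000, sample(3:3, 1))

# Hypothetical helper: return between min_n and max_n randomly chosen
# row indices for a group that contains n_rows rows.
sample_rows <- function(n_rows, min_n = 5L, max_n = 14L) {
  lo <- min(min_n, n_rows)   # a day with fewer than min_n points keeps all of them
  hi <- min(max_n, n_rows)   # cannot draw more rows than the day contains
  # sample.int(k, 1) draws from 1:k, so size is uniform on lo:hi even when lo == hi
  size <- lo + sample.int(hi - lo + 1L, 1L) - 1L
  sample.int(n_rows, size)   # row indices, drawn without replacement
}

# Toy data (invented): 14 points on day 1, 7 on day 2, 3 on day 3
toy <- data.table(DayNo = rep(1:3, times = c(14L, 7L, 3L)), id = 1:24)

set.seed(1)
res <- toy[, .SD[sample_rows(.N)], by = DayNo]
counts <- res[, .N, by = DayNo]   # number of rows kept per day
```

With the real data, the same call would be setDT(dat.r)[, .SD[sample_rows(.N)], by = DayNo]. For days with more than five points this should behave like the one-liner above; the difference only shows up on short days, which are kept whole.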