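I have location data from GPS collars, and I am trying to simulate different scenarios in R based on how the collars perform. One of the simulations is that the collar misses GPS points throughout the day (for various reasons). My data contains 14 GPS points per day, and I want to randomly select (without replacement) a minimum of 5 points and potentially as many as 14 points per day.
In another simulation I extract 5 random points per day using the script below, which I took from another thread (R: Random sampling an even number of observations from a range of categories). However, I don't fully understand all the different parts of the script well enough to change it so that it extracts a minimum of 5 points. Any suggestions would be much appreciated.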
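library(data.table)
dat2 <- data.table(dat.r)
# for each DayNo, keep 5 randomly chosen rows (or all rows if the day has fewer than 5)
dat2.ss <- dat2[ , .SD[sample(1:.N,min(5,.N))], by=DayNo]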
Output of the data frame (dat.r):
dput(head(dat.r, 20))
structure(list(Latitude = c(5.4118432, 5.4118815, 5.4115713,
5.4111541, 5.4087853, 5.4083702, 5.4082527, 5.4078161, 5.4075528,
5.407321, 5.4070598, 5.4064237, 5.4070621, 5.4070251, 5.4070555,
5.4065127, 5.4065134, 5.4064872, 5.4056724, 5.4038751), Longitude = c(118.0225467,
118.0222841, 118.0211875, 118.0208637, 118.0205413, 118.0206064,
118.0204101, 118.0209272, 118.0213827, 118.0214189, 118.0217748,
118.0223343, 118.0227079, 118.0226511, 118.0226916, 118.0220733,
118.02218, 118.0221843, 118.0223316, 118.0198153), DayNo = c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 2L)), .Names = c("Latitude", "Longitude", "DayNo"), row.names = c(NA,
20L), class = "data.frame")
Answer 0 (score: 1)
This should work:
library(data.table)
set.seed(1) # for reproducible example
setDT(dat.r)[,.SD[sample(.N, sample(min(5,.N):min(.N,14),1))], by=DayNo]
# DayNo Latitude Longitude
# 1: 1 5.411881 118.0223
# 2: 1 5.411154 118.0209
# 3: 1 5.407553 118.0214
# 4: 1 5.411843 118.0225
# 5: 1 5.411571 118.0212
# 6: 1 5.407062 118.0227
# 7: 1 5.408785 118.0205
# 8: 1 5.408370 118.0206
# 9: 2 5.406513 118.0221
# 10: 2 5.407025 118.0227
# 11: 2 5.406513 118.0222
# 12: 2 5.405672 118.0223
# 13: 2 5.403875 118.0198
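The idea is that sample(x, n) takes a sample of size n from the vector 1:x (where x is a single number, not a vector). So you want n itself to be sampled from 5:min(.N,14). I also added handling for the case where a given day has fewer than five points.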
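One edge case may be worth guarding against. When a day has five or fewer points, min(5,.N):min(.N,14) collapses to a single number k, and sample(k, 1) then draws from 1:k rather than returning k. In the one-liner above, such a day could therefore keep fewer than all of its points. The sketch below is one possible way around that, not part of the original answer. The helper name draw_size, the toy data frame, and its PointId column are all made up for illustration, and it uses base R only so that it runs without extra packages.

```r
# Hypothetical helper (not from the original answer): choose how many points
# to keep for a day with n_avail fixes, between lo and hi inclusive.
draw_size <- function(n_avail, lo = 5L, hi = 14L) {
  lower <- min(lo, n_avail)   # a day may have fewer than `lo` points
  upper <- min(hi, n_avail)   # cannot keep more points than exist
  # sample.int() always draws from 1:k, so shifting by (lower - 1) avoids the
  # sample(x, 1) pitfall where a length-one x is treated as 1:x.
  lower - 1L + sample.int(upper - lower + 1L, 1L)
}

# Toy data (assumed for illustration): day 1 has 14 points, day 2 has 7,
# and day 3 has only 3.
toy <- data.frame(DayNo = rep(1:3, times = c(14L, 7L, 3L)),
                  PointId = seq_len(24L))

set.seed(1)  # for a reproducible example
kept <- do.call(rbind, lapply(split(toy, toy$DayNo), function(d) {
  # sample row indices without replacement, then subset the day's rows
  d[sample.int(nrow(d), draw_size(nrow(d))), , drop = FALSE]
}))

table(kept$DayNo)  # number of points kept per day; day 3 keeps all 3
```

With data.table, the same helper could presumably be dropped into the grouped call as setDT(dat.r)[, .SD[sample.int(.N, draw_size(.N))], by=DayNo].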