Question

I have a data set that I need to sample from. Part of the data looks like this:

row.names  customer_ID
1           10000000
2           10000000
3           10000000    
4           10000000
5           10000005
6           10000005
7           10000008
8           10000008
9           10000008
10          10000008
11          10000008
12          10000008
...

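Start with the first two rows of each customer, then check whether to include the next row: there is a 65% chance that we take the next row and a 35% chance that we quit and move on to the next customer. If we take the row, we repeat the 65%/35% check, and so on until we either run out of data for that customer or fail the check and move on to the next customer. Repeat this for every customer.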

Answer 1

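The number of rows to take from each customer essentially follows a negative binomial distribution: beyond the first two rows, the number of extra rows is the number of 65% "take" outcomes before the first 35% "quit" outcome, capped at the number of rows the customer has. Assuming your data is stored in dat: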

# Split your data by customer id
spl <- split(dat, dat$customer_ID)

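# Grab the correct number of rows from each customer: the first two, plus
# the number of 65% "take" draws before the first 35% "quit" draw,
# capped at the number of rows available for that customer
set.seed(144)
spl <- lapply(spl, function(x) x[seq_len(min(nrow(x), 2 + rnbinom(1, size = 1, prob = 0.35))), ])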

# Combine into a final data frame
do.call(rbind, spl)
#            row.names customer_ID
# 10000000.1         1    10000000
# 10000000.2         2    10000000
# 10000000.3         3    10000000
# 10000000.4         4    10000000
# 10000005.5         5    10000005
# 10000005.6         6    10000005
# 10000008.7         7    10000008
# 10000008.8         8    10000008
# 10000008.9         9    10000008
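
As a sanity check on the distributional claim, here is a minimal sketch that simulates the 65%/35% check explicitly and compares it with rnbinom. The helper simulate_extra_rows and its p_stop argument are hypothetical names introduced only for this illustration, and the cap at the customer's row count is left out on purpose so the two approaches can be compared directly.

```r
# Hypothetical helper (not part of the original answer): run the 65%/35%
# check in a loop and return how many extra rows would be taken beyond
# the first two, ignoring the cap at the customer's row count.
simulate_extra_rows <- function(p_stop = 0.35) {
  extra <- 0
  # runif(1) < p_stop means the check failed and we move to the next customer
  while (runif(1) >= p_stop) {
    extra <- extra + 1
  }
  extra
}

set.seed(1)
n <- 1e5
looped <- replicate(n, simulate_extra_rows())
direct <- rnbinom(n, size = 1, prob = 0.35)

# Both sample means should be close to the theoretical mean 0.65 / 0.35
c(looped = mean(looped), direct = mean(direct), theory = 0.65 / 0.35)
```

With size = 1 the negative binomial reduces to a geometric distribution, so both vectors count the "take" outcomes before the first "quit", and their means should agree up to sampling noise.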
