Suppose I have a df like this:
df1 <- data.frame(n =c("n1", "n2", "n3", "n4", "n5", "n6", "n7", "n8", "n9", "n10", "n11", "n12", "n13", "n14", "n15", "n16", "n17", "n18"), Cond1 =c("I1", "I2", "I3", "I4", "I5", "I6", "I1", "I2", "I3", "I4", "I5", "I6", "I1", "I2", "I3", "I4", "I5", "I6"), Cond2 =c("c1", "c1","c1","c1","c1","c1","c2", "c2","c2","c2","c2","c2","c3","c3","c3","c3","c3","c3"))
df1
I shuffle it by rows:
df2 <- df1[sample(nrow(df1)),]
df2
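I would like to put a condition on the sampling so that, for example, a row with "c1" in column Cond2 is followed by a gap of at least one other row before "c1" appears again.
In other words, I want the rows in random order, but taking the column values into account: in the new df, if a row has "c1" in Cond2, the next row must not have "c1" but "c2" or "c3" instead.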
Answer 0 (score: 2)
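For example, you could sample twice as many rows as df1 has. Then use the numbers in the Cond2 column to compute the difference between consecutive rows, and remove every row whose difference is 0. Finally, trim the data frame to the length of df1. Because the rows are drawn with replacement, a row can appear more than once (see row names such as 3.1 in the output), and others may not appear at all.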
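To see why dropping the zero differences removes adjacent repeats, here is a small worked example. The vector `cond2` is made up for illustration and is not part of the original data. Note that the trick relies on the labels containing a non-zero number: because 0 is prepended before `diff()`, a hypothetical label such as "c0" in the first row would be dropped by mistake.

```r
# Hypothetical short vector with runs of repeated labels
cond2 <- c("c1", "c1", "c2", "c2", "c2", "c3", "c1")

# Strip everything that is not a digit and convert to numbers: 1 1 2 2 2 3 1
num <- as.numeric(gsub("\\D", "", cond2))

# Difference to the previous element (0 is prepended so the first element is kept)
d <- diff(c(0, num))

# Keep only the elements that differ from their predecessor
kept <- cond2[d != 0]
kept
# [1] "c1" "c2" "c3" "c1"
```

Each run of equal labels collapses to its first element, so no two neighbours in `kept` are the same.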
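# Draw twice as many rows as df1 has, with replacement
df2 <- df1[sample(nrow(df1), nrow(df1)*2, replace=TRUE), ]
# Difference between consecutive Cond2 numbers; 0 means the value repeats
df2$tmp <- diff(c(0, as.numeric(gsub("\\D", "", df2$Cond2))))
# Drop the repeats and the helper column, then keep the first nrow(df1) rows
df2[df2$tmp != 0, -4][1:nrow(df1), ]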
#        n Cond1 Cond2
# 2     n2    I2    c1
# 8     n8    I2    c2
# 4     n4    I4    c1
# 12   n12    I6    c2
# 3.1   n3    I3    c1
# 13   n13    I1    c3
# 11   n11    I5    c2
# 5     n5    I5    c1
# 11.1 n11    I5    c2
# 14   n14    I2    c3
# 1     n1    I1    c1
# 18   n18    I6    c3
# 3.2   n3    I3    c1
# 8.1   n8    I2    c2
# 13.2 n13    I1    c3
# 10.1 n10    I4    c2
# 15   n15    I3    c3
# 1.1   n1    I1    c1
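If every row of df1 should appear exactly once, a different technique is possible: plain rejection sampling, that is, reshuffling without replacement until the condition holds. This is not the approach of the answer above, only a minimal sketch. The helper names `has_adjacent_repeat` and `shuffle_no_repeat` and the `max_tries` limit are assumptions made for illustration.

```r
# Rebuild the example data from the question
df1 <- data.frame(
  n     = paste0("n", 1:18),
  Cond1 = rep(paste0("I", 1:6), times = 3),
  Cond2 = rep(paste0("c", 1:3), each = 6)
)

# Hypothetical helper: TRUE if any value equals the one directly before it
has_adjacent_repeat <- function(x) {
  x <- as.character(x)
  length(x) > 1 && any(head(x, -1) == tail(x, -1))
}

# Hypothetical helper: reshuffle the rows until none of the columns in
# `cols` contains two equal values in consecutive rows
shuffle_no_repeat <- function(df, cols, max_tries = 100000) {
  for (i in seq_len(max_tries)) {
    out <- df[sample(nrow(df)), , drop = FALSE]
    if (!any(vapply(out[cols], has_adjacent_repeat, logical(1)))) {
      return(out)
    }
  }
  stop("no valid ordering found within max_tries")
}

set.seed(1)
df3 <- shuffle_no_repeat(df1, "Cond2")
df3
```

Only a small share of all shuffles satisfies the condition, so the loop may need many attempts. That is cheap for 18 rows, but this brute-force approach can become slow for large data frames or for strict conditions on several columns.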
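To make the solution work for several columns, you can use a while loop, since this is an iterative process of unknown length that continues until none of the differences is 0. The number of rows that remain is therefore not fixed in advance.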
set.seed(42) # for sake of reproducibility
df2 <- df1[sample(nrow(df1), nrow(df1)*2, replace=TRUE), ]
df2$tmp1 <- diff(c(0, as.numeric(gsub("\\D", "", df2$Cond1))))
df2$tmp2 <- diff(c(0, as.numeric(gsub("\\D", "", df2$Cond2))))
while (any(df2[4:5] == 0)) {
  df2 <- df2[df2$tmp1 != 0, ]
  df2 <- df2[df2$tmp2 != 0, ]
  df2$tmp1 <- diff(c(0, as.numeric(gsub("\\D", "", df2$Cond1))))
  df2$tmp2 <- diff(c(0, as.numeric(gsub("\\D", "", df2$Cond2))))
}
df2
#        n Cond1 Cond2 tmp1 tmp2
# 17   n17    I5    c3    5    3
# 6     n6    I6    c1    1   -2
# 15   n15    I3    c3   -3    2
# 12   n12    I6    c2    3   -1
# 14   n14    I2    c3   -4    1
# 3     n3    I3    c1    1   -2
# 12.1 n12    I6    c2    3    1
# 13   n13    I1    c3   -5    1
# 9     n9    I3    c2    2   -1
# 13.1 n13    I1    c3   -2    1
# 9.1   n9    I3    c2    2   -1
# 17.3 n17    I5    c3    2    1
# 3.1   n3    I3    c1   -2   -2
# 18.1 n18    I6    c3    3    2
# 2     n2    I2    c1   -4   -2
# 10.1 n10    I4    c2    2    1
# 17.5 n17    I5    c3    1    1
# 9.3   n9    I3    c2   -2   -1
# 16   n16    I4    c3    1    1
# 7     n7    I1    c2   -3   -1
# 15.2 n15    I3    c3    2    1
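The same iterative idea could be wrapped in a function that takes any set of columns. The sketch below is a variation, not the code above: it compares the labels directly instead of extracting digits, so it would also work for labels without numbers, and it drops a row as soon as any of the columns repeats. The function name `drop_adjacent_repeats` is an assumption made for illustration.

```r
# Rebuild the example data from the question
df1 <- data.frame(
  n     = paste0("n", 1:18),
  Cond1 = rep(paste0("I", 1:6), times = 3),
  Cond2 = rep(paste0("c", 1:3), each = 6)
)

# Hypothetical helper: repeatedly drop rows that repeat the previous row's
# value in any of `cols`, until no such row is left
drop_adjacent_repeats <- function(df, cols) {
  repeat {
    n <- nrow(df)
    if (n < 2) break
    # One logical column per entry of `cols`: TRUE where a row equals
    # its predecessor (the first row never counts as a repeat)
    same <- vapply(cols, function(col) {
      x <- as.character(df[[col]])
      c(FALSE, x[-1] == x[-n])
    }, logical(n))
    drop <- rowSums(same) > 0
    if (!any(drop)) break
    df <- df[!drop, , drop = FALSE]
  }
  df
}

set.seed(42)
pool <- df1[sample(nrow(df1), nrow(df1) * 2, replace = TRUE), ]
df2 <- drop_adjacent_repeats(pool, c("Cond1", "Cond2"))
df2
```

As with the while loop above, the number of remaining rows depends on the random draw, and rows can still occur more than once because the pool is sampled with replacement.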