I have a data frame df:
df <- data.frame(group = c(rep("G1",18), rep("G2", 10)), X = c(rep("a", 10), rep("b", 8), rep("c", 4), rep("d", 6)), Y = c(rep(1:10), rep(1:8), rep(1:4), rep(1:6)))
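Possibly using dplyr or tidyr, I would like to make all subgroups (X) within each group the same length, namely the length of the smallest subgroup in that group.
In short, the resulting data frame should be: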
df_r <- data.frame(group = c(rep("G1",16), rep("G2", 8)), X = c(rep("a", 8), rep("b", 8), rep("c", 4), rep("d", 4)), Y = c(rep(1:8), rep(1:8), rep(1:4), rep(1:4)))
I can't wrap my head around how to achieve this. Any help would be greatly appreciated.
Answer 0 (score: 4)
Maybe this is what you want?
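No R code for this answer is available here, so the following is only an illustrative sketch and not the answerer's original code. It assumes dplyr is installed; the helper columns n_sub and n_min and the result name res are made up for the example.

```r
library(dplyr)

df <- data.frame(
  group = c(rep("G1", 18), rep("G2", 10)),
  X = c(rep("a", 10), rep("b", 8), rep("c", 4), rep("d", 6)),
  Y = c(1:10, 1:8, 1:4, 1:6)
)

res <- df %>%
  add_count(group, X, name = "n_sub") %>%   # rows per (group, X) subgroup
  group_by(group) %>%
  mutate(n_min = min(n_sub)) %>%            # smallest subgroup within each group
  group_by(group, X) %>%
  filter(row_number() <= n_min) %>%         # keep the first n_min rows of each subgroup
  ungroup() %>%
  select(-n_sub, -n_min)                    # drop the helper columns
```

Because the filter works on row positions, it keeps the first rows of each subgroup in their original order, which matches df_r from the question.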
Answer 1 (score: 1)
Here is an option using data.table:
library(data.table)
setDT(df)[, {
i1 <- tabulate(factor(X))
i2 <- sequence(pmin(i1, min(i1)))
.SD[Y %in% i2] } , by = .(group)]
# group X Y
# 1: G1 a 1
# 2: G1 a 2
# 3: G1 a 3
# 4: G1 a 4
# 5: G1 a 5
# 6: G1 a 6
# 7: G1 a 7
# 8: G1 a 8
# 9: G1 b 1
#10: G1 b 2
#11: G1 b 3
#12: G1 b 4
#13: G1 b 5
#14: G1 b 6
#15: G1 b 7
#16: G1 b 8
#17: G2 c 1
#18: G2 c 2
#19: G2 c 3
#20: G2 c 4
#21: G2 d 1
#22: G2 d 2
#23: G2 d 3
#24: G2 d 4
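The filter Y %in% i2 gives the right rows here because Y happens to number the rows 1, 2, 3, ... within each subgroup. A sketch that does not rely on the values of Y and uses row positions instead could look like this. The table dt, its deliberately non-consecutive Y values, and the names n_min and res are assumptions made for the example; rowid() is data.table's within-group row counter.

```r
library(data.table)

dt <- data.table(
  group = c(rep("G1", 18), rep("G2", 10)),
  X = c(rep("a", 10), rep("b", 8), rep("c", 4), rep("d", 6)),
  Y = c(101:110, 201:208, 301:304, 401:406)  # deliberately not 1, 2, 3, ...
)

res <- dt[, {
  n_min <- min(table(as.character(X)))  # smallest subgroup size in this group
  .SD[rowid(X) <= n_min]                # first n_min rows of every subgroup
}, by = group]
```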
Answer 2 (score: 1)
This is how I would do it:
library(data.table)
setDT(df)[, size := .N, by = .(group, X)][
, size := min(size), by = group][
, head(.SD, size[1]), by = .(group, X)]
# group X Y size
# 1: G1 a 1 8
# 2: G1 a 2 8
# 3: G1 a 3 8
# 4: G1 a 4 8
# 5: G1 a 5 8
# 6: G1 a 6 8
# 7: G1 a 7 8
# 8: G1 a 8 8
# 9: G1 b 1 8
#10: G1 b 2 8
#11: G1 b 3 8
#12: G1 b 4 8
#13: G1 b 5 8
#14: G1 b 6 8
#15: G1 b 7 8
#16: G1 b 8 8
#17: G2 c 1 4
#18: G2 c 2 4
#19: G2 c 3 4
#20: G2 c 4 4
#21: G2 d 1 4
#22: G2 d 2 4
#23: G2 d 3 4
#24: G2 d 4 4
# group X Y size
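Answer 3 (score: 0)
Here is a rather ugly base R answer:
# get the minimum subgroup count within each group
minCntGroup <- aggregate(Y~group, data=aggregate(Y~group+X, data=df, FUN=length), FUN=min)
# sample row indices of df for each X subgroup, returned as a list,
# using minCntGroup to sample the correct size
set.seed(1234)
# as.character() makes this work whether X is a factor or a character column
mySampleVector <- unlist(sapply(unique(as.character(df$X)), function(i)
sample(which(df$X == i),
size=minCntGroup[minCntGroup$group %in% df[df$X==i,"group"], "Y"])))
For this data, sapply() returns a list containing the indices of the sampled rows for each X subgroup, with the sample size held the same within each larger group. I wrap this list in unlist() to return a vector.
If you want to turn it into a data.frame, you can use
df_r <- df[mySampleVector,]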
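The approach above picks rows at random within each subgroup. If the first rows of each subgroup are wanted instead, as in the df_r shown in the question, a minimal base R sketch using ave() could look like this. The names sub_size, min_size, pos and df_first are illustrative, not part of the answer.

```r
df <- data.frame(
  group = c(rep("G1", 18), rep("G2", 10)),
  X = c(rep("a", 10), rep("b", 8), rep("c", 4), rep("d", 6)),
  Y = c(1:10, 1:8, 1:4, 1:6)
)

idx <- seq_len(nrow(df))
# size of each (group, X) subgroup, repeated on every row of that subgroup
sub_size <- ave(idx, df$group, df$X, FUN = length)
# smallest subgroup size within each group
min_size <- ave(sub_size, df$group, FUN = min)
# position of each row inside its own subgroup (1, 2, 3, ...)
pos <- ave(idx, df$group, df$X, FUN = seq_along)

df_first <- df[pos <= min_size, ]
rownames(df_first) <- NULL
```

Since the selection is by position rather than by the values of Y, this also works when Y is not a consecutive index.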
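Answer 4 (score: 0)
Following a comment on one of the answers, here is a solution that works when the variable is not consecutive and that generalizes to other data: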
library(dplyr)
out <- df %>%
group_by(group, X) %>%
mutate(subgroup_size = n()) %>%
group_by(group) %>%
mutate(min_subgroup_size = min(subgroup_size)) %>%
group_by(group, X) %>%
filter(row_number() <= min_subgroup_size) %>%
dplyr::select(-c(subgroup_size, min_subgroup_size)) %>%
ungroup()
table(out$group, out$X)
a b c d
G1 8 8 0 0
G2 0 0 4 4
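This solution uses 3 grouping steps to obtain the requested result:
1. group_by(group, X) to count the rows in each subgroup;
2. group_by(group) to find the smallest subgroup size within each group;
3. group_by(group, X) again to keep only that many rows per subgroup.
(Optional) Replace filter(row_number() <= min_subgroup_size) with sample_n(min_subgroup_size) to select rows at random within each subgroup.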
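The random-selection variant can be sketched as follows. sample_n() has been superseded by slice_sample() in recent dplyr releases, so this sketch swaps in slice() combined with base sample(), which takes a per-subgroup size without further assumptions. The name out_random and the seed are illustrative choices, not part of the original answer.

```r
library(dplyr)

df <- data.frame(
  group = c(rep("G1", 18), rep("G2", 10)),
  X = c(rep("a", 10), rep("b", 8), rep("c", 4), rep("d", 6)),
  Y = c(1:10, 1:8, 1:4, 1:6)
)

set.seed(1234)  # only to make the random draw repeatable
out_random <- df %>%
  group_by(group, X) %>%
  mutate(subgroup_size = n()) %>%
  group_by(group) %>%
  mutate(min_subgroup_size = min(subgroup_size)) %>%
  group_by(group, X) %>%
  # draw min_subgroup_size row positions at random within each subgroup
  slice(sample(n(), min_subgroup_size[1])) %>%
  ungroup() %>%
  select(-subgroup_size, -min_subgroup_size)

table(out_random$group, out_random$X)
```

The subgroup sizes come out the same as with the filter() version; only which rows are kept differs from run to run unless a seed is set.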