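I have a data frame whose rows are sorted by the value of variable x. Where x has ties (such as the values 50 and 60 in the example below), I need all possible permutations of the values of variable group. How can I achieve this in R? Is there a dedicated function for it?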
Initial data:
x group
45 A
50 A
50 A
50 B
52 A
60 A
60 B
70 B
88 B
Desired result:
x group group2 group3 group4 group5 group6
45 A A A A A A
50 A A B A A B
50 A B A A B A
50 B A A B A A
52 A A A A A A
60 A A A B B B
60 B B B A A A
70 B B B B B B
88 B B B B B B
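As a quick check on the desired result: a tie whose group values contain k_A copies of A and k_B copies of B has n!/(k_A! k_B!) distinct orderings (a multinomial coefficient), and the number of group columns is the product of these counts over all values of x, here 3 for x = 50 times 2 for x = 60, giving 6. A minimal base R sketch of that count; the helper name n_orderings is hypothetical:

```r
# Hypothetical helper: number of distinct orderings of a vector with repeated
# values, i.e. the multinomial coefficient n! / (n_1! * n_2! * ...)
n_orderings <- function(g) {
  factorial(length(g)) / prod(factorial(table(g)))
}

# the example data from the question
x     <- c(45, 50, 50, 50, 52, 60, 60, 70, 88)
group <- c("A", "A", "A", "B", "A", "A", "B", "B", "B")

# distinct orderings for each value of x; their product is the number of columns
per_tie <- sapply(split(group, x), n_orderings)
n_cols  <- prod(per_tie)  # 6 for this data
```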
Answer 0 (score: 2)
This may be a complicated answer. Try this code:
df <- read.table(text = 'x group
45 A
50 A
50 A
50 B
52 A
60 A
60 B
70 B
88 B', header = TRUE)
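library(data.table)
library(gtools)
# one data.frame of orderings per distinct value of x
ss <- list()
setDT(df)[, {
  n <- .N
  # apply() returns one column per permutation of this tie's 'group' values
  ss <<- append(ss, list(data.frame(apply(gtools::permutations(n = n, r = n), 1,
                                          function(x) group[x]))))
  NULL
}, by = 'x']
# the widest block sets the number of output columns
max_col <- max(sapply(ss, length))
# pad narrower blocks by recycling their own columns until they have max_col columns
ss[] <- lapply(ss, function(x) {
  y <- x
  while (length(y) < max_col)
    y <- data.frame(y, x[, 1:min(length(x), max_col - length(y))])
  names(y) <- paste0('group', 1:max_col)
  y
})
tt <- do.call('rbind', ss)
tt$x <- df$x
tt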
Final output:
group1 group2 group3 group4 group5 group6 x
1 A A A A A A 45
2 A A A A B B 50
3 A B A B A A 50
4 B A B A A A 50
5 A A A A A A 52
6 A B A B A B 60
7 B A B A B A 60
8 B B B B B B 70
9 B B B B B B 88
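The repeated columns in the output above (for x = 50, group1 and group3 are both A, A, B) come from gtools::permutations(n = n, r = n) permuting row positions: the two tied A values are treated as distinct, so each of the 3 distinct orderings of A, A, B appears twice among the 3! = 6 position permutations. A base R illustration, using expand.grid() as a hypothetical stand-in for gtools::permutations():

```r
# all position permutations of 1:3 (stand-in for gtools::permutations(3, 3))
pos <- as.matrix(expand.grid(1:3, 1:3, 1:3))
pos <- pos[apply(pos, 1, function(r) length(unique(r)) == 3), ]

# reorder the tie's group values by each permutation: one ordering per row
g <- c("A", "A", "B")
orderings <- t(apply(pos, 1, function(p) g[p]))

nrow(orderings)          # 6 position permutations
nrow(unique(orderings))  # but only 3 distinct orderings
```

Applying unique() removes the repeats within a tie; the orderings of different ties still have to be paired so that every combination appears once.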
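Answer 1 (score: 2)
Just another (cleaner) solution. The idea is to compute all permutations for each tie, then work out how many copies of each are needed so that every combination appears.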
df <- structure(list(x = c(45L, 50L, 50L, 50L, 52L, 60L, 60L, 70L,
88L), group = structure(c(1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L), .Label = c("A",
"B"), class = "factor")), .Names = c("x", "group"), class = "data.frame", row.names = c(NA,
-9L))
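library(tidyverse)
library(iterpc)
ux <- unique(df$x)
m <- length(ux)
# for each distinct x, all distinct orderings of its group values (one per row)
members <- ux %>% map(~ filter(df, x == .)) %>%
  map(~ getall(iterpc(table(as.character(.$group)), ordered = TRUE)))
# number of orderings per value of x
nrs <- members %>% map_int(nrow)
# repeat each block's rows so that, placed side by side, they cover every combination:
# 'each' = product of earlier block sizes, 'times' = product of later block sizes
members <- members %>%
  imap(~.x[rep(seq_len(nrow(.x)), prod(tail(nrs, m-.y)) , each = prod(head(nrs, .y-1))), , drop=FALSE])
# transpose so that rows line up with df and columns are the combinations
data.frame(x = df$x, t(do.call(cbind, members)))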
#> x X1 X2 X3 X4 X5 X6
#> 1 45 A A A A A A
#> 2 50 A A B A A B
#> 3 50 A B A A B A
#> 4 50 B A A B A A
#> 5 52 A A A A A A
#> 6 60 A A A B B B
#> 7 60 B B B A A A
#> 8 70 B B B B B B
#> 9 88 B B B B B B
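The "number of copies" step is the rep() call inside imap(): each ordering of block k is repeated by the product of the earlier block sizes (each), and the whole block is repeated by the product of the later ones (times). A base R check, hard-coding the per-value counts from the example (1, 3, 1, 2, 1, 1), that this indexing reproduces the Cartesian product that expand.grid() generates:

```r
# number of distinct orderings for each value of x (45, 50, 52, 60, 70, 88)
nrs <- c(1, 3, 1, 2, 1, 1)
m   <- length(nrs)

# row index into block k for every combination, built as in the answer's rep() call
idx <- sapply(seq_len(m), function(k) {
  rep(seq_len(nrs[k]),
      times = prod(tail(nrs, m - k)),
      each  = prod(head(nrs, k - 1)))
})

# expand.grid() varies its first column fastest, giving the same ordering
grid <- as.matrix(expand.grid(lapply(nrs, seq_len)))
all(idx == grid)  # TRUE
```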
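Answer 2 (score: 1)
A really tricky problem! At its core, you need some version of Heap's algorithm (or another algorithm that generates all permutations). With that, base R can split the group values by x, permute the values within each tie, and then combine those permutations. I had actually written a version of this algorithm for a different project, so applying it to your data was relatively easy.
First, the algorithm: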
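permute.items <- function(x) {
  l <- length(x)
  if (l == 1) return(matrix(x, 1, 1))
  # permutations of the first l - 1 elements, one per row
  sub.permute <- permute.items(x[-length(x)])
  # pre-allocate one row per (sub-permutation, insert position) pair;
  # every cell is overwritten in the loop below
  arrangements <- matrix(x[l], nrow(sub.permute) * l, ncol(sub.permute) + 1)
  # i: which sub-permutation each row is built from
  i <- rep(1:nrow(sub.permute), each=l)
  # j: positions 1..l, once per sub-permutation
  j <- rep(1:l, nrow(sub.permute))
  # where the last element goes: right-to-left for odd i, left-to-right for even i
  insert <- ifelse(i %% 2 == 1, l - j + 1, j)
  for (xx in 1:nrow(arrangements)) {
    # place the last element at its insert position...
    arrangements[xx, insert[xx]] <- x[l]
    counter <- 1
    # ...and fill the remaining positions with the sub-permutation, in order
    for (yy in 1:l) {
      if (yy != insert[xx]) {
        arrangements[xx, yy] <- sub.permute[i[xx], counter]
        counter <- counter + 1
      }
    }
  }
  return(arrangements)
}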
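This function takes a vector such as c(1, 2, 3) or c('a', 'b', 'c') and returns a matrix in which each row is one possible permutation of the original values. Note that the algorithm becomes very slow beyond 10-11 elements. It was also originally designed for a project where the input vector never had duplicate elements, so we need an extra step to remove the duplicate rows.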
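For reference, Heap's algorithm itself, which this answer mentions, can be sketched iteratively in base R. This is a separate illustration (the name heap_permutations is hypothetical), not the permute.items() function above, which instead inserts the last element into every position of each sub-permutation. As with permute.items(), repeated input values produce repeated rows, so unique() is still needed:

```r
# Hypothetical helper: iterative Heap's algorithm, one permutation per row
heap_permutations <- function(x) {
  n   <- length(x)
  out <- matrix(x[1], nrow = factorial(n), ncol = n)
  k   <- 1
  out[k, ] <- x
  cnt <- integer(n)  # Heap's counters c[0..n-1], stored at cnt[1..n]
  i   <- 1           # 0-based position, as in the usual statement of the algorithm
  while (i < n) {
    if (cnt[i + 1] < i) {
      # swap A[0] and A[i] if i is even, else A[c[i]] and A[i] (0-based)
      j <- if (i %% 2 == 0) 0 else cnt[i + 1]
      tmp <- x[j + 1]; x[j + 1] <- x[i + 1]; x[i + 1] <- tmp
      k <- k + 1
      out[k, ] <- x
      cnt[i + 1] <- cnt[i + 1] + 1
      i <- 1
    } else {
      cnt[i + 1] <- 0
      i <- i + 1
    }
  }
  out
}

p <- heap_permutations(c("A", "A", "B"))
nrow(p)          # 6
nrow(unique(p))  # 3
```

Each step swaps only two elements, so consecutive rows differ by a single transposition.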
# read in example data
df <- read.table(text = 'x group
45 A
50 A
50 A
50 B
52 A
60 A
60 B
70 B
88 B', header = TRUE, stringsAsFactors = FALSE)
# split the data into a list.
# each element in the list corresponds to one value of 'x', and contains its values of 'group'
x.split <- split(df$group, df$x)
# for each value of 'x', compute unique permutations and store them as a matrix
# with one ordering per row (drop = FALSE keeps single-row results as matrices)
x.split <- lapply(x.split, function(x) {
  y <- permute.items(x)
  y[!duplicated(y), , drop = FALSE]
})
# compute total number of groups we'll need
groups <- prod(sapply(x.split, nrow))
# pre-allocate final storage
final <- matrix(NA, nrow = nrow(df), ncol = groups)
# loop through the lists' contents and glue together group permutations,
# taking one row (ordering) from each block
for (g in 1:groups) {
  final[, g] <- unlist(lapply(x.split, function(x) x[(g %% nrow(x)) + 1, ]))
}
# final formatting
final <- as.data.frame(final)
final$x <- df$x
Final output:
V1 V2 V3 V4 V5 V6 x
1 A A A A A A 45
2 A B A A B A 50
3 B A A B A A 50
4 A A B A A B 50
5 A A A A A A 52
6 B A B A B A 60
7 A B A B A B 60
8 B B B B B B 70
9 B B B B B B 88
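Cycling every block with the same counter g, as the loop above does, reaches every combination only when the block sizes are pairwise coprime, as the sizes 3 and 2 are here; with two ties of 2 orderings each, for example, only 2 of the 4 combinations would appear. A more general sketch enumerates the block indices with expand.grid() instead. It assumes x.split is a list of matrices with one ordering per row, in sorted order of x; to keep the block self-contained, that list is rebuilt by hand from the example data:

```r
# Hypothetical stand-in for x.split: one matrix per value of x, one ordering per row
x.split <- list(
  "45" = matrix("A", 1, 1),
  "50" = rbind(c("A", "A", "B"), c("A", "B", "A"), c("B", "A", "A")),
  "52" = matrix("A", 1, 1),
  "60" = rbind(c("A", "B"), c("B", "A")),
  "70" = matrix("B", 1, 1),
  "88" = matrix("B", 1, 1)
)

# every combination of row indices, one column per block
combos <- expand.grid(lapply(x.split, function(m) seq_len(nrow(m))))

# column g stacks the chosen ordering of each block
final <- sapply(seq_len(nrow(combos)), function(g) {
  unlist(Map(function(m, r) m[r, ], x.split, combos[g, ]), use.names = FALSE)
})
final <- as.data.frame(final)
final$x <- c(45, 50, 50, 50, 52, 60, 60, 70, 88)
final
```

For this data the columns come out in the same order as the desired result in the question.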