Question

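I'm a beginner in R programming and am struggling with some code I need to write.

Suppose I have a matrix of items (one per column), with a value for each case, for example:

I want to form different combinations of these column items, i.e. combinations of two items, combinations of three items, and so on. At the same time, I want to run a calculation on each of these combinations using the values from the table above. I have already done the two-item combinations in Excel: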

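But the formula changes with the size of the combination, i.e. for a combination of two items the formula is

(exp(item1) + exp(item2)) / (exp(item1) + exp(item2) + 4)

and for a combination of three items it expands like this

(exp(item1) + exp(item2) + exp(item3)) / (exp(item1) + exp(item2) + exp(item3) + 4)

and so on...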

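I found that the combinations can be formed with comboGeneral from the R package RcppAlgos or with turf.combos from the turfR package. However, I can't figure out how to do the calculation in the same piece of R code, or how to make the code dynamic (since the structure of the formula keeps changing as shown above). Please help.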

Answer 1

Here is a general solution. I've used the data-generation script provided by Jon Spring (though I reduced the size for illustration purposes).

# function to do your calculation
foo = function(x) sum(exp(x)) / sum(exp(x), 4)

# generate combinations (only base functions needed)
n_items = 5
combos = lapply(2:n_items, function(x) combn(1:n_items, x, simplify = FALSE))
combos = unlist(combos, recursive = FALSE)

# convert input to a matrix and only keep Item columns.
mat = as.matrix(df_wide[, -1])
# set up matrix to hold results
results = matrix(NA_real_, nrow = nrow(mat), ncol = length(combos))

# iterate over the combinations and use apply to calculate foo row-wise
# for each combination of columns
for (i in seq_along(combos)) {
  results[, i] = apply(mat[, combos[[i]]], MARGIN = 1, FUN = foo)
}

# name results and add them to the original data
colnames(results) = sapply(combos, paste, collapse = "_")
final_result = cbind(df_wide, results)

# see what we've got
print(final_result[, 1:27], digits = 3)
#   ID Item1 Item2 Item3 Item4  Item5   1_2   1_3   1_4   1_5   2_3   2_4   2_5   3_4
# 1  1 0.915 0.519 0.458 0.940 0.9040 0.511 0.505 0.558 0.554 0.449 0.515 0.509 0.509
# 2  2 0.937 0.737 0.719 0.978 0.1387 0.537 0.535 0.566 0.481 0.509 0.543 0.447 0.541
# 3  3 0.286 0.135 0.935 0.117 0.9889 0.382 0.492 0.380 0.501 0.480 0.362 0.489 0.479
# 4  4 0.830 0.657 0.255 0.475 0.9467 0.514 0.473 0.494 0.549 0.446 0.469 0.530 0.420
# 5  5 0.642 0.705 0.462 0.560 0.0824 0.495 0.466 0.477 0.427 0.474 0.486 0.437 0.455
#     3_5   4_5 1_2_3 1_2_4 1_2_5 1_3_4 1_3_5 1_4_5 2_3_4 2_3_5 2_4_5 3_4_5 1_2_3_4
# 1 0.503 0.557 0.590 0.627 0.624 0.624 0.621 0.653 0.593 0.589 0.627 0.623   0.675
# 2 0.445 0.488 0.626 0.646 0.591 0.645 0.590 0.614 0.630 0.569 0.596 0.594   0.700
# 3 0.567 0.488 0.557 0.474 0.563 0.556 0.621 0.563 0.546 0.615 0.553 0.614   0.606
# 4 0.492 0.511 0.580 0.593 0.630 0.565 0.606 0.618 0.547 0.592 0.605 0.578   0.640
# 5 0.401 0.415 0.579 0.587 0.556 0.567 0.533 0.542 0.573 0.540 0.549 0.525   0.645
## ...

Using this sample data:

library(dplyr); library(tidyr)
set.seed(42)
# tibble() replaces the deprecated data_frame()
df <- tibble(ID    = rep(1:5, 5),
             Item  = rep(paste0('Item', 1:5), each = 5),
             value = runif(25))
df

df_wide <- df %>% spread(Item, value)
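
The steps in this answer can also be wrapped in a small helper so that the combination sizes become a parameter. The following is only a sketch in base R: `combo_scores` and its `sizes` and `offset` arguments are names made up here for illustration, and the sample matrix is rebuilt with `matrix(runif(25), nrow = 5)`, which I am assuming yields the same values as the script above for `set.seed(42)`.

```r
# Hypothetical helper (not part of the original answer): score every
# combination of columns of `mat` for each requested combination size.
combo_scores <- function(mat, sizes = 2:ncol(mat), offset = 4) {
  # formula from the question: sum(exp(x)) / (sum(exp(x)) + offset)
  score <- function(x) sum(exp(x)) / (sum(exp(x)) + offset)

  # list of column-index vectors, one per combination
  combos <- unlist(
    lapply(sizes, function(k) combn(ncol(mat), k, simplify = FALSE)),
    recursive = FALSE
  )

  # one result column per combination, one row per case
  res <- vapply(
    combos,
    function(idx) apply(mat[, idx, drop = FALSE], 1, score),
    numeric(nrow(mat))
  )
  colnames(res) <- vapply(combos, paste, character(1), collapse = "_")
  res
}

# assumed to match the sample data above (column-major fill)
set.seed(42)
mat <- matrix(runif(25), nrow = 5,
              dimnames = list(NULL, paste0("Item", 1:5)))

res <- combo_scores(mat, sizes = 2:3)
round(res[, c("1_2", "1_2_3")], 3)
```

Restricting `sizes` keeps the result small; the number of combinations grows quickly with the number of items, so computing every size at once may not be practical for wide data.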

Answer 2

I think a generic solution for n items is beyond me, but it should be doable.

First, some fake data:

#   (BTW, it would be more helpful to provide this as text in your question.)
library(dplyr); library(tidyr)
set.seed(42)
# tibble() replaces the deprecated data_frame()
df <- tibble(ID    = rep(1:100, 5),
             Item  = rep(1:5, each = 100),
             value = runif(500))
df

# I've made it in "long" format, but we can show in wide format like this
df_wide <- df %>% spread(Item, value)
df_wide

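Here is a solution for all combinations of two items:

# one row per ID and per unordered pair of distinct items
output_tbl <- df %>%
  distinct(ID) %>%
  crossing(Item1 = unique(df$Item), Item2 = unique(df$Item)) %>%
  # keep each pair once and drop an item paired with itself
  filter(Item1 < Item2) %>%
  # look up the value of each item in the pair
  left_join(df, by = c("ID" = "ID", "Item1" = "Item")) %>%
  left_join(df, by = c("ID" = "ID", "Item2" = "Item")) %>%
  mutate(output = (exp(value.x) + exp(value.y)) / (exp(value.x) + exp(value.y) + 4))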

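Here is a solution for all combinations of three items:

# one row per ID and per unordered triple of distinct items
output_tbl <- df %>%
  distinct(ID) %>%
  crossing(Item1 = unique(df$Item),
           Item2 = unique(df$Item),
           Item3 = unique(df$Item)) %>%
  # keep each triple once and drop triples with repeated items
  filter(Item1 < Item2, Item2 < Item3) %>%
  # the first two joins yield value.x and value.y, the third adds value
  left_join(df, by = c("ID" = "ID", "Item1" = "Item")) %>%
  left_join(df, by = c("ID" = "ID", "Item2" = "Item")) %>%
  left_join(df, by = c("ID" = "ID", "Item3" = "Item")) %>%
  mutate(output = (exp(value.x) + exp(value.y) + exp(value)) / 
           (exp(value.x) + exp(value.y) + exp(value) + 4))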
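
Both pipelines follow the same pattern, so one possible way to extend the idea to n items is to generate the index combinations with `combn()` and stack the results in long format. This is a sketch in base R rather than a dplyr pipeline, and `combo_long` (with its `k` and `offset` arguments) is a hypothetical name introduced here; it assumes a long data frame with `ID`, `Item` and `value` columns and exactly one value per ID and item, like `df` above.

```r
# Hypothetical helper (an assumption, not the answer's own code):
# all k-item combinations per ID, returned in long format.
combo_long <- function(df, k, offset = 4) {
  ids   <- sort(unique(df$ID))
  items <- sort(unique(df$Item))

  # wide matrix of exp(value): one row per ID, one column per item
  ex <- matrix(NA_real_, nrow = length(ids), ncol = length(items))
  ex[cbind(match(df$ID, ids), match(df$Item, items))] <- exp(df$value)

  # every unordered combination of k distinct item positions
  combos <- combn(length(items), k, simplify = FALSE)

  out <- lapply(combos, function(idx) {
    s <- rowSums(ex[, idx, drop = FALSE])   # sum of exp() over the combination
    data.frame(ID     = ids,
               combo  = paste(items[idx], collapse = "_"),
               output = s / (s + offset))
  })
  do.call(rbind, out)
}

# same shape as the fake data above, built with base R only
set.seed(42)
df <- data.frame(ID    = rep(1:100, 5),
                 Item  = rep(1:5, each = 100),
                 value = runif(500))

two_items   <- combo_long(df, k = 2)
three_items <- combo_long(df, k = 3)
head(two_items)
```

Summing `exp(value)` with `rowSums()` avoids writing out one `exp()` term per item, which is the part of the formula that changes with the combination size.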

Creating all possible combinations of items and performing calculations on them at the same time