In RStudio

Date: 2018-04-03 13:52:38

Tags: r dataframe dataset bigdata combinations

Here is my problem. I have these 3 data frames:

single

     Mat         Price
     A           1029.90
     B            568.52
     C            497.12
     D            573.50
     E            217.92

double

         Mat1  Mat2        Price
         A      C         1529.61
         A      D         1623.49
         A      E         1325.86
         B      C         1050.64
         B      D         1146.65
         B      E         849.02
         C      D         999.42
         C      E         700.03
         D      E         776.41

triple

            Mat1  Mat2   Mat3     Price
             B      C      D     1564.98
             B      C      E     1267.30
             C      D      E     1350.45
             B      D      E     1202.33

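Using these 3 data frames, I have to build another data frame (or list) that gives every possible combination containing each letter from A to E exactly once. For example, some possible combinations are: A | B | C | D | E, all taken from data frame single; or AC taken from data frame double plus B | D | E taken from data frame single; or BCD taken from data frame triple plus AE taken from data frame double; and so on, for all combinations.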
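What is being asked for is every way to partition the letters A to E into groups that appear in one of the three tables. A minimal base-R sketch of that enumeration, assuming the available groups are simply hardcoded from the tables above (`groups` and `cover` are illustrative names, not from the question), recurses on the first remaining letter so that each combination is produced only once:

```r
# Groups available in the question's tables: every single letter,
# the nine pairs from `double` and the four triples from `triple`
groups <- c("A", "B", "C", "D", "E",
            "AC", "AD", "AE", "BC", "BD", "BE", "CD", "CE", "DE",
            "BCD", "BCE", "CDE", "BDE")

# Return every way to split `remaining` into available groups, each letter used once.
# Only groups containing the first remaining letter are tried at each step,
# so the same combination is never produced twice in a different order.
cover <- function(remaining, groups) {
  if (length(remaining) == 0) return(list(character(0)))
  result <- list()
  for (g in groups) {
    g_letters <- strsplit(g, "")[[1]]
    if (remaining[1] %in% g_letters && all(g_letters %in% remaining)) {
      for (rest in cover(setdiff(remaining, g_letters), groups)) {
        result[[length(result) + 1]] <- c(g, rest)
      }
    }
  }
  result
}

combos <- cover(LETTERS[1:5], groups)
combo_labels <- vapply(combos, paste, character(1), collapse = " | ")
length(combos)  # 29
```

Because AB is not in `double` and ACD-style triples are missing from `triple`, the search only returns partitions the tables can actually price.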

In addition, for each combination found, I would like one column of the data frame to hold its computed price:

(sum of the prices of the materials) + 500 * (number of groups used in the combination)

Taking the previous examples: the price of combination A | B | C | D | E, with everything taken from data frame single, is (1029.90 + 568.52 + 497.12 + 573.50 + 217.92) + 500 * (5)

The price of combination AC | B | D | E is (1529.61 + 568.52 + 573.50 + 217.92) + 500 * (4)

The price of combination AE | BCD is (1325.86 + 1564.98) + 500 * (2)
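The pricing rule can be checked directly against these three worked examples. A minimal base-R sketch, assuming the prices from all three tables are stored in one named vector keyed by group label (`price` and `combo_price` are illustrative names, not from the question):

```r
# Prices from the question's three tables, keyed by group label
price <- c(
  A = 1029.90, B = 568.52, C = 497.12, D = 573.50, E = 217.92,
  AC = 1529.61, AD = 1623.49, AE = 1325.86, BC = 1050.64, BD = 1146.65,
  BE = 849.02, CD = 999.42, CE = 700.03, DE = 776.41,
  BCD = 1564.98, BCE = 1267.30, CDE = 1350.45, BDE = 1202.33
)

# (sum of the group prices) + fee * (number of groups in the combination)
combo_price <- function(groups, price, fee = 500) {
  sum(price[groups]) + fee * length(groups)
}

combo_price(c("A", "B", "C", "D", "E"), price)  # 5386.96
combo_price(c("AC", "B", "D", "E"), price)       # 4889.55
combo_price(c("AE", "BCD"), price)               # 3890.84
```

Applying `combo_price` to each element of a list of combinations gives the extra price column.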

Is it possible to create a function in R that automates this process? Thank you very much.

1 answer:

Answer 0 (score: 0)

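Here is my possibly overcomplicated answer. Not knowing how much your real data frames differ from these, I wrote it so that it can handle any other combinations that might appear in double and triple, but there may be a simpler approach I haven't thought of. I think the hard part is using crossing and then a few filter and distinct calls to boil everything down to the 29 combinations you want. After that, producing the right numbers only takes gather and summarise. I left_join back onto the originals so you can see what each result contains. If you know the 29 combinations will always be the same, it is simpler to hardcode them.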

library(tidyverse)
single <- read_table2("Mat         Price
A           1029.90
B            568.52
C            497.12
D            573.50
E            217.92")
double <- read_table2("Mat1  Mat2        Price
A      C         1529.61
A      D         1623.49
A      E         1325.86
B      C         1050.64
B      D         1146.65
B      E         849.02
C      D         999.42
C      E         700.03
D      E         776.41")
triple <- read_table2("Mat1  Mat2   Mat3     Price
B      C      D     1564.98
B      C      E     1267.30
C      D      E     1350.45
B      D      E     1202.33")

# Create vectors of letter groups
ones <- single$Mat
twos <- str_c(double$Mat1, double$Mat2)
threes <- str_c(triple$Mat1, triple$Mat2, triple$Mat3)

# Create vector of permutations of ABCDE
options <- ones %>%
  gtools::permutations(5, 5, .) %>%
  as_tibble() %>%
  unite("option", V1:V5, sep = "") %>%
  `[[`(1)

# Create table with prices to join combinations onto
lookup <- bind_rows(
  single %>% mutate(rowcomb = Mat),
  double %>% unite(rowcomb, Mat1:Mat2, remove = FALSE, sep = ""),
  triple %>% unite(rowcomb, Mat1:Mat3, remove = FALSE, sep = "")
) %>%
  select(rowcomb, price = Price)

# Get all possible combinations of letter groups that could make five letters
combinations <- bind_rows(
  crossing(ones, ones, ones, ones, ones),
  crossing(ones, ones, ones, twos),
  crossing(ones, ones, threes),
  crossing(ones, twos, twos),
  crossing(twos, threes)
) %>%
  mutate_all(~ replace_na(., "")) %>%
  unite("string", ones:twos1, sep = "", remove = FALSE) %>%
  filter(string %in% options) %>% # Remove any that have repeated letters
  # Add column with the number of elements in each combination
  bind_cols(groupings = pmap_int(.[, -1], function(...) sum(c(...) != ""))) %>%
  bind_cols( # Add column that lets us remove different permutations of the same element combinations
    row = .[, 2:9] %>%
      pmap(function(...) c(...)) %>%
      map_chr(~str_c(str_sort(.), collapse = "_"))
  ) %>%
  distinct(row, .keep_all = TRUE) %>%
  rowid_to_column(var = "comb_id") # 29 final combinations

total_price <- combinations %>%
  gather("table", "letter", ones:twos1) %>%
  left_join(lookup, by = c("letter" = "rowcomb")) %>%
  group_by(comb_id) %>%
  summarise(total_price = sum(price, na.rm = TRUE))

output <- left_join(combinations, total_price, by = "comb_id") %>%
  mutate(end_price = total_price + 500 * groupings) %>%
  select(comb_id, row, groupings, end_price)

output
#> # A tibble: 29 x 4
#>    comb_id row          groupings end_price
#>      <int> <chr>            <int>     <dbl>
#>  1       1 ___A_B_C_D_E         5     5387.
#>  2       2 ____A_B_C_DE         4     4872.
#>  3       3 ____A_B_CE_D         4     4872.
#>  4       4 ____A_B_CD_E         4     4816.
#>  5       5 ____A_BE_C_D         4     4950.
#>  6       6 ____A_BD_C_E         4     4892.
#>  7       7 ____A_BC_D_E         4     4872.
#>  8       8 ____AE_B_C_D         4     4965.
#>  9       9 ____AD_B_C_E         4     4907.
#> 10      10 ____AC_B_D_E         4     4890.
#> # ... with 19 more rows

Created on 2018-04-09 by the reprex package (v0.2.0).
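The crossing, filter and distinct steps in the answer can be seen in miniature for a single shape of combination (one single letter plus two pairs). This is a base-R sketch, not the answer's code: it swaps `expand.grid` in for `tidyr::crossing` and compares sorted letters instead of looking rows up in the `options` permutation vector, and the names `grid`, `uses_each_once` and `key` are illustrative:

```r
ones <- c("A", "B", "C", "D", "E")
twos <- c("AC", "AD", "AE", "BC", "BD", "BE", "CD", "CE", "DE")

# Step 1 (crossing): every ordered choice of one single and two pairs
grid <- expand.grid(g1 = ones, g2 = twos, g3 = twos, stringsAsFactors = FALSE)

# Step 2 (filter): keep rows whose pooled letters are exactly A to E, once each
uses_each_once <- apply(grid, 1, function(r) {
  identical(sort(strsplit(paste(r, collapse = ""), "")[[1]]), ones)
})
grid <- grid[uses_each_once, ]

# Step 3 (distinct): E | AC | BD and E | BD | AC are the same combination,
# so key each row by its sorted groups and drop repeated keys
key <- apply(grid, 1, function(r) paste(sort(r), collapse = "_"))
grid <- grid[!duplicated(key), ]
nrow(grid)  # 12
```

Running the same three steps over the other shapes (five singles; three singles and a pair; two singles and a triple; a pair and a triple) and stacking the results gives the 29 combinations.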