R: how to paste a small dataframe into a larger one with specific pattern

时间:2016-08-31 16:58:59

标签: r data-manipulation

I have a data like the following:

DataA:

ID      PCODE  age COLOR POLISH ...
1     S        30  PINK   GOLD
1     S        30  PINK   GOLD
1     S        30  PINK   GOLD
2     S        20  PINK RODIUM
2     S        20  PINK RODIUM
2     S        20  PINK RODIUM
     ...

and I have another data set containing attributes of the different products available (DataB) and I want to pass those values into the non-first row for each customer.

   PCODE COLOR POLISH
     S WHITE  GOLD
     S PINK   GOLD
     S PINK   RODIUM

so basically for each customer and in each purchase occasion, I want to make the non-first row containing the attributes of products in dataB that has not been chosen by the customer. To be specific, the ideal result I want to get is

ID      PCODE  age COLOR POLISH ...
1     S        30  PINK    GOLD
1     S        30  WHITE   GOLD
1     S        30  PINK   RODIUM
2     S        20  PINK   RODIUM
2     S        20  PINK    GOLD
2     S        20  WHITE   GOLD

I have thought of using merge but it seems that this function will not give me the result I want, so I am not sure what I should use to achieve it.

1 个答案:

答案 0 :(得分:0)

使用几个步骤,我认为我们可以使用plyrdplyr包完成此操作。 请注意,您应该在plyr之前加载dplyr

library(plyr)
library(dplyr)

# select first occurrence by ID and PCODE
# assumes your data is pre-sorted

table_A %>%
  group_by(ID, PCODE) %>%
  mutate(r_n = row_number()) %>%
  filter(r_n == 1) %>%
  select(-r_n) -> table_A1

# products not chosen by customer
# select only rows where all entries are not equal for each ID and PCODE

table_B1 <- 
  ddply(table_A1, c('ID', 'PCODE'),
      function(x) 
        table_B[sapply(1:nrow(table_B), 
                       function(y) !all(x[,names(table_B)] == table_B[y,])),])

# bind together after merging

rbind.data.frame(table_A1,
                 merge(table_A1[,c('ID','PCODE','age')], 
                       table_B1, 
                       by = c('ID', 'PCODE'))) %>%
  arrange(ID, PCODE) -> table_AB

# table_AB

#      ID PCODE   age COLOR POLISH
#   <int> <chr> <int> <chr>  <chr>
# 1     1     S    30  PINK   GOLD
# 2     1     S    30 WHITE   GOLD
# 3     1     S    30  PINK RODIUM
# 4     2     S    20  PINK RODIUM
# 5     2     S    20 WHITE   GOLD
# 6     2     S    20  PINK   GOLD

数据

table_A <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 2L), 
               PCODE = c("S", "S", "S", "S", "S", "S"), 
               age = c(30L, 30L, 30L, 20L, 20L, 20L), 
               COLOR = c("PINK", "PINK", "PINK", "PINK", "PINK", "PINK"), 
               POLISH = c("GOLD", "GOLD", "GOLD", "RODIUM", "RODIUM", "RODIUM")), 
          .Names = c("ID", "PCODE", "age", "COLOR", "POLISH"), class = "data.frame", 
          row.names = c(NA, -6L))

table_B <- structure(list(PCODE = c("S", "S", "S"), 
                          COLOR = c("WHITE", "PINK", "PINK"), 
                          POLISH = c("GOLD", "GOLD", "RODIUM")), 
                     .Names = c("PCODE", "COLOR", "POLISH"), 
                     class = "data.frame", row.names = c(NA, -3L))