I have a data like the following:
DataA:
ID PCODE age COLOR POLISH ...
1 S 30 PINK GOLD
1 S 30 PINK GOLD
1 S 30 PINK GOLD
2 S 20 PINK RODIUM
2 S 20 PINK RODIUM
2 S 20 PINK RODIUM
...
and I have another data set containing attributes of the different products available (DataB) and I want to pass those values into the non-first row for each customer.
PCODE COLOR POLISH
S WHITE GOLD
S PINK GOLD
S PINK RODIUM
so basically for each customer and in each purchase occasion, I want to make the non-first row containing the attributes of products in dataB that has not been chosen by the customer. To be specific, the ideal result I want to get is
ID PCODE age COLOR POLISH ...
1 S 30 PINK GOLD
1 S 30 WHITE GOLD
1 S 30 PINK RODIUM
2 S 20 PINK RODIUM
2 S 20 PINK GOLD
2 S 20 WHITE GOLD
I have thought of using merge but it seems that this function will not give me the result I want, so I am not sure what I should use to achieve it.
答案 0 :(得分:0)
使用几个步骤,我认为我们可以使用plyr
和dplyr
包完成此操作。 请注意,您应该在plyr
之前加载dplyr
!
library(plyr)
library(dplyr)
# select first occurrence by ID and PCODE
# assumes your data is pre-sorted
table_A %>%
group_by(ID, PCODE) %>%
mutate(r_n = row_number()) %>%
filter(r_n == 1) %>%
select(-r_n) -> table_A1
# products not chosen by customer
# select only rows where all entries are not equal for each ID and PCODE
table_B1 <-
ddply(table_A1, c('ID', 'PCODE'),
function(x)
table_B[sapply(1:nrow(table_B),
function(y) !all(x[,names(table_B)] == table_B[y,])),])
# bind together after merging
rbind.data.frame(table_A1,
merge(table_A1[,c('ID','PCODE','age')],
table_B1,
by = c('ID', 'PCODE'))) %>%
arrange(ID, PCODE) -> table_AB
# table_AB
# ID PCODE age COLOR POLISH
# <int> <chr> <int> <chr> <chr>
# 1 1 S 30 PINK GOLD
# 2 1 S 30 WHITE GOLD
# 3 1 S 30 PINK RODIUM
# 4 2 S 20 PINK RODIUM
# 5 2 S 20 WHITE GOLD
# 6 2 S 20 PINK GOLD
table_A <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 2L),
PCODE = c("S", "S", "S", "S", "S", "S"),
age = c(30L, 30L, 30L, 20L, 20L, 20L),
COLOR = c("PINK", "PINK", "PINK", "PINK", "PINK", "PINK"),
POLISH = c("GOLD", "GOLD", "GOLD", "RODIUM", "RODIUM", "RODIUM")),
.Names = c("ID", "PCODE", "age", "COLOR", "POLISH"), class = "data.frame",
row.names = c(NA, -6L))
table_B <- structure(list(PCODE = c("S", "S", "S"),
COLOR = c("WHITE", "PINK", "PINK"),
POLISH = c("GOLD", "GOLD", "RODIUM")),
.Names = c("PCODE", "COLOR", "POLISH"),
class = "data.frame", row.names = c(NA, -3L))