Question

My data frame 'df' has the following structure:

Assume there are 4 distinct stores and 4 distinct titles.

Title Store
T1    S1
T1    S2
T1    S3
T1    S4
T2    S1
T2    S2
T2    S4
T3    S1
T3    S4
T4    S1
T4    S2

Problem:

For every combination of titles, I want to find the stores that all titles in that combination have in common.

Expected output:

Title_combination     Common_Store      
T1,T2,T3,T4           S1     
T1,T2,T3              S1,S4
T1,T2,T4              S1,S2
...                   ...  (and so on)

Answer 1

Using base R functions only. Explanations are inline as comments.

Data:

tbl <- read.table(text="Title Store
T1    S1
T1    S2
T1    S3
T1    S4
T2    S1
T2    S2
T2    S4
T3    S1
T3    S4
T4    S1
T4    S2", header=TRUE)
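The building block of the answer is a set intersection: the common stores of one title combination are the stores that appear in the store set of every title in it. A minimal sketch of that single step, rebuilding the same data inline so it runs on its own (the names `stores_by_title` and `common` are illustrative and not part of the answer):

```r
# Example data, same values as tbl above
tbl <- data.frame(
  Title = rep(c("T1", "T2", "T3", "T4"), times = c(4, 3, 2, 2)),
  Store = c("S1", "S2", "S3", "S4", "S1", "S2", "S4", "S1", "S4", "S1", "S2"),
  stringsAsFactors = FALSE
)

# Named list with one vector of stores per title (T1, T2, T3, T4)
stores_by_title <- split(tbl$Store, tbl$Title)

# Common stores of one combination = intersection of its titles' store sets
common <- Reduce(intersect, stores_by_title[c("T1", "T2", "T4")])
print(common)
# [1] "S1" "S2"
```

The full answer below simply repeats this intersection for every combination that `combn` generates.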

Workings:

#get unique titles
titles <- unique(tbl$Title)

#combine rows into a single data.frame
do.call(rbind, unlist(
    #for each combination size n (from 1 up to the number of titles)
    lapply(seq_along(titles), function(n)
        #combn generates every combination of n titles and applies the function to each one
        combn(titles, n, function(subtitles) {
            #successively intersect the stores of each title in subtitles,
            #starting from the stores of the first title
            cstores <- Reduce(function(s, t2) intersect(s, tbl$Store[tbl$Title==t2]), 
                subtitles[-1], 
                tbl$Store[tbl$Title==subtitles[1]])
            data.frame(
                Title_combi=paste(subtitles, collapse=","),
                Common_Store=paste(cstores, collapse=",")
            )
        }, simplify=FALSE) #don't simplify the results from combn; keep them as a list
    ), 
    recursive=FALSE)) #unlist only one level, giving a flat list of data.frames
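A more compact variant of the same idea, still in base R: compute each title's store set once with `split`, then for each combination intersect those sets with `Reduce(intersect, ...)`. This is a sketch rather than the answer's own code; it assumes the `Title`/`Store` column names shown above and rebuilds the data inline so it runs standalone.

```r
# Example data, same values as tbl above
tbl <- data.frame(
  Title = rep(c("T1", "T2", "T3", "T4"), times = c(4, 3, 2, 2)),
  Store = c("S1", "S2", "S3", "S4", "S1", "S2", "S4", "S1", "S4", "S1", "S2"),
  stringsAsFactors = FALSE
)

# Store set of each title, computed once (a named list: T1, T2, ...)
stores_by_title <- split(tbl$Store, tbl$Title)
titles <- names(stores_by_title)

res <- do.call(rbind, lapply(seq_along(titles), function(n) {
  # All combinations of n titles, as a list of character vectors
  combos <- combn(titles, n, simplify = FALSE)
  data.frame(
    Title_combi  = vapply(combos, paste, character(1), collapse = ","),
    # Intersect the store sets of all titles in each combination
    Common_Store = vapply(combos, function(ts)
      paste(Reduce(intersect, stores_by_title[ts]), collapse = ","),
      character(1)),
    stringsAsFactors = FALSE
  )
}))

print(res)
```

Because `intersect` keeps the order of its first argument, this should give the same `Title_combi`/`Common_Store` pairs as the answer's code. To list the largest combinations first, as in the question's expected output, the rows could be sorted by the number of titles, e.g. `res[order(-lengths(strsplit(res$Title_combi, ","))), ]`.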

Result:

#    Title_combi Common_Store
# 1           T1  S1,S2,S3,S4
# 2           T2     S1,S2,S4
# 3           T3        S1,S4
# 4           T4        S1,S2
# 5        T1,T2     S1,S2,S4
# 6        T1,T3        S1,S4
# 7        T1,T4        S1,S2
# 8        T2,T3        S1,S4
# 9        T2,T4        S1,S2
# 10       T3,T4           S1
# 11    T1,T2,T3        S1,S4
# 12    T1,T2,T4        S1,S2
# 13    T1,T3,T4           S1
# 14    T2,T3,T4           S1
# 15 T1,T2,T3,T4           S1
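The result has one row per non-empty subset of titles, i.e. the sum of choose(k, n) for n = 1..k, which equals 2^k - 1. With k = 4 titles that is the 15 rows above; each additional title roughly doubles the number of rows, which matters for larger data. A quick check in R:

```r
# Number of non-empty title combinations enumerated for k titles
k <- 4
n_combos <- sum(choose(k, seq_len(k)))
print(n_combos)
# [1] 15
```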

Find overlapping elements in a data frame
