Question

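I am doing a cross-sell analysis of several products using R. I have already transformed the transaction data, and it looks like this -

  df.articles <- cbind.data.frame(Art01, Art02, Art03)

  Art01   Art02    Art03
  bread   yoghurt  egg
  butter  bread    yoghurt
  cheese  butter   bread
  egg     cheese   NA
  potato  NA       NA

  The actual data is 'data.frame': 69099 obs. of 33 variables.

I want a list of all the distinct articles, with their counts, that were sold together with a given article (say bread or, in this case, yoghurt). The actual data comprises 56 articles, and for each one I need to check every article it was cross-sold with. So the result I want must look like -

  **Products sold with Bread**    **Products sold with Yoghurt**

  yoghurt 2                       bread  2
  egg     1                       egg    1
  cheese  1                       butter 1
  butter  1

  .... and the list goes on like this for, say, 52 different articles.

I have tried a lot of things, but they are too manual for a data set this big.
It would be great to have this solved with the help of library(data.table), but if not, that is fine too.
Thank you very much in advance.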

Answer 1

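There's ...

library(data.table)
setDT(DF)  # DF is the question's data (df.articles), converted by reference
# Long format: one row per (order r, article), NAs dropped
dat = setorder(melt(DF[, r := .I], id="r", na.rm=TRUE)[, !"variable"])
# Cross join the articles within each order, drop self-pairs, count each pair
res = dat[, CJ(art = value, other_art = value), by=r][art != other_art, .N, keyby=.(art, other_art)]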

        art other_art N
 1:   bread    butter 2
 2:   bread    cheese 1
 3:   bread       egg 1
 4:   bread   yoghurt 2
 5:  butter     bread 2
 6:  butter    cheese 1
 7:  butter   yoghurt 1
 8:  cheese     bread 1
 9:  cheese    butter 1
10:  cheese       egg 1
11:     egg     bread 1
12:     egg    cheese 1
13:     egg   yoghurt 1
14: yoghurt     bread 2
15: yoghurt    butter 1
16: yoghurt       egg 1

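Comment. The OP mentions that there are 56 distinct items, which means a single order (r above) could have up to 3136 = 56^2 rows after CJ. With several thousand orders, this quickly becomes a problem. That is typical of combinatorial computations, so hopefully this task is only for browsing the data rather than analyzing it.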
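One way to sidestep the per-order cross join, sketched here under the assumption that only pairwise co-occurrence counts are needed, is to build an order-by-article 0/1 matrix and take its cross product. This is not the method used above. It uses base R only, and the toy `DF` below just re-creates the five example rows from the question.

```r
# Toy data from the question (assumption: the real table has the same layout)
DF <- data.frame(
  Art01 = c("bread", "butter", "cheese", "egg", "potato"),
  Art02 = c("yoghurt", "bread", "butter", "cheese", NA),
  Art03 = c("egg", "yoghurt", "bread", NA, NA),
  stringsAsFactors = FALSE
)

# Character matrix with one row per order
m <- as.matrix(DF)

# Order-by-article incidence matrix; table() drops the NA cells by default
inc <- unclass(table(row(m), m)) > 0
storage.mode(inc) <- "integer"

# t(inc) %*% inc: for each pair of articles, the number of orders containing both
co <- crossprod(inc)
diag(co) <- 0  # an article is not cross-sold with itself

co["bread", ]
```

For the sample data, `co["bread", ]` carries the same counts as the `art == "bread"` rows of `res`, plus a 0 for potato. The result has one row and one column per article, however many orders there are. Note that an article listed twice in one order is counted once here.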

Another idea for browsing is to use split and lapply to customize the display:

library(magrittr)
split(res, by="art", keep.by = FALSE) %>% lapply(. %$% setNames(N, other_art))

$bread
 butter  cheese     egg yoghurt 
      2       1       1       2 

$butter
  bread  cheese yoghurt 
      2       1       1 

$cheese
 bread butter    egg 
     1      1      1 

$egg
  bread  cheese yoghurt 
      1       1       1 

$yoghurt
 bread butter    egg 
     2      1      1

I'd usually just explore with res[art == "bread"], res[art == "bread" & other_art == "butter"], etc., as @ycw suggested in the comments.

magrittr isn't needed here; it just allows for a different syntax.
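As a rough illustration of that last point, the same per-article display can be produced with an ordinary anonymous function instead of the magrittr pipeline. The toy `DF` and the name `by_art` are assumptions made for this sketch. `res` is rebuilt with the same steps as above so the block runs on its own.

```r
library(data.table)

# Toy data from the question
DF <- data.table(
  Art01 = c("bread", "butter", "cheese", "egg", "potato"),
  Art02 = c("yoghurt", "bread", "butter", "cheese", NA),
  Art03 = c("egg", "yoghurt", "bread", NA, NA)
)

# Same steps as in the answer: long format, then pair counts
dat <- setorder(melt(DF[, r := .I], id.vars = "r", na.rm = TRUE)[, !"variable"])
res <- dat[, CJ(art = value, other_art = value), by = r][
  art != other_art, .N, keyby = .(art, other_art)]

# split() + lapply() without magrittr: one named count vector per article
by_art <- lapply(
  split(res, by = "art", keep.by = FALSE),
  function(d) setNames(d$N, d$other_art)
)

by_art$bread

# Direct subsetting, as suggested for exploration
res[art == "bread" & other_art == "butter"]
```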

Answer 2

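Here is an option. We can use some functions from the tidyverse to create a list containing the results. a_list4 is the final output. Each element corresponds to one article and holds the counts of the articles sold with it.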

# Prepare the data frame "dt"
dt <- read.table(text = "Art01         Art02      Art03
  bread         yoghurt    egg
                 butter        bread      yoghurt
                 cheese        butter     bread
                 egg           cheese     NA
                 potato        NA         NA",
                 header = TRUE, stringsAsFactors = FALSE)

# Load package
library(tidyverse)

# A vector with articles
articles <- unique(unlist(dt))

# Remove NA
articles <- articles[!is.na(articles)]

# A function to filter the data frame by articles
# na.rm = TRUE keeps orders that also contain NA; without it the row sum is NA
# and filter() drops the row
filter_fun <- function(article, dt){
  dt2 <- dt %>% filter(rowSums(. == article, na.rm = TRUE) > 0)
  return(dt2)
}

# Apply the filter_fun
a_list <- map(articles, filter_fun, dt = dt)
names(a_list) <- articles

# Get articles in each element of the list
a_list2 <- map(a_list, function(dt) unlist(dt))

# Remove the articles based on the name of that article
a_list3 <- map2(a_list2, names(a_list2), function(vec, article){
  vec[!(vec %in% article)]
})

# Count the number
a_list4 <- map(a_list3, table)

# See the results
a_list4

$bread

 butter  cheese     egg yoghurt 
      2       1       1       2 

$butter

  bread  cheese yoghurt 
      2       1       1 

$cheese

 bread butter    egg 
     1      1      1 

$egg

  bread  cheese yoghurt 
      1       1       1 

$potato
< table of extent 0 >

$yoghurt

 bread butter    egg 
     2      1      1
