Expand grid within groups

Time: 2017-07-02 11:50:21

Tags: r

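I am facing the following problem. I have a list of M&A deals, and each deal includes data on (1) the acquirer, (2) the vendor, and (3) the target. The relationship between them can be n:n:n, and the data looks something like this: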

dealid acquirer target vendor
1      FirmA    FirmB  FirmC
1      FirmD           FirmE
2      .....................

So the problem is that the rows within a deal have no meaning on their own: for example, FirmD is also a co-acquirer of FirmB.

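I now need to create all possible acquirer-target-vendor combinations within each dealid. I have managed to expand a grid using the expand.grid function or simply via merge. However, I don't know how to expand a grid of all possible combinations within groups.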
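A minimal sketch of the difficulty, assuming the five-row sample data used in the answers below (the firms in deal 2 are an assumption, since the table above elides them). The names `pooled` and `mixed` are illustrative only. Calling expand.grid on the whole data frame pools firms from every deal, so it produces combinations that never belonged to the same deal:

```r
# Sample data borrowed from the answers (deal 2 is assumed, not from the question)
df <- read.table(text = "dealid acquirer target vendor
1      FirmA    FirmB  FirmC
1      FirmD    NA     FirmE
2      FirmA    NA     FirmC
2      FirmD    NA     FirmE
2      FirmG    FirmF  FirmE", header = TRUE, stringsAsFactors = FALSE)

# Ungrouped expansion: every distinct firm is crossed with every other,
# regardless of which deal it came from
pooled <- expand.grid(acquirer = unique(df$acquirer),
                      target   = unique(df$target[!is.na(df$target)]),
                      vendor   = unique(df$vendor),
                      stringsAsFactors = FALSE)
nrow(pooled)  # 3 acquirers * 2 targets * 2 vendors = 12 rows

# FirmG only appears in deal 2 and FirmB only in deal 1,
# yet the pooled grid pairs them
mixed <- subset(pooled, acquirer == "FirmG" & target == "FirmB")
mixed
```

The grid has to be built separately for each dealid, which is what the answers do.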

4 Answers:

Answer 0: (score: 5)

You can do this with dplyr together with expand from tidyr.

df <- read.table(text="dealid acquirer target vendor
1      FirmA    FirmB  FirmC
1      FirmD    NA     FirmE
2      FirmA    NA     FirmC
2      FirmD    NA     FirmE
2      FirmG    FirmF  FirmE",header=TRUE,stringsAsFactors=FALSE)

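library(dplyr); library(tidyr)
df %>%
  group_by(dealid) %>%
  expand(acquirer, target, vendor) %>%
  # Guard: drop rows built from a missing target; a no-op if expand() already dropped NA
  filter(!is.na(target))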

   dealid acquirer target vendor
    <int>    <chr>  <chr>  <chr>
 1      1    FirmA  FirmB  FirmC
 2      1    FirmA  FirmB  FirmE
 3      1    FirmD  FirmB  FirmC
 4      1    FirmD  FirmB  FirmE
 5      2    FirmA  FirmF  FirmC
 6      2    FirmA  FirmF  FirmE
 7      2    FirmD  FirmF  FirmC
 8      2    FirmD  FirmF  FirmE
 9      2    FirmG  FirmF  FirmC
10      2    FirmG  FirmF  FirmE
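As a sanity check on the output above, the number of rows per deal should equal the product of the distinct non-missing values in each column. A minimal base R sketch, assuming the same `df`; the helper `n_distinct_non_na` and the object `expected` are illustrative names, not part of the answer:

```r
df <- read.table(text = "dealid acquirer target vendor
1      FirmA    FirmB  FirmC
1      FirmD    NA     FirmE
2      FirmA    NA     FirmC
2      FirmD    NA     FirmE
2      FirmG    FirmF  FirmE", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: number of distinct non-missing values in a vector
n_distinct_non_na <- function(x) length(unique(x[!is.na(x)]))

# Expected number of combinations per deal:
# distinct acquirers * distinct targets * distinct vendors
expected <- sapply(split(df, df$dealid), function(g) {
  n_distinct_non_na(g$acquirer) *
    n_distinct_non_na(g$target) *
    n_distinct_non_na(g$vendor)
})
expected  # deal 1: 2 * 1 * 2 = 4, deal 2: 3 * 1 * 2 = 6
```

This matches the 4 rows for deal 1 and 6 rows for deal 2 shown above.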

Answer 1: (score: 2)

We can use data.table:

library(data.table)
setDT(df1)[, CJ(acquirer = acquirer, target = target, vendor = vendor,
         unique = TRUE), dealid][!is.na(target)]
#    dealid acquirer target vendor
#1:      1    FirmA  FirmB  FirmC
#2:      1    FirmA  FirmB  FirmE
#3:      1    FirmD  FirmB  FirmC
#4:      1    FirmD  FirmB  FirmE
#5:      2    FirmA  FirmF  FirmC
#6:      2    FirmA  FirmF  FirmE
#7:      2    FirmD  FirmF  FirmC
#8:      2    FirmD  FirmF  FirmE
#9:      2    FirmG  FirmF  FirmC
#10:     2    FirmG  FirmF  FirmE

Data

 df1 <- structure(list(dealid = c(1L, 1L, 2L, 2L, 2L), acquirer = c("FirmA", 
"FirmD", "FirmA", "FirmD", "FirmG"), target = c("FirmB", NA, 
NA, NA, "FirmF"), vendor = c("FirmC", "FirmE", "FirmC", "FirmE", 
"FirmE")), .Names = c("dealid", "acquirer", "target", "vendor"
), class = "data.frame", row.names = c(NA, -5L))

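Answer 2: (score: 1)

Consider base R's by, a function that slices a data frame by one or more factors (here dealid) and applies a function to each slice. Running expand.grid on each slice returns a list of data frames, which can then be bound together. The example below uses the same sample data as @PLapointe and @akrun: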

dfList <- by(df, df$dealid, function(i){
  tmp <- cbind(dealid=max(i$dealid),
               expand.grid(acquirer=i$acquirer, target=i$target, vendor=i$vendor))
  tmp[!is.na(tmp$target),]
})

newdf <- unique(do.call(rbind, dfList))
row.names(newdf) <- NULL

newdf
#     dealid acquirer target vendor
# 1        1    FirmA  FirmB  FirmC
# 2        1    FirmD  FirmB  FirmC
# 3        1    FirmA  FirmB  FirmE
# 4        1    FirmD  FirmB  FirmE
# 5        2    FirmA  FirmF  FirmC
# 6        2    FirmD  FirmF  FirmC
# 7        2    FirmG  FirmF  FirmC
# 8        2    FirmA  FirmF  FirmE
# 9        2    FirmD  FirmF  FirmE
# 10       2    FirmG  FirmF  FirmE

Answer 3: (score: 0)

As @Sotos mentioned in the comments, you can use split:

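# Split the data by deal, expand each piece, then stack the pieces back together
l1 <- split(df1, df1$dealid)
# Name the arguments so the result has acquirer/target/vendor columns instead of Var1..Var3
l2 <- lapply(l1, function(x) unique(with(x, expand.grid(acquirer = acquirer, target = na.omit(target), vendor = vendor))))
df2 <- cbind.data.frame(dealid = rep(names(l2), sapply(l2, nrow)), do.call(rbind, l2))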
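Two details of this approach may be worth tidying afterwards: `names(l2)` are character strings, so dealid does not come back as an integer, and `do.call(rbind, l2)` leaves compound row names. A self-contained sketch of the same idea with that cleanup added; naming the expand.grid arguments and passing `stringsAsFactors = FALSE` are my additions, not part of the answer:

```r
df1 <- data.frame(
  dealid   = c(1L, 1L, 2L, 2L, 2L),
  acquirer = c("FirmA", "FirmD", "FirmA", "FirmD", "FirmG"),
  target   = c("FirmB", NA, NA, NA, "FirmF"),
  vendor   = c("FirmC", "FirmE", "FirmC", "FirmE", "FirmE"),
  stringsAsFactors = FALSE
)

# One data frame per deal
l1 <- split(df1, df1$dealid)

# Expand each deal on its own, skipping missing targets and duplicate rows
l2 <- lapply(l1, function(x) unique(expand.grid(
  acquirer = x$acquirer,
  target   = x$target[!is.na(x$target)],
  vendor   = x$vendor,
  stringsAsFactors = FALSE
)))

# Stack the pieces and re-attach the deal id taken from the list names
df2 <- cbind.data.frame(dealid = rep(names(l2), sapply(l2, nrow)),
                        do.call(rbind, l2))

# Cleanup: restore the integer id and drop the compound row names
df2$dealid <- as.integer(as.character(df2$dealid))
rownames(df2) <- NULL
table(df2$dealid)  # 4 rows for deal 1, 6 rows for deal 2
```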