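I am facing the following problem. I have a list of M&A deals, and each deal includes data on (1) the acquirer, (2) the vendor, and (3) the target. The relationship between them can be n:n:n, and the data look something like this: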
dealid acquirer target vendor
1 FirmA FirmB FirmC
1 FirmD FirmE
2 .....................
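The problem is that a row within a deal has no meaning on its own. For example, FirmD is also a co-acquirer of FirmB, even though the two appear on different rows.
I now need to create all possible acquirer-target-vendor combinations within each dealid. I have managed to expand a grid with the expand.grid function, or simply via merge, but I don't know how to expand a grid of all possible combinations within groups.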
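To make the problem concrete, here is a minimal base R sketch of what goes wrong when expand.grid is applied to the whole table rather than per deal. The data frame name `deals` is an assumption for illustration; its values are borrowed from the sample data used in the answers, because the question itself elides deal 2.

```r
# Hypothetical sample data (values taken from the answers' examples)
deals <- data.frame(
  dealid   = c(1L, 1L, 2L, 2L, 2L),
  acquirer = c("FirmA", "FirmD", "FirmA", "FirmD", "FirmG"),
  target   = c("FirmB", NA, NA, NA, "FirmF"),
  vendor   = c("FirmC", "FirmE", "FirmC", "FirmE", "FirmE"),
  stringsAsFactors = FALSE
)

# Ungrouped expansion: every distinct acquirer is crossed with every
# distinct target and vendor, regardless of the deal they belong to.
ungrouped <- expand.grid(
  acquirer = unique(deals$acquirer),
  target   = unique(deals$target[!is.na(deals$target)]),
  vendor   = unique(deals$vendor),
  stringsAsFactors = FALSE
)

nrow(ungrouped)  # 12 rows: 3 acquirers x 2 targets x 2 vendors

# Rows that pair FirmG (deal 2 only) with FirmB (deal 1 only) mix two deals
cross_deal <- ungrouped[ungrouped$acquirer == "FirmG" &
                        ungrouped$target == "FirmB", ]
nrow(cross_deal)  # 2
```

The cross-deal rows are exactly what a per-dealid expansion has to avoid.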
Answer 0: (score: 5)
You can do this with expand from tidyr, combined with group_by from dplyr.
df <- read.table(text="dealid acquirer target vendor
1 FirmA FirmB FirmC
1 FirmD NA FirmE
2 FirmA NA FirmC
2 FirmD NA FirmE
2 FirmG FirmF FirmE",header=TRUE,stringsAsFactors=FALSE)
library(dplyr);library(tidyr)
df%>%
group_by(dealid)%>%
expand(acquirer, target, vendor)
dealid acquirer target vendor
<int> <chr> <chr> <chr>
1 1 FirmA FirmB FirmC
2 1 FirmA FirmB FirmE
3 1 FirmD FirmB FirmC
4 1 FirmD FirmB FirmE
5 2 FirmA FirmF FirmC
6 2 FirmA FirmF FirmE
7 2 FirmD FirmF FirmC
8 2 FirmD FirmF FirmE
9 2 FirmG FirmF FirmC
10 2 FirmG FirmF FirmE
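As a quick sanity check on the ten rows above, the expected number of combinations per deal is the product of the distinct non-missing values in each column. The sketch below uses base R only; the helper `n_distinct_non_na` and the data frame name `deals` are hypothetical names introduced here, and the check assumes that an NA means "no value" and should not form combinations.

```r
# Same sample data as in the answer above
deals <- data.frame(
  dealid   = c(1L, 1L, 2L, 2L, 2L),
  acquirer = c("FirmA", "FirmD", "FirmA", "FirmD", "FirmG"),
  target   = c("FirmB", NA, NA, NA, "FirmF"),
  vendor   = c("FirmC", "FirmE", "FirmC", "FirmE", "FirmE"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count distinct values, ignoring NA
n_distinct_non_na <- function(x) length(unique(x[!is.na(x)]))

# Expected combinations per deal = acquirers x targets x vendors
expected <- sapply(split(deals, deals$dealid), function(d) {
  n_distinct_non_na(d$acquirer) *
    n_distinct_non_na(d$target) *
    n_distinct_non_na(d$vendor)
})

expected       # deal 1: 4, deal 2: 6
sum(expected)  # 10, matching the number of rows in the output above
```

If your tidyr version keeps NA as a value in the expansion, the row count will be larger than this; in that case adding filter(!is.na(target)) after expand should bring it back in line.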
Answer 1: (score: 2)
We can use data.table:
library(data.table)
# Sample data; defined first so that the call below runs as written
df1 <- structure(list(dealid = c(1L, 1L, 2L, 2L, 2L), acquirer = c("FirmA",
"FirmD", "FirmA", "FirmD", "FirmG"), target = c("FirmB", NA,
NA, NA, "FirmF"), vendor = c("FirmC", "FirmE", "FirmC", "FirmE",
"FirmE")), .Names = c("dealid", "acquirer", "target", "vendor"
), class = "data.frame", row.names = c(NA, -5L))
# Cross join (CJ) the unique values of each column within each dealid,
# then drop the combinations whose target is NA
setDT(df1)[, CJ(acquirer = acquirer, target = target, vendor = vendor,
unique = TRUE), dealid][!is.na(target)]
# dealid acquirer target vendor
#1: 1 FirmA FirmB FirmC
#2: 1 FirmA FirmB FirmE
#3: 1 FirmD FirmB FirmC
#4: 1 FirmD FirmB FirmE
#5: 2 FirmA FirmF FirmC
#6: 2 FirmA FirmF FirmE
#7: 2 FirmD FirmF FirmC
#8: 2 FirmD FirmF FirmE
#9: 2 FirmG FirmF FirmC
#10: 2 FirmG FirmF FirmE
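Answer 2: (score: 1)
Consider base R's by, a function that slices a data frame by one or more factors (here dealid), so that an operation such as expand.grid can be run on each slice, returning a list of data frames. The example below uses the same sample data as @PLapointe and @akrun: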
dfList <- by(df, df$dealid, function(i){
tmp <- cbind(dealid=max(i$dealid),
expand.grid(acquirer=i$acquirer, target=i$target, vendor=i$vendor))
tmp[!is.na(tmp$target),]
})
newdf <- unique(do.call(rbind, dfList))
row.names(newdf) <- NULL
newdf
# dealid acquirer target vendor
# 1 1 FirmA FirmB FirmC
# 2 1 FirmD FirmB FirmC
# 3 1 FirmA FirmB FirmE
# 4 1 FirmD FirmB FirmE
# 5 2 FirmA FirmF FirmC
# 6 2 FirmD FirmF FirmC
# 7 2 FirmG FirmF FirmC
# 8 2 FirmA FirmF FirmE
# 9 2 FirmD FirmF FirmE
# 10 2 FirmG FirmF FirmE
Answer 3: (score: 0)
A base R alternative using split, as mentioned in the comments by @Sotos:
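# Split the data into one data frame per deal
l1 <- split(df1, df1$dealid)
# Expand within each deal, dropping NA targets and duplicate rows (columns come out as Var1, Var2, Var3)
l2 <- lapply(l1, function(x) unique(with(x, expand.grid(acquirer, na.omit(target), vendor))))
# Stack the per-deal results and prepend the matching dealid
df2 <- cbind.data.frame(dealid = rep(names(l2), sapply(l2, nrow)), do.call(rbind, l2))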
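Because the arguments to expand.grid are unnamed in the code above, the result columns are called Var1, Var2 and Var3, and dealid comes back as character (it is built from names(l2)). The following is a minimal variation on the same split approach, not the answerer's own code, that names the columns and keeps dealid as an integer. The names `per_deal` and `combos` are introduced here for illustration.

```r
df1 <- data.frame(
  dealid   = c(1L, 1L, 2L, 2L, 2L),
  acquirer = c("FirmA", "FirmD", "FirmA", "FirmD", "FirmG"),
  target   = c("FirmB", NA, NA, NA, "FirmF"),
  vendor   = c("FirmC", "FirmE", "FirmC", "FirmE", "FirmE"),
  stringsAsFactors = FALSE
)

per_deal <- lapply(split(df1, df1$dealid), function(x) {
  # De-duplicate and drop NA targets before expanding, so the grid
  # contains only valid, distinct combinations
  combos <- expand.grid(
    acquirer = unique(x$acquirer),
    target   = unique(x$target[!is.na(x$target)]),
    vendor   = unique(x$vendor),
    stringsAsFactors = FALSE
  )
  # rep() keeps this valid even when a deal has no non-NA target (zero rows)
  data.frame(dealid = rep(x$dealid[1], nrow(combos)), combos,
             stringsAsFactors = FALSE)
})

df2 <- do.call(rbind, per_deal)
row.names(df2) <- NULL
df2
```

De-duplicating before the expansion keeps the intermediate grid small, which may matter for deals with many repeated firm names.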