I am trying to do a row-wise merge. I have a dataset df
and a list of corresponding IDs, ID:
ID <- data.frame(Alphabet = c("A", "A","A","B", "B", "C"),
Value = c(101,102, 103,201,202,301))
df <- data.frame(Name = c("A", "A","B", "C"))
I want to merge/assign the IDs to df
to get a df that looks like this:
Name ID1 ID2 ID3
A 101 102 103
A 101 102 103
B 201 202
C 301
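Answer 0 (score: 2)
Try this. Note that using NA for missing values is better than using blanks.
If you really want '' rather than NA, assign the merged result to outdf and then run
outdf[is.na(outdf)]=''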
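library(dplyr)
library(reshape2)
# number the values within each Alphabet group; the original ID is left untouched
ID_num <- ID %>% group_by(Alphabet) %>% mutate(ID = row_number()) %>% ungroup()
# reshape from long to wide: one row per Alphabet, one column per position
DF <- as.data.frame(acast(ID_num, Alphabet ~ ID, value.var = "Value"))
DF$Name <- row.names(DF)
# the join repeats the wide row for every matching Name in df
merge(df, DF, by = 'Name')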
Name 1 2 3
1 A 101 102 103
2 A 101 102 103
3 B 201 202 NA
4 C 301 NA NA
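The NA-to-blank replacement mentioned in this answer can be sketched as follows. This is only an illustration: outdf and outdf_blank are hypothetical names, and outdf is typed in by hand to stand in for the merged result shown above. Note that the replacement turns every column that contained an NA into a character column.

```r
# outdf stands in for the merged result shown above (hypothetical name)
outdf <- data.frame(Name = c("A", "A", "B", "C"),
                    `1` = c(101, 101, 201, 301),
                    `2` = c(102, 102, 202, NA),
                    `3` = c(103, 103, NA, NA),
                    check.names = FALSE)

# work on a copy so the numeric version stays available
outdf_blank <- outdf
outdf_blank[is.na(outdf_blank)] <- ''

# columns that held an NA are now character, the others are unchanged
str(outdf_blank)
```

Because the affected columns are no longer numeric, this is best done as a last step, just before printing or exporting.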
Or use tidyr (recommended, since you are working with a data.frame):
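library(dplyr)
library(tidyr)
# number the values within each Alphabet group without overwriting ID
ID_num <- ID %>% group_by(Alphabet) %>% mutate(id = row_number()) %>% ungroup()
# spread the positions into the columns 1, 2, 3
DF <- spread(ID_num, id, Value)
merge(df, DF, by.x = 'Name', by.y = 'Alphabet')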
Name 1 2 3
1 A 101 102 103
2 A 101 102 103
3 B 201 202 NA
4 C 301 NA NA
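The same reshape can probably be written with pivot_wider(), which supersedes spread() in newer tidyr releases. The following is a minimal sketch, assuming tidyr 1.0.0 or later is installed; the names wide and result are made up for the illustration. Using names_prefix = "ID" yields the column names ID1, ID2, ID3 asked for in the question.

```r
library(dplyr)
library(tidyr)

# example data from the question
ID <- data.frame(Alphabet = c("A", "A", "A", "B", "B", "C"),
                 Value = c(101, 102, 103, 201, 202, 301))
df <- data.frame(Name = c("A", "A", "B", "C"))

wide <- ID %>%
  group_by(Alphabet) %>%
  mutate(id = row_number()) %>%   # position of each value within its group
  ungroup() %>%
  pivot_wider(names_from = id, values_from = Value, names_prefix = "ID")

# left_join keeps the row order of df and repeats the wide row for duplicates
result <- left_join(df, wide, by = c("Name" = "Alphabet"))
result
```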
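Answer 1 (score: 1)
I would solve this by preparing a list that contains the rows of the final data frame and then binding them together with rbind. The only trick is to compute the maximum row length and pad the shorter rows with NA accordingly. This should work: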
ID <- data.frame(Alphabet = c("A", "A","A","B", "B", "C"),
Value = c(101,102, 103,201,202,301))
df <- data.frame(Name = c("A", "A","B", "C"))
tmp <- lapply(df$Name, (function(id){
ID[ID$Alphabet == id, ]$Value
}))
max.el <- max(sapply(tmp, length))
out.df <- do.call(rbind, lapply(tmp, (function(el){
len.na <- max.el - length(el)
c(el, rep(NA, len.na))
})))
print(out.df, na.print = "")
Here is the result:
[,1] [,2] [,3]
[1,] 101 102 103
[2,] 101 102 103
[3,] 201 202
[4,] 301
If showing NA is not a problem, then:
colnames(out.df) <- paste("ID", c(1:max.el), sep = "")
out.df <- cbind(df, out.df)
out.df
Name ID1 ID2 ID3
1 A 101 102 103
2 A 101 102 103
3 B 201 202 NA
4 C 301 NA NA
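The padding step described in this answer can also be sketched with the base R replacement function `length<-`, which extends a vector and fills the new positions with NA. This is a variant for illustration, not the answer's own code; vals and padded are hypothetical names.

```r
# example data from the question
ID <- data.frame(Alphabet = c("A", "A", "A", "B", "B", "C"),
                 Value = c(101, 102, 103, 201, 202, 301))
df <- data.frame(Name = c("A", "A", "B", "C"))

# split() groups the values by Alphabet; indexing by name repeats a group
# once for every matching row of df
vals <- split(ID$Value, ID$Alphabet)[as.character(df$Name)]
max.el <- max(lengths(vals))

# `length<-` pads each vector with NA up to the longest length
padded <- lapply(unname(vals), function(el) {
  length(el) <- max.el
  el
})

out <- do.call(rbind, padded)
colnames(out) <- paste0("ID", seq_len(max.el))
out.df <- cbind(df, out)
out.df
```

This sketch assumes every Name in df occurs in ID$Alphabet; names without a match are not handled here.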
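Answer 2 (score: 0)
For the sake of completeness, here is also a solution that uses dcast()
from the data.table package
to reshape from long to wide format, followed by a right join: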
library(data.table)
# coerce to data.table
setDT(ID)[
# reshape from long to wide, thereby creating column names
, dcast(.SD, Alphabet ~ rowid(Alphabet, prefix = "ID"))][
# rename column
, setnames(.SD, "Alphabet", "Name")][
# right join with df to repeat rows
setDT(df), on = "Name"]
Name ID1 ID2 ID3
1: A 101 102 103
2: A 101 102 103
3: B 201 202 NA
4: C 301 NA NA
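The chained call can also be written step by step, which may be easier to follow. The sketch below is an assumption-laden variant rather than the answer's exact code: it uses as.data.table() instead of setDT() so that ID and df are copied rather than converted by reference, it passes value.var explicitly, and the names id_dt, wide and result are invented for the illustration.

```r
library(data.table)

# example data from the question
ID <- data.frame(Alphabet = c("A", "A", "A", "B", "B", "C"),
                 Value = c(101, 102, 103, 201, 202, 301))
df <- data.frame(Name = c("A", "A", "B", "C"))

# copy to data.table; the original data frames stay plain data.frames
id_dt <- as.data.table(ID)

# reshape from long to wide; rowid() numbers the values within each Alphabet
wide <- dcast(id_dt, Alphabet ~ rowid(Alphabet, prefix = "ID"),
              value.var = "Value")

# rename the key column so that it matches df
setnames(wide, "Alphabet", "Name")

# right join: one output row per row of df
result <- wide[as.data.table(df), on = "Name"]
result
```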
If NA must not be displayed, the output needs to be converted to character type:
setDT(ID)[, dcast(.SD, Alphabet ~ rowid(Alphabet, prefix = "ID"), as.character, fill = "")][
, setnames(.SD, "Alphabet", "Name")][
setDT(df), on = "Name"]
Name ID1 ID2 ID3
1: A 101 102 103
2: A 101 102 103
3: B 201 202
4: C 301