By name

Date: 2017-08-27 04:04:21

Tags: r

I am trying to do a row-wise merge. I have a data set df and a list of corresponding IDs, ID:

ID <- data.frame(Alphabet = c("A", "A","A","B", "B", "C"), 
             Value = c(101,102, 103,201,202,301))

df <-  data.frame(Name = c("A", "A","B", "C"))

I want to merge/assign the IDs to df, to get a df that looks like this:

Name   ID1  ID2 ID3
A      101  102 103
A      101  102 103
B      201  202
C      301 

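3 Answers:

Answer 0: (score: 2)

Try this? Note that using NA for missing values is better than using blanks.

If you want '' instead of NA, use outdf[is.na(outdf)] = '' (outdf being the merged result).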

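library(dplyr)
library(reshape2)
# number the values within each Alphabet group (1, 2, 3, ...)
ID <- ID %>% group_by(Alphabet) %>% mutate(ID = row_number())
# long to wide: one row per Alphabet, one column per position
DF <- as.data.frame(acast(ID, Alphabet ~ ID, value.var = "Value"))
DF$Name <- row.names(DF)
outdf <- merge(df, DF, by = 'Name')
outdf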


  Name   1   2   3
1    A 101 102 103
2    A 101 102 103
3    B 201 202  NA
4    C 301  NA  NA
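
The NA-to-blank replacement mentioned above can be sketched as follows. This is only an illustration: outdf is rebuilt by hand here as a stand-in for the merged result, and the column names ID1 to ID3 are an assumption. It also shows the cost of using blanks: any column that receives a '' is coerced to character.

```r
# Stand-in for the merged result (built by hand; column names are assumed)
outdf <- data.frame(Name = c("A", "A", "B", "C"),
                    ID1 = c(101, 101, 201, 301),
                    ID2 = c(102, 102, 202, NA),
                    ID3 = c(103, 103, NA, NA),
                    stringsAsFactors = FALSE)

# Replace every NA with an empty string
outdf[is.na(outdf)] <- ""

# Inspect the column classes: columns that held an NA are now character
sapply(outdf, class)
```

If the values are needed for arithmetic later, keeping NA and blanking only at print time (for example with print(x, na.print = "")) avoids that coercion.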

Or use tidyr (recommended, since you are working with a data.frame):

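library(dplyr)
library(tidyr)
# start from the original ID; number the values within each Alphabet group
ID <- ID %>% group_by(Alphabet) %>% mutate(id = row_number())
# long to wide: one column per position
DF <- spread(ID, id, Value)
merge(df, DF, by.x = 'Name', by.y = 'Alphabet')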

  Name   1   2   3
1    A 101 102 103
2    A 101 102 103
3    B 201 202  NA
4    C 301  NA  NA
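
For readers on a recent tidyr (1.0.0 or later), the same reshape can be sketched with pivot_wider(), which supersedes spread(). This is a variant and not part of the original answer; the names wide and result are illustrative, and stringsAsFactors = FALSE is set explicitly so the join keys are plain character vectors on any R version. The names_prefix argument produces the ID1, ID2, ID3 column names asked for in the question.

```r
library(dplyr)
library(tidyr)

ID <- data.frame(Alphabet = c("A", "A", "A", "B", "B", "C"),
                 Value = c(101, 102, 103, 201, 202, 301),
                 stringsAsFactors = FALSE)
df <- data.frame(Name = c("A", "A", "B", "C"), stringsAsFactors = FALSE)

wide <- ID %>%
  group_by(Alphabet) %>%
  mutate(id = row_number()) %>%   # position of each value within its group
  ungroup() %>%
  pivot_wider(names_from = id, values_from = Value, names_prefix = "ID")

# A left join keeps every row of df, including the repeated "A"
result <- left_join(df, wide, by = c("Name" = "Alphabet"))
result
```

Unlike merge(), left_join() keeps the row order of df instead of sorting by the key.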

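Answer 1: (score: 1)

I would approach this by preparing a list that contains the rows of the final data frame and then rbind-ing them together. The only trick is to compute the maximum row length and pad the shorter rows with NA accordingly. This should work.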

ID <- data.frame(Alphabet = c("A", "A","A","B", "B", "C"), 
                 Value = c(101,102, 103,201,202,301))

df <-  data.frame(Name = c("A", "A","B", "C"))


tmp <- lapply(df$Name, (function(id){
  ID[ID$Alphabet == id, ]$Value
}))
max.el <- max(sapply(tmp, length))
out.df <- do.call(rbind, lapply(tmp, (function(el){
  len.na <- max.el - length(el) 
  c(el, rep(NA, len.na))  
})))

print(out.df, na.print = "")

Here is the result:

     [,1] [,2] [,3]
[1,]  101  102  103
[2,]  101  102  103
[3,]  201  202     
[4,]  301    

If displaying NA is not a problem, then:

colnames(out.df) <- paste("ID", c(1:max.el), sep = "")
out.df <- cbind(df, out.df)
out.df

  Name ID1 ID2 ID3
1    A 101 102 103
2    A 101 102 103
3    B 201 202  NA
4    C 301  NA  NA
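
The padding step can be written more compactly in base R. The sketch below is a variant, not the answer's own code: it splits the values once with split() and relies on `length<-`, which extends a vector with NA up to the requested length. The names vals and padded are illustrative, and stringsAsFactors = FALSE is assumed so that df$Name can be used to look up list elements by name.

```r
ID <- data.frame(Alphabet = c("A", "A", "A", "B", "B", "C"),
                 Value = c(101, 102, 103, 201, 202, 301),
                 stringsAsFactors = FALSE)
df <- data.frame(Name = c("A", "A", "B", "C"), stringsAsFactors = FALSE)

# One vector of values per Alphabet
vals <- split(ID$Value, ID$Alphabet)
max.el <- max(lengths(vals))

# Look up each Name, pad to max.el with NA, and stack the rows
padded <- do.call(rbind, lapply(unname(vals[df$Name]), `length<-`, max.el))
colnames(padded) <- paste0("ID", seq_len(max.el))

out.df <- cbind(df, padded)
out.df
```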

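Answer 2: (score: 0)

For the sake of completeness, here is also a solution that uses dcast() from the data.table package to reshape from long to wide format, followed by a right join: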

library(data.table)
# coerce to data.table
setDT(ID)[
  # reshape from long to wide, thereby creating column names
  , dcast(.SD, Alphabet ~ rowid(Alphabet, prefix = "ID"))][
    # rename column
    , setnames(.SD, "Alphabet", "Name")][
      # right join with df to repeat rows
      setDT(df), on = "Name"]
   Name ID1 ID2 ID3
1:    A 101 102 103
2:    A 101 102 103
3:    B 201 202  NA
4:    C 301  NA  NA

If NA must not be displayed, the output needs to be converted to character type:

setDT(ID)[, dcast(.SD, Alphabet ~ rowid(Alphabet, prefix = "ID"), as.character, fill = "")][
    , setnames(.SD, "Alphabet", "Name")][
      setDT(df), on = "Name"]
   Name ID1 ID2 ID3
1:    A 101 102 103
2:    A 101 102 103
3:    B 201 202    
4:    C 301
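
The chained data.table expression can also be unrolled into separate steps, which may be easier to follow. This is a sketch under assumptions: the intermediate names ID_dt, df_dt, wide and result are illustrative, as.data.table() is used instead of setDT() so the original data frames are left unmodified, and value.var is spelled out rather than guessed by dcast().

```r
library(data.table)

ID <- data.frame(Alphabet = c("A", "A", "A", "B", "B", "C"),
                 Value = c(101, 102, 103, 201, 202, 301),
                 stringsAsFactors = FALSE)
df <- data.frame(Name = c("A", "A", "B", "C"), stringsAsFactors = FALSE)

# Work on copies so ID and df stay plain data frames
ID_dt <- as.data.table(ID)
df_dt <- as.data.table(df)

# Long to wide: rowid() numbers the values within each Alphabet
wide <- dcast(ID_dt, Alphabet ~ rowid(Alphabet, prefix = "ID"),
              value.var = "Value")
setnames(wide, "Alphabet", "Name")

# Right join: one output row per row of df_dt, in df_dt's order
result <- wide[df_dt, on = "Name"]
result
```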