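Spreading a data frame in R using multiple custom columns

Time: 2018-08-23 00:34:28

Tags: r

I'm not sure how to tackle this problem and would appreciate some insight. Some of my unique IDs have several owners, and because each owner is listed on its own row in the "Owner" column, those IDs appear more than once. I would like to spread this column out so that each ID takes up a single row, with one Owner/Score pair of columns for each owner it has. Any help would be much appreciated. Thanks!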

This is what the data looks like now:

df <- as.data.frame(matrix(NA, nrow = 11, ncol = 3))
df$V1 <- c('A','A','B','C','C','C','D','E','E','E','E')
df$V2 <- c('John','Derek','Sarah','Peter','Carlos','Angela','Ken','James','Nina','Gabby','Seth')
df$V3 <- c(100,90,80,85,66,98,62,74,56,85,77)
colnames(df) <- c('ID','Owner','Score')

This is what I would like it to look like:

df_out <- as.data.frame(matrix(NA,nrow = 5, ncol = 9))
df_out$V1 <- c('A','B','C','D','E')
df_out$V2 <- c('John','Sarah','Peter','Ken','James')
df_out$V3 <- c(100,80,85,62,74)
df_out$V4 <- c('Derek',NA,'Carlos',NA,'Nina')
df_out$V5 <- c(90,NA,66,NA,56)
df_out$V6 <- c(NA,NA,'Angela',NA,'Gabby')
df_out$V7 <- c(NA,NA,98,NA,85)
df_out$V8 <- c(NA,NA,NA,NA,'Seth')
df_out$V9 <- c(NA,NA,NA,NA,77)
colnames(df_out) <- c('ID','Owner','Score','Owner.2','Score.2','Owner.3','Score.3','Owner.4','Score.4')

Please excuse my code, I'm still a beginner!

2 Answers:

Answer 0: (score: 0)

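Here is an option using data.table::dcast. It pivots on ID (your row labels) and on the row number within each ID (your column labels), using Owner and Score as the values to spread: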

library(data.table)
setDT(df)[, nr := rowid(ID)]
ans <- dcast(df, ID ~ nr, sep=".", value.var=c("Owner","Score"))
ans

Output:

   ID Owner.1 Owner.2 Owner.3 Owner.4 Score.1 Score.2 Score.3 Score.4
1:  A    John   Derek    <NA>    <NA>     100      90      NA      NA
2:  B   Sarah    <NA>    <NA>    <NA>      80      NA      NA      NA
3:  C   Peter  Carlos  Angela    <NA>      85      66      98      NA
4:  D     Ken    <NA>    <NA>    <NA>      62      NA      NA      NA
5:  E   James    Nina   Gabby    Seth      74      56      85      77
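
The nr column built with rowid(ID) is a running counter that restarts at 1 for every ID, and it is what ends up as the .1, .2, ... suffix in the column names. As a rough illustration only, the same counter can be built in base R with ave(); this is a stand-in for clarity, not the call the answer uses:

```r
ID <- c('A','A','B','C','C','C','D','E','E','E','E')

# Number the rows within each ID: 1, 2, ... restarting whenever the ID changes
nr <- ave(seq_along(ID), ID, FUN = seq_along)
nr
# 1 2 1 1 2 3 1 1 2 3 4
```
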

To rearrange the columns into a specific order, you can sort them by the numeric index in their names (i.e. .1, .2, .3, and so on), like this:

nm <- names(ans)[-1L]
# sort on the numeric suffix of each column name
cols <- nm[order(as.integer(sapply(strsplit(nm, "\\."), `[`, 2)))]
setcolorder(ans, c("ID", cols))
ans

Output:

   ID Owner.1 Score.1 Owner.2 Score.2 Owner.3 Score.3 Owner.4 Score.4
1:  A    John     100   Derek      90    <NA>      NA    <NA>      NA
2:  B   Sarah      80    <NA>      NA    <NA>      NA    <NA>      NA
3:  C   Peter      85  Carlos      66  Angela      98    <NA>      NA
4:  D     Ken      62    <NA>      NA    <NA>      NA    <NA>      NA
5:  E   James      74    Nina      56   Gabby      85    Seth      77
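
One caveat when ordering by suffix: if any ID had ten or more owners, suffixes compared as text would sort ".10" ahead of ".2". The sketch below uses made-up column names (they are not in the question's data) to show the difference, and why comparing the suffix as an integer is the safer choice:

```r
# Hypothetical column names with a two-digit suffix
nm <- c("Owner.1", "Owner.2", "Owner.10", "Score.1", "Score.2", "Score.10")
suffix <- sapply(strsplit(nm, ".", fixed = TRUE), `[`, 2)

# Compared as text: "1" < "10" < "2", so the .10 columns land in the middle
as_text <- nm[order(suffix)]

# Compared as integers: 1 < 2 < 10, which is the intended order
as_number <- nm[order(as.integer(suffix))]
as_number
```
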

Answer 1: (score: 0)

library(dplyr)
library(tidyr)
df %>% group_by(ID) %>% 
       # First collect all Owners and Scores for each ID into one comma-separated string
       summarise(own = paste0(Owner, collapse = ','), sco = paste0(Score, collapse = ',')) %>%  
       # Split the Owners into their own columns using tidyr::separate;
       # fill = 'right' pads IDs that have fewer than four owners with NA
       separate(own, into = c('Owner.1','Owner.2','Owner.3','Owner.4'), sep = ',', fill = 'right') %>% 
       separate(sco, into = c('Score.1','Score.2','Score.3','Score.4'), sep = ',', fill = 'right') %>%
       # Reorder the columns as in the question
       select(ID, Owner.1, Score.1, Owner.2, Score.2, Owner.3, Score.3, Owner.4, Score.4)


# A tibble: 5 x 9
  ID    Owner.1 Score.1 Owner.2 Score.2 Owner.3 Score.3 Owner.4 Score.4
  <chr> <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>  
1 A     John    100     Derek   90      NA      NA      NA      NA     
2 B     Sarah   80      NA      NA      NA      NA      NA      NA     
3 C     Peter   85      Carlos  66      Angela  98      NA      NA     
4 D     Ken     62      NA      NA      NA      NA      NA      NA     
5 E     James   74      Nina    56      Gabby   85      Seth    77
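
Note that in the output above the Score columns come back as character. For comparison, here is a minimal base R sketch that is not part of either answer: it assumes a within-ID counter column (called nr here, an illustrative name) and uses stats::reshape, which keeps Score numeric and already produces the Owner/Score pairs in the order the question asks for. The data frame name owners is also just for illustration.

```r
# The question's data, rebuilt as a plain data frame
owners <- data.frame(
  ID    = c('A','A','B','C','C','C','D','E','E','E','E'),
  Owner = c('John','Derek','Sarah','Peter','Carlos','Angela',
            'Ken','James','Nina','Gabby','Seth'),
  Score = c(100,90,80,85,66,98,62,74,56,85,77),
  stringsAsFactors = FALSE
)

# Running counter within each ID (1, 2, ...); this becomes the column suffix
owners$nr <- ave(seq_along(owners$ID), owners$ID, FUN = seq_along)

# One row per ID, one Owner/Score pair of columns per value of nr
wide <- reshape(owners, idvar = "ID", timevar = "nr", direction = "wide")
rownames(wide) <- NULL
names(wide)
```

The suffixes start at .1 (Owner.1, Score.1) rather than the bare Owner and Score of the question's df_out, so the first pair would need renaming if an exact match is required.
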