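I'm not sure how to solve this and would appreciate some insight. I have multiple owners per unique ID, and because each owner sits on its own row of the Owner column, every ID appears more than once. I'd like to spread this column out so that each ID occupies a single row, with one Owner/Score pair of columns per owner. Any help would be greatly appreciated. Thanks!
This is what the data looks like now: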
df <- as.data.frame(matrix(NA, nrow = 11, ncol = 3))
df$V1 <- c('A','A','B','C','C','C','D','E','E','E','E')
df$V2 <- c('John','Derek','Sarah','Peter','Carlos','Angela','Ken','James','Nina','Gabby','Seth')
df$V3 <- c(100,90,80,85,66,98,62,74,56,85,77)
colnames(df) <- c('ID','Owner','Score')
This is what I want it to look like:
df_out <- as.data.frame(matrix(NA,nrow = 5, ncol = 9))
df_out$V1 <- c('A','B','C','D','E')
df_out$V2 <- c('John','Sarah','Peter','Ken','James')
df_out$V3 <- c(100,80,85,62,74)
df_out$V4 <- c('Derek',NA,'Carlos',NA,'Nina')
df_out$V5 <- c(90,NA,66,NA,56)
df_out$V6 <- c(NA,NA,'Angela',NA,'Gabby')
df_out$V7 <- c(NA,NA,98,NA,85)
df_out$V8 <- c(NA,NA,NA,NA,'Seth')
df_out$V9 <- c(NA,NA,NA,NA,77)
colnames(df_out) <- c('ID','Owner','Score','Owner.2','Score.2','Owner.3','Score.3','Owner.4','Score.4')
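Please excuse my code, I'm still a beginner!
Answer 0 (score: 0)
Here is an option using data.table::dcast. It pivots on ID (your row labels) and the row number within each ID (your column labels), using Owner and Score as the values to spread.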
library(data.table)
setDT(df)[, nr := rowid(ID)]
ans <- dcast(df, ID ~ nr, sep=".", value.var=c("Owner","Score"))
ans
Output:
   ID Owner.1 Owner.2 Owner.3 Owner.4 Score.1 Score.2 Score.3 Score.4
1:  A    John   Derek    <NA>    <NA>     100      90      NA      NA
2:  B   Sarah    <NA>    <NA>    <NA>      80      NA      NA      NA
3:  C   Peter  Carlos  Angela    <NA>      85      66      98      NA
4:  D     Ken    <NA>    <NA>    <NA>      62      NA      NA      NA
5:  E   James    Nina   Gabby    Seth      74      56      85      77
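To make the nr step more concrete, here is a minimal sketch of what rowid() produces for the ID column from the question. The names ids and nr are only used for illustration; rowid() itself is the data.table function used in the answer above.

```r
library(data.table)

# Same ID values as in the question
ids <- c('A','A','B','C','C','C','D','E','E','E','E')

# rowid() numbers each row within its run of identical IDs:
# the first 'A' gets 1, the second 'A' gets 2, the first 'B' gets 1, and so on
nr <- rowid(ids)

print(data.table(ID = ids, nr = nr))
```

These within-ID counters are what dcast then turns into the .1, .2, .3, .4 column suffixes.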
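To reorder the columns into a specific order, you can sort them by the numeric index in the column names (i.e. .1, .2, .3, etc.), as follows. The suffix is converted to integer so that .10 would sort after .9 rather than after .1:
nm <- names(ans)[-1L]
cols <- nm[order(as.integer(sapply(strsplit(nm, ".", fixed = TRUE), `[`, 2L)))]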
setcolorder(ans, c("ID", cols))
ans
Output:
   ID Owner.1 Score.1 Owner.2 Score.2 Owner.3 Score.3 Owner.4 Score.4
1:  A    John     100   Derek      90    <NA>      NA    <NA>      NA
2:  B   Sarah      80    <NA>      NA    <NA>      NA    <NA>      NA
3:  C   Peter      85  Carlos      66  Angela      98    <NA>      NA
4:  D     Ken      62    <NA>      NA    <NA>      NA    <NA>      NA
5:  E   James      74    Nina      56   Gabby      85    Seth      77
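One detail worth keeping in mind when ordering by the suffix: strsplit returns the suffix as text, and text comparison typically places "10" before "2". With at most four owners per ID, as in this data, it makes no difference, but the sketch below (using made-up column names that go up to .10) shows why sorting on the integer value is the safer choice if an ID could have ten or more owners.

```r
# Hypothetical column names with suffixes up to 10 (not from the question's data)
nm <- c("Owner.1", "Owner.2", "Owner.10", "Score.1", "Score.2", "Score.10")
suffix <- sapply(strsplit(nm, ".", fixed = TRUE), `[`, 2L)

# Text ordering: "10" usually lands between "1" and "2" (locale dependent)
by_text <- nm[order(suffix)]

# Numeric ordering: 1, 2, 10; ties keep their original order, so Owner precedes Score
by_number <- nm[order(as.integer(suffix))]

print(by_text)
print(by_number)
```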
Answer 1 (score: 0)
library(dplyr)
library(tidyr)
df %>% group_by(ID) %>%
  # First collect all Owners and Scores for each ID into one comma-separated string
  summarise(own = paste0(Owner, collapse = ','), sco = paste0(Score, collapse = ',')) %>%
  # Split the strings into one column per owner with tidyr::separate;
  # fill = 'right' pads IDs that have fewer than four owners with NA
  separate(own, into = c('Owner.1','Owner.2','Owner.3','Owner.4'), sep = ',', fill = 'right') %>%
  separate(sco, into = c('Score.1','Score.2','Score.3','Score.4'), sep = ',', fill = 'right') %>%
  # Rearrange the columns as in the question (the Score columns stay character here)
  select(ID, Owner.1, Score.1, Owner.2, Score.2, Owner.3, Score.3, Owner.4, Score.4)
# A tibble: 5 x 9
  ID    Owner.1 Score.1 Owner.2 Score.2 Owner.3 Score.3 Owner.4 Score.4
  <chr> <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>
1 A     John    100     Derek   90      NA      NA      NA      NA
2 B     Sarah   80      NA      NA      NA      NA      NA      NA
3 C     Peter   85      Carlos  66      Angela  98      NA      NA
4 D     Ken     62      NA      NA      NA      NA      NA      NA
5 E     James   74      Nina    56      Gabby   85      Seth    77
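Note that the approach above hard-codes four owners per ID and leaves the scores as character. As a possible alternative (not what this answer uses), here is a sketch with tidyr::pivot_wider, which works out the number of columns from the data and keeps Score numeric. It assumes a reasonably recent tidyr, since the names_vary argument is, as far as I know, only available from version 1.2.0 onward; the names nr and wide are just illustrative.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, built directly as a data frame
df <- data.frame(
  ID    = c('A','A','B','C','C','C','D','E','E','E','E'),
  Owner = c('John','Derek','Sarah','Peter','Carlos','Angela','Ken','James','Nina','Gabby','Seth'),
  Score = c(100,90,80,85,66,98,62,74,56,85,77)
)

wide <- df %>%
  group_by(ID) %>%
  mutate(nr = row_number()) %>%   # position of each owner within its ID
  ungroup() %>%
  pivot_wider(
    names_from  = nr,
    values_from = c(Owner, Score),
    names_sep   = ".",
    names_vary  = "slowest"       # interleave as Owner.1, Score.1, Owner.2, ...
  )

print(wide)
```

IDs with fewer owners are padded with NA automatically, so no fill argument or manual column list is needed.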