Reshape ID numbers to wide format

Time: 2018-06-20 03:55:45

Tags: r dplyr tidyr

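Posting a second question because my first one was marked as a duplicate. I apologize if there is already a question covering this specific problem.

I'm starting with a data frame like this: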

dat<-data.frame(
ID=c(100,101,101,101,102,103),
DEGREE=c("BA","BA","MS","PHD","BA","BA"),
YEAR=c(1980,1990, 1992, 1996, 2000, 2004))

> dat
ID DEGREE YEAR
100     BA 1980
101     BA 1990
101     MS 1992
101    PHD 1996
102     BA 2000
103     BA 2004

ID 101 earned a BA in 1990, an MS in 1992, and a PhD in 1996.

I would like to reshape this data frame into a wide format that ends up looking like this:

 ID DEGREE_1 DEGREE_2 DEGREE_3 YEAR_DEGREE_1 YEAR_DEGREE_2 YEAR_DEGREE_3
 100    BA                           1980                            
 101    BA      MS      PHD          1990        1992          1996
 102    BA                           2000                            
 103    BA                           2004           

With help from the answers to my original question, I tried the following code to create the new data frame:

dat$DEGREE<-as.character(dat$DEGREE)
dat %>% group_by(ID) %>%
mutate(DegreeNum = paste("Degree", row_number(), sep = "_"))%>%
mutate(DegreeYear = paste("YearDegree", row_number(), sep = "_"))%>%
spread(DegreeNum, DEGREE, fill = "")%>%
spread(DegreeYear,YEAR,fill="")%>%
as.data.frame()

 ID Degree_1 Degree_2 Degree_3 YearDegree_1 YearDegree_2 YearDegree_3
 100   BA                           1980                          
 101                    PHD                                  1996
 101            MS                               1992             
 101   BA                           1990                          
 102   BA                           2000                          
 103   BA                           2004    

This is as far as I've gotten, but I can't figure out how to reshape it so that everything for ID 101 ends up on a single row. Any help would be appreciated.

1 Answer:

Answer 0: (score: 0)

This isn't hard with the tidyverse (dplyr here, combined with base R's reshape())...

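library(dplyr)

df <- data.frame(ID = c(100, 101, 101, 101, 102, 103),
                 DEGREE = c("BA", "BA", "MS", "PHD", "BA", "BA"),
                 YEAR = c(1980, 1990, 1992, 1996, 2000, 2004),
                 stringsAsFactors = FALSE)

# DEGREE part: drop YEAR, number the rows within each ID, then reshape to wide
df1 <- df %>% select(-YEAR) %>% group_by(ID) %>% mutate(i = row_number()) %>%
       as.data.frame() %>%
       reshape(direction = "wide", idvar = "ID", v.names = "DEGREE", timevar = "i", sep = "_")
df1[is.na(df1)] <- ""

# YEAR part: same steps, this time dropping DEGREE
df2 <- df %>% select(-DEGREE) %>% group_by(ID) %>% mutate(i = row_number()) %>%
       as.data.frame() %>%
       reshape(direction = "wide", idvar = "ID", v.names = "YEAR", timevar = "i", sep = "_")
df2[is.na(df2)] <- ""   # note: this converts the YEAR_* columns to character

# Combine the two wide tables on ID
inner_join(df1, df2, by = "ID")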
#   ID DEGREE_1 DEGREE_2 DEGREE_3 YEAR_1 YEAR_2 YEAR_3
#1 100       BA                     1980              
#2 101       BA       MS      PHD   1990   1992   1996
#3 102       BA                     2000              
#4 103       BA                     2004
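
As a side note, the attempt in the question ends up with three rows for ID 101 because the two key columns are spread one after the other: after the first spread(), DegreeYear still differs from row to row, so the rows cannot collapse. One possible alternative, assuming a tidyr version that provides pivot_wider() (1.0.0 or later, which postdates this thread), is to number the rows per ID once and widen both value columns in a single call. This is only a sketch; the names wide and i are illustrative, the resulting columns are named YEAR_1 rather than YEAR_DEGREE_1, and empty cells come out as NA rather than "".

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  ID = c(100, 101, 101, 101, 102, 103),
  DEGREE = c("BA", "BA", "MS", "PHD", "BA", "BA"),
  YEAR = c(1980, 1990, 1992, 1996, 2000, 2004),
  stringsAsFactors = FALSE
)

wide <- df %>%
  group_by(ID) %>%
  mutate(i = row_number()) %>%      # per-ID sequence number: 1, 2, 3, ...
  ungroup() %>%
  pivot_wider(
    names_from = i,                 # the sequence number becomes the column suffix
    values_from = c(DEGREE, YEAR)   # widen both value columns in one step
  )

print(wide)
```

If blank strings are preferred over NA for display, the columns would first have to be converted to character, as the answer above does with is.na().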