Using dcast to cast data from long to wide

Time: 2018-07-18 12:32:25

Tags: r dcast

I've been on Stack Overflow for the past few days looking for a solution to this problem.

I'm analyzing data obtained from the National Student Clearinghouse, specifically graduation credentials. So I have some dummy data:

df <- data.frame(id=c('1', '1', '1'),
                 grad_date=c('20160501', '20170524', '20180524'),
                 order=c('1', '2', '3'),
                 inst_name=c('community college 1', 'univ 1', 'univ 2'),
                 inst_state=c('CA', 'CA', 'CA'),
                 level=c('Associate of Applied Sciences', 'Bachelors of Applied Sciences', 'Masters of Applied Sciences'),
                 deg_maj_1=c('NETWORK SECURITY', 'INFO ASSUR CYBR-SECURITY', 'CISCO CCNA PREPARATION'),
                 deg_cip_1=c('111003', '520299', '111003'),
                 deg_maj_2=c('NA', 'NA', 'NA'),
                 deg_cip_2=c('NA', 'NA', 'NA'),
                 deg_maj_3=c('NA', 'NA', 'NA'),
                 deg_cip_3=c('NA', 'NA', 'NA'),
                 deg_maj_4=c('NA', 'NA', 'NA'),
                 deg_cip_4=c('NA', 'NA', 'NA'))

I'm trying to cast this data wide:

df_wide <- dcast(df, id ~ order, value.var = c("inst_name", "inst_state", "level", "deg_maj_1", "deg_cip_1", "deg_maj_2", "deg_cip_2", "deg_maj_3", "deg_cip_3", "deg_maj_4", "deg_cip_4"))

I'm getting this error:

Error in .subset2(x, i, exact = exact) :
  recursive indexing failed at level 2

I've gone here and here and gotten the same error.

If this helps:

str(df)
'data.frame':   3 obs. of  14 variables:
 $ id        : Factor w/ 1 level "1": 1 1 1
 $ grad_date : Factor w/ 3 levels "20160501","20170524",..: 1 2 3
 $ order     : Factor w/ 3 levels "1","2","3": 1 2 3
 $ inst_name : Factor w/ 3 levels "community college 1",..: 1 2 3
 $ inst_state: Factor w/ 1 level "CA": 1 1 1
 $ level     : Factor w/ 3 levels "Associate of Applied Sciences",..: 1 2 3
 $ deg_maj_1 : Factor w/ 3 levels "CISCO CCNA PREPARATION",..: 3 2 1
 $ deg_cip_1 : Factor w/ 2 levels "111003","520299": 1 2 1
 $ deg_maj_2 : Factor w/ 1 level "NA": 1 1 1
 $ deg_cip_2 : Factor w/ 1 level "NA": 1 1 1
 $ deg_maj_3 : Factor w/ 1 level "NA": 1 1 1
 $ deg_cip_3 : Factor w/ 1 level "NA": 1 1 1
 $ deg_maj_4 : Factor w/ 1 level "NA": 1 1 1
 $ deg_cip_4 : Factor w/ 1 level "NA": 1 1 1

Can anyone assist? I'm at my wit's end.

Edited to add: the desired output (yes, I know this is looooooooooong, but this is what's necessary)

2 answers:

Answer 0: (score: 1)

If you aren't tied to dcast(), base R's reshape() can get you where you need to be.

reshape(df, idvar="id", timevar = "order", direction="wide")

which yields

  id grad_date.1         inst_name.1 inst_state.1                       level.1
1  1    20160501 community college 1           CA Associate of Applied Sciences
       deg_maj_1.1 deg_cip_1.1 deg_maj_2.1 deg_cip_2.1 deg_maj_3.1 deg_cip_3.1
1 NETWORK SECURITY      111003          NA          NA          NA          NA
  deg_maj_4.1 deg_cip_4.1 grad_date.2 inst_name.2 inst_state.2
1          NA          NA    20170524      univ 1           CA
                        level.2              deg_maj_1.2 deg_cip_1.2 deg_maj_2.2
1 Bachelors of Applied Sciences INFO ASSUR CYBR-SECURITY      520299          NA
  deg_cip_2.2 deg_maj_3.2 deg_cip_3.2 deg_maj_4.2 deg_cip_4.2 grad_date.3 inst_name.3
1          NA          NA          NA          NA          NA    20180524      univ 2
  inst_state.3                     level.3            deg_maj_1.3 deg_cip_1.3
1           CA Masters of Applied Sciences CISCO CCNA PREPARATION      111003
  deg_maj_2.3 deg_cip_2.3 deg_maj_3.3 deg_cip_3.3 deg_maj_4.3 deg_cip_4.3
1          NA          NA          NA          NA          NA          NA
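To make the role of each argument easier to see, here is a minimal sketch on a trimmed-down version of the data. The data frame `small`, its shortened `level` values, and the choice of `sep = "_"` are assumptions made for illustration; they are not part of the original answer.

```r
# Hypothetical subset of the question's data, kept small for readability.
small <- data.frame(
  id        = c("1", "1", "1"),
  order     = c("1", "2", "3"),
  inst_name = c("community college 1", "univ 1", "univ 2"),
  level     = c("Associate", "Bachelors", "Masters"),
  stringsAsFactors = FALSE
)

# idvar identifies one output row per student; timevar supplies the suffixes.
# Every remaining column is spread into one column per value of timevar.
# sep = "_" replaces the default "." between column name and suffix.
wide <- reshape(small, idvar = "id", timevar = "order",
                direction = "wide", sep = "_")

print(names(wide))
```

The columns come out grouped by `order` value (all of the `_1` columns, then `_2`, then `_3`), which matches the layout of the output shown above.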

Answer 1: (score: 0)

For the sake of completeness, data.table's version of dcast() can reshape multiple value columns simultaneously:

library(data.table)
dcast(setDT(df), id ~ order, value.var = tail(names(df), -3L))
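In that call, `tail(names(df), -3L)` drops the first three column names (`id`, `grad_date`, `order`), so `grad_date` is not carried into the result. The following is a minimal sketch of the same idea on a smaller table; the table `small` and the two chosen value columns are assumptions for illustration, and it presumes the data.table package is installed.

```r
library(data.table)

# Hypothetical subset of the question's data with two value columns.
small <- data.table(
  id         = c("1", "1", "1"),
  order      = c("1", "2", "3"),
  inst_name  = c("community college 1", "univ 1", "univ 2"),
  inst_state = c("CA", "CA", "CA")
)

# Passing a character vector to value.var casts several columns at once.
wide <- dcast(small, id ~ order, value.var = c("inst_name", "inst_state"))

print(names(wide))
```

Here the new columns are grouped by value column rather than by `order`: first the `inst_name_*` columns, then the `inst_state_*` columns.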