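reshape() error: duplicate 'row.names' are not allowed

Asked: 2015-02-16 15:42:21

Tags: r reshape

I have longitudinal data in wide format that I want to reshape into long format. Here is a sample: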

    id sex group   id sex.1 group.1    status1  beg1  end1 status2  beg2  end2
1 1000   1     a 1000     1       a Vocational  <NA> S2007      HE S2007 S2008
2 1001   1     a 1001     1       a Vocational  <NA> S2007      HE S2008 S2012
3 1004   1     a 1004     1       a Vocational  <NA> S2008     999  <NA>  <NA>
4 1006   2     a 1006     2       a Vocational  <NA> S2007    Army S2012  <NA>
5 1007   1     a 1007     1       a         HE  <NA> S2007     999  <NA>  <NA>
6 1008   1     a 1008     1       a Vocational S2013  <NA>     999  <NA>  <NA>

I need the data in this shape, which is compatible with the SPELL format:

  id sex  group index  status    beg     end
1000  1    a      1   Vocational  NA     S2007
1000  1    a      2      HE      S2008   S2012
...

I use the following command:

spell <- reshape(data, 
                 varying=names(data)[4:60],
                 direction="long",
                 idvar=c("id","sex","group"),
                 sep="")   

I get the following error message:

    Error in `row.names<-.data.frame`(`*tmp*`, value = paste(d[, idvar], times[1L],  : 
duplicate 'row.names' are not allowed
        In addition: Warning message: non-unique value when setting 'row.names': ‘NA.1’ 

I tried setting the NA values to 999 like this, but it did not work:

data[is.na(data)] <- 999

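Do you know what would make this work? Many thanks!

3 answers:

Answer 0: (score: 3)

The error message indicates that the id variables contain either duplicated rows or missing values.

First check for duplicates: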

with(data, any(duplicated(cbind(id, sex, group))))

If this returns TRUE, there is your answer.
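
To see which rows are involved, rather than only whether any exist, something like the following may help. It is a minimal sketch on made-up toy data; the values and the `id_vars` name are assumptions, not the asker's real data.

```r
# Toy data (hypothetical): id 1001 appears twice with the same sex and group.
data <- data.frame(
  id    = c(1000, 1001, 1001, 1004),
  sex   = c(1, 1, 1, 1),
  group = c("a", "a", "a", "a"),
  stringsAsFactors = FALSE
)
id_vars <- c("id", "sex", "group")

# Flag every member of a duplicated id combination, not only the later copies.
dup <- duplicated(data[id_vars]) | duplicated(data[id_vars], fromLast = TRUE)

# Rows that share their id combination with at least one other row.
print(data[dup, ])
```

Scanning from both ends with `fromLast = TRUE` lists the first occurrence too, which makes it easier to compare the colliding rows side by side.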

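If it returns FALSE, the id variables probably contain missing values, possibly entire empty rows, perhaps at the end of the data. Such rows can come from blank rows in the source data itself, or from the R command used to import it, for example read_excel with a range argument that covers too many rows. Either way, check the data carefully for missing values in the id variables. Replacing them all with 999 will not help.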
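
The missing-value case can be checked in much the same way. The sketch below uses toy data with two blank trailing rows (an assumption about what an over-long import range might produce). It counts the NAs in each id variable, drops the incomplete rows, and then reshapes. It also hints at why replacing NA with 999 does not help: the blank rows would all receive the same id values and so would still collide with each other.

```r
# Toy data (hypothetical): the last two rows are entirely blank.
data <- data.frame(
  id      = c(1000, 1001, NA, NA),
  sex     = c(1, 1, NA, NA),
  group   = c("a", "a", NA, NA),
  status1 = c("Vocational", "HE", NA, NA),
  status2 = c("HE", "999", NA, NA),
  stringsAsFactors = FALSE
)
id_vars <- c("id", "sex", "group")

# Number of missing values in each id variable.
na_per_id_var <- colSums(is.na(data[id_vars]))
print(na_per_id_var)

# Keep only rows whose id variables are all present.
clean <- data[complete.cases(data[id_vars]), ]

# With complete, unique ids the reshape goes through.
spell <- reshape(clean,
                 varying   = c("status1", "status2"),
                 direction = "long",
                 idvar     = id_vars,
                 sep       = "")
print(spell)
```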

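Answer 1: (score: 1)

Assuming "id.1", "sex.1" and "group.1" are duplicated columns, we can drop those columns, change the remaining column names by inserting a separator ("_"), and then reshape the data: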

data1 <- data[-(4:6)]
nm1 <- sub('\\d+$', '', names(data1)[-(1:3)])
names(data1)[-(1:3)] <- paste(nm1, ave(nm1, nm1, FUN=seq_along), sep="_")
res <- reshape(data1, varying=4:ncol(data1), direction='long',
             idvar=c('id', 'sex', 'group'), sep="_")
row.names(res) <- NULL
head(res)
#     id sex group time     status   beg   end
# 1 1000   1     a    1 Vocational  <NA> S2007
# 2 1001   1     a    1 Vocational  <NA> S2007
# 3 1004   1     a    1 Vocational  <NA> S2008
# 4 1006   2     a    1 Vocational  <NA> S2007
# 5 1007   1     a    1         HE  <NA> S2007
# 6 1008   1     a    1 Vocational S2013  <NA>
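
If the goal is the SPELL layout from the question (one block of rows per person, with an `index` column), one possible follow-up is to name the time column through `timevar` and sort by id. This is a minimal sketch on two toy rows modelled on the sample data, assuming the columns have already been renamed to the `name_number` pattern:

```r
# Toy data (hypothetical), already renamed to the name_number pattern.
data1 <- data.frame(
  id       = c(1000, 1001),
  sex      = c(1, 1),
  group    = c("a", "a"),
  status_1 = c("Vocational", "Vocational"),
  beg_1    = c(NA_character_, NA_character_),
  end_1    = c("S2007", "S2007"),
  status_2 = c("HE", "HE"),
  beg_2    = c("S2007", "S2008"),
  end_2    = c("S2008", "S2012"),
  stringsAsFactors = FALSE
)

# timevar names the spell counter "index" instead of the default "time".
res <- reshape(data1,
               varying   = names(data1)[4:ncol(data1)],
               direction = "long",
               idvar     = c("id", "sex", "group"),
               timevar   = "index",
               sep       = "_")

# Group each person's spells together, in spell order.
res <- res[order(res$id, res$index), ]
row.names(res) <- NULL
print(res)
```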

Answer 2: (score: 1)

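# varying is given as a list: each element holds the column positions of one
# long-format variable (7 and 10 = status1/status2, 8 and 11 = beg1/beg2,
# 9 and 12 = end1/end2); v.names supplies the names of the resulting columns.
x2 <- reshape(mydata, idvar=c("id.1", "sex.1", "group.1"), direction="long", 
              varying=list(c(7, 10), c(8, 11), c(9, 12)), 
              v.names=c("status","beg","end"))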

head(x2)

             id sex group id.1 sex.1 group.1 time     status   beg   end
1000.1.a.1 1000   1     a 1000     1       a    1 Vocational  <NA> S2007
1001.1.a.1 1001   1     a 1001     1       a    1 Vocational  <NA> S2007
1004.1.a.1 1004   1     a 1004     1       a    1 Vocational  <NA> S2008
1006.2.a.1 1006   2     a 1006     2       a    1 Vocational  <NA> S2007
1007.1.a.1 1007   1     a 1007     1       a    1         HE  <NA> S2007
1008.1.a.1 1008   1     a 1008     1       a    1 Vocational S2013  <NA>