R - Character column loses its values when stacking columns

Time: 2017-02-11 08:15:04

Tags: r dataframe stack rows reshape2

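I've run into a strange problem while stacking the columns of a data frame into 3 columns. For some reason, the factor column loses its values when it is stacked.

With the code below, the Treatment values should in theory be stacked on top of one another rather than replaced by a single value.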

library(reshape2)
test1 <- reshape(df, direction = "long",
                 varying = split(names(df), rep(seq_len(ncol(df)/4), 3)))

I won't paste the whole result, but this frequency table should be enough:

2 Answers:

Answer 0 (score: 1)

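The duplicate column names cause this problem. A better approach is to split the columns into blocks, correct the column names, and then bind the blocks together with rbind. To keep all the information, I create two new columns that store the q3_... names:
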
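# Walk over the data in blocks of three columns (month, year, Treatment),
# give every block the same column names, then stack the blocks with rbind.
do.call("rbind", lapply(seq(3, 12, by = 3), function(x) {
  y <- df1[, (x - 2):x]
  # Keep the original q3_... column names as two extra columns
  y <- cbind(mo = colnames(y)[1], yr = colnames(y)[2], y)
  colnames(y)[3:4] <- c("mo_val", "yr_val")
  y
}))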

#         mo     yr mo_val yr_val     Treatment
# 1:  q3_1mo q3_1yr     NA     NA anti-androgen
# 2:  q3_1mo q3_1yr      5   2012 anti-androgen
# 3:  q3_1mo q3_1yr      4   2008 anti-androgen
# 4:  q3_1mo q3_1yr      4   2010 anti-androgen
# 5:  q3_1mo q3_1yr     NA     NA anti-androgen
# 6:  q3_1mo q3_1yr      2   2008 anti-androgen
# 7:  q3_2mo q3_2yr      8   2010     docetaxel
# 8:  q3_2mo q3_2yr      5   2012     docetaxel
# 9:  q3_2mo q3_2yr      4   2008     docetaxel
# 10: q3_2mo q3_2yr      4   2010     docetaxel
# 11: q3_2mo q3_2yr      8   2011     docetaxel
# 12: q3_2mo q3_2yr      2   2008     docetaxel
# 13: q3_3mo q3_3yr     NA     NA   abiraterone
# 14: q3_3mo q3_3yr      5   2012   abiraterone
# 15: q3_3mo q3_3yr      4   2008   abiraterone
# 16: q3_3mo q3_3yr      4   2010   abiraterone
# 17: q3_3mo q3_3yr      8   2011   abiraterone
# 18: q3_3mo q3_3yr      2   2008   abiraterone
# 19: q3_3mo q3_3yr     NA     NA         other
# 20: q3_3mo q3_3yr      5   2012         other
# 21: q3_3mo q3_3yr      4   2008         other
# 22: q3_3mo q3_3yr      4   2010         other
# 23: q3_3mo q3_3yr      8   2011         other
# 24: q3_3mo q3_3yr      2   2008         other
#         mo     yr mo_val yr_val     Treatment

Data:

df1 <- structure(list(q3_1mo = c(NA, 5L, 4L, 4L, NA, 2L), 
                      q3_1yr = c(NA, 2012L, 2008L, 2010L, NA, 2008L),
                      Treatment = c("anti-androgen", "anti-androgen", "anti-androgen", "anti-androgen", "anti-androgen", "anti-androgen"),
                      q3_2mo = c(8L, 5L, 4L, 4L, 8L, 2L), 
                      q3_2yr = c(2010L, 2012L, 2008L, 2010L, 2011L, 2008L),
                      Treatment = c("docetaxel", "docetaxel", "docetaxel", "docetaxel", "docetaxel", "docetaxel"),
                      q3_3mo = c(NA, 5L, 4L, 4L, 8L, 2L),
                      q3_3yr = c(NA, 2012L, 2008L, 2010L, 2011L, 2008L), 
                      Treatment = c("abiraterone", "abiraterone", "abiraterone", "abiraterone", "abiraterone", "abiraterone"), 
                      q3_3mo = c(NA, 5L, 4L, 4L, 8L, 2L), 
                      q3_3yr = c(NA, 2012L, 2008L, 2010L, 2011L, 2008L),
                      Treatment = c("other", "other", "other", "other", "other", "other")), 
                 .Names = c("q3_1mo", "q3_1yr", "Treatment", "q3_2mo", "q3_2yr", "Treatment", "q3_3mo", "q3_3yr", "Treatment", "q3_3mo", "q3_3yr", "Treatment"), 
                 row.names = c(NA, -6L), class = "data.frame")
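
A small illustration of the duplicate-name point made at the top of this answer, using a hypothetical cut-down data frame (`dup` is not from the original post). With repeated names, lookup by name reaches only the first matching column, which is consistent with every Treatment block coming back with the same value. That reshape() resolves the `varying` names this way internally is an assumption here, not something the post confirms.

```r
# Hypothetical two-block data frame with a repeated "Treatment" column name
# (check.names = FALSE keeps the duplicate instead of renaming it).
dup <- data.frame(
  q3_1mo    = c(5L, 4L),
  Treatment = c("anti-androgen", "anti-androgen"),
  q3_2mo    = c(8L, 5L),
  Treatment = c("docetaxel", "docetaxel"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Both Treatment columns carry the same name ...
names(dup)
anyDuplicated(names(dup)) > 0    # TRUE: at least one name is repeated

# ... so selecting by name only ever reaches the first of them.
by_name  <- dup[["Treatment"]]   # values of column 2
by_index <- dup[[4]]             # column 4 is reachable by position only

identical(by_name, dup[[2]])     # TRUE
identical(by_name, by_index)     # FALSE
```

This is why the code in this answer selects each block by column position rather than by name.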

Answer 1 (score: 0)

You can also solve this by giving the variables unique names with make.unique and then using the same code.

names(df) <- make.unique(names(df))
test1 <- reshape(df, direction="long",
                 varying=split(names(df), rep(seq_len(ncol(df)/4), 3)))

which returns

test1

    time q3_1mo q3_1yr     Treatment id
1.1    1     NA     NA anti-androgen  1
2.1    1      5   2012 anti-androgen  2
3.1    1      4   2008 anti-androgen  3
4.1    1      4   2010 anti-androgen  4
5.1    1     NA     NA anti-androgen  5
6.1    1      2   2008 anti-androgen  6
1.2    2      8   2010     docetaxel  1
2.2    2      5   2012     docetaxel  2
3.2    2      4   2008     docetaxel  3
4.2    2      4   2010     docetaxel  4
5.2    2      8   2011     docetaxel  5
6.2    2      2   2008     docetaxel  6
1.3    3     NA     NA   abiraterone  1
2.3    3      5   2012   abiraterone  2
3.3    3      4   2008   abiraterone  3
4.3    3      4   2010   abiraterone  4
5.3    3      8   2011   abiraterone  5
6.3    3      2   2008   abiraterone  6
1.4    4     NA     NA         other  1
2.4    4      5   2012         other  2
3.4    4      4   2008         other  3
4.4    4      4   2010         other  4
5.4    4      8   2011         other  5
6.4    4      2   2008         other  6

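You will have to spend a few lines cleaning up the names and dropping some columns, but your code will run. Also note that reshape is a base R function, so loading reshape2 is unnecessary.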
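
The cleanup mentioned above might look roughly like the following sketch. It uses a hypothetical two-block version of the data, and the output names `mo` and `yr` are illustrative choices, not names from the original post. Passing `v.names` to reshape() is one way to name the stacked columns up front instead of renaming them afterwards.

```r
# Hypothetical cut-down data: two blocks of (month, year, Treatment)
df <- data.frame(
  q3_1mo = c(NA, 5L), q3_1yr = c(NA, 2012L),
  Treatment = c("anti-androgen", "anti-androgen"),
  q3_2mo = c(8L, 5L), q3_2yr = c(2010L, 2012L),
  Treatment = c("docetaxel", "docetaxel"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Make the names unique: the second "Treatment" becomes "Treatment.1"
names(df) <- make.unique(names(df))

# One vector of column names per output variable, in block order
varying <- list(
  c("q3_1mo", "q3_2mo"),
  c("q3_1yr", "q3_2yr"),
  c("Treatment", "Treatment.1")
)

# v.names sets the names of the stacked columns directly
long <- reshape(df, direction = "long", varying = varying,
                v.names = c("mo", "yr", "Treatment"))

# Drop the time/id bookkeeping columns and the "id.time" row names
long <- long[, c("mo", "yr", "Treatment")]
rownames(long) <- NULL

# Each treatment should now appear once per original row
table(long$Treatment)
```

With the full 12-column data the same pattern would need four names in each `varying` vector, one per block.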