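I ran into a strange problem when stacking the columns of a data frame into 3 columns. For some reason, the factor column loses its values when stacked.
When I use the code below, the Treatment values should, in theory, be stacked on top of one another rather than replaced by a single value.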
library(reshape2)
test1<-reshape(df, direction="long", varying=split(names(df), rep(seq_len(ncol(df)/4), 3)))
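I won't paste the whole result, but this frequency table should be enough:
Answer 0 (score: 1)
Duplicate column names cause this problem. A better approach is to split the columns into groups, correct the column names, and then bind the groups together with rbind.
I tried to keep all the information by creating two new columns to store the q3_... column names.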
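do.call('rbind', lapply(seq(3, 12, by = 3), function(x) {
  y <- df1[, (x - 2):x]  # take one group of three columns by position
  # keep the original column names in two new columns, mo and yr
  y <- do.call("cbind", list(mo = colnames(y)[1], yr = colnames(y)[2], y))
  colnames(y)[3:4] <- c('mo_val', 'yr_val')
  y
}))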
# mo yr mo_val yr_val Treatment
# 1: q3_1mo q3_1yr NA NA anti-androgen
# 2: q3_1mo q3_1yr 5 2012 anti-androgen
# 3: q3_1mo q3_1yr 4 2008 anti-androgen
# 4: q3_1mo q3_1yr 4 2010 anti-androgen
# 5: q3_1mo q3_1yr NA NA anti-androgen
# 6: q3_1mo q3_1yr 2 2008 anti-androgen
# 7: q3_2mo q3_2yr 8 2010 docetaxel
# 8: q3_2mo q3_2yr 5 2012 docetaxel
# 9: q3_2mo q3_2yr 4 2008 docetaxel
# 10: q3_2mo q3_2yr 4 2010 docetaxel
# 11: q3_2mo q3_2yr 8 2011 docetaxel
# 12: q3_2mo q3_2yr 2 2008 docetaxel
# 13: q3_3mo q3_3yr NA NA abiraterone
# 14: q3_3mo q3_3yr 5 2012 abiraterone
# 15: q3_3mo q3_3yr 4 2008 abiraterone
# 16: q3_3mo q3_3yr 4 2010 abiraterone
# 17: q3_3mo q3_3yr 8 2011 abiraterone
# 18: q3_3mo q3_3yr 2 2008 abiraterone
# 19: q3_3mo q3_3yr NA NA other
# 20: q3_3mo q3_3yr 5 2012 other
# 21: q3_3mo q3_3yr 4 2008 other
# 22: q3_3mo q3_3yr 4 2010 other
# 23: q3_3mo q3_3yr 8 2011 other
# 24: q3_3mo q3_3yr 2 2008 other
# mo yr mo_val yr_val Treatment
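As a variation on the same idea, here is a minimal sketch that slices each group of three columns by position and gives every chunk identical names before calling rbind. The data frame wide and the output names mo, yr and Treatment are made up for illustration; they are not from the question.

```r
# Hypothetical two-group example with a duplicated Treatment column
wide <- data.frame(q3_1mo = c(NA, 5L), q3_1yr = c(NA, 2012L),
                   Treatment = c("anti-androgen", "anti-androgen"),
                   q3_2mo = c(8L, 5L), q3_2yr = c(2010L, 2012L),
                   Treatment = c("docetaxel", "docetaxel"),
                   check.names = FALSE, stringsAsFactors = FALSE)

# First column position of each group of three
starts <- seq(1, ncol(wide), by = 3)

chunks <- lapply(starts, function(i) {
  chunk <- wide[, i:(i + 2)]                  # select by position, not by name
  names(chunk) <- c("mo", "yr", "Treatment")  # same names in every chunk
  chunk
})

# rbind matches columns by name, so the chunks stack cleanly
long <- do.call(rbind, chunks)
```

Because the columns are picked by position, the duplicated names never take part in a lookup, so each Treatment column ends up next to its own mo and yr values.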
Data:
df1 <- structure(list(q3_1mo = c(NA, 5L, 4L, 4L, NA, 2L),
q3_1yr = c(NA, 2012L, 2008L, 2010L, NA, 2008L),
Treatment = c("anti-androgen", "anti-androgen", "anti-androgen", "anti-androgen", "anti-androgen", "anti-androgen"),
q3_2mo = c(8L, 5L, 4L, 4L, 8L, 2L),
q3_2yr = c(2010L, 2012L, 2008L, 2010L, 2011L, 2008L),
Treatment = c("docetaxel", "docetaxel", "docetaxel", "docetaxel", "docetaxel", "docetaxel"),
q3_3mo = c(NA, 5L, 4L, 4L, 8L, 2L),
q3_3yr = c(NA, 2012L, 2008L, 2010L, 2011L, 2008L),
Treatment = c("abiraterone", "abiraterone", "abiraterone", "abiraterone", "abiraterone", "abiraterone"),
q3_3mo = c(NA, 5L, 4L, 4L, 8L, 2L),
q3_3yr = c(NA, 2012L, 2008L, 2010L, 2011L, 2008L),
Treatment = c("other", "other", "other", "other", "other", "other")),
.Names = c("q3_1mo", "q3_1yr", "Treatment", "q3_2mo", "q3_2yr", "Treatment", "q3_3mo", "q3_3yr", "Treatment", "q3_3mo", "q3_3yr", "Treatment"),
row.names = c(NA, -6L), class = "data.frame")
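To see why duplicated names matter, the small sketch below shows how base R resolves a name that appears more than once. The data frame dup is a made-up example, and the link to reshape is my reading of its behavior (it appears to select the varying columns by name), not something stated in the question.

```r
# Hypothetical data frame with two columns that share a name
dup <- data.frame(Treatment = "anti-androgen", Treatment = "docetaxel",
                  check.names = FALSE, stringsAsFactors = FALSE)

names(dup)  # both columns are called "Treatment"

# A lookup by name only ever finds the first matching column
first <- dup[["Treatment"]]

# Asking for the name twice returns the first column twice
both <- dup[, c("Treatment", "Treatment")]

# make.unique appends a suffix to the repeated names
unique_names <- make.unique(names(dup))
```

Any function that picks columns by name will therefore keep returning the first Treatment column, which matches the symptom of a single value replacing all the others.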
Answer 1 (score: 0)
You can also solve this by giving the variables unique names with make.unique and then using the same code.
names(df) <- make.unique(names(df))
test1 <- reshape(df, direction="long",
varying=split(names(df), rep(seq_len(ncol(df)/4), 3)))
This returns
test1
time q3_1mo q3_1yr Treatment id
1.1 1 NA NA anti-androgen 1
2.1 1 5 2012 anti-androgen 2
3.1 1 4 2008 anti-androgen 3
4.1 1 4 2010 anti-androgen 4
5.1 1 NA NA anti-androgen 5
6.1 1 2 2008 anti-androgen 6
1.2 2 8 2010 docetaxel 1
2.2 2 5 2012 docetaxel 2
3.2 2 4 2008 docetaxel 3
4.2 2 4 2010 docetaxel 4
5.2 2 8 2011 docetaxel 5
6.2 2 2 2008 docetaxel 6
1.3 3 NA NA abiraterone 1
2.3 3 5 2012 abiraterone 2
3.3 3 4 2008 abiraterone 3
4.3 3 4 2010 abiraterone 4
5.3 3 8 2011 abiraterone 5
6.3 3 2 2008 abiraterone 6
1.4 4 NA NA other 1
2.4 4 5 2012 other 2
3.4 4 4 2008 other 3
4.4 4 4 2010 other 4
5.4 4 8 2011 other 5
6.4 4 2 2008 other 6
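You will have to spend a few lines cleaning up the names and dropping some columns, but your code will run. Also note that reshape is a base R function, so loading reshape2 is unnecessary.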
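The cleanup could look roughly like the sketch below. It uses a made-up two-group data frame, spells out the varying list by hand, and names the output columns through v.names; the names mo, yr and group are my own choices, not from the question.

```r
# Hypothetical two-group example with a duplicated Treatment column
df <- data.frame(q3_1mo = c(NA, 5L), q3_1yr = c(NA, 2012L),
                 Treatment = c("anti-androgen", "anti-androgen"),
                 q3_2mo = c(8L, 5L), q3_2yr = c(2010L, 2012L),
                 Treatment = c("docetaxel", "docetaxel"),
                 check.names = FALSE, stringsAsFactors = FALSE)

# Make the duplicated names unique: the second Treatment becomes Treatment.1
names(df) <- make.unique(names(df))

long <- reshape(df, direction = "long",
                varying = list(c("q3_1mo", "q3_2mo"),
                               c("q3_1yr", "q3_2yr"),
                               c("Treatment", "Treatment.1")),
                v.names = c("mo", "yr", "Treatment"),  # names of the stacked columns
                timevar = "group")                     # which group a row came from

# Drop the generated id column and the "1.1"-style row names
long$id <- NULL
rownames(long) <- NULL
```

Setting v.names up front avoids renaming q3_1mo and q3_1yr afterwards, and the group column records which original set of columns each row came from.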