I have a large data frame with 12 monthly columns for each of two types of values, Rested and Active. I want to convert the monthly columns into rows, so that all the month columns (Jan, Feb, Mar, ...) end up under a single 'Month' column.
The monthly columns are named JR, FR, MR, ... for Jan Rested, Feb Rested, Mar Rested, and JA, FA, MA, ... for Jan Active, Feb Active, Mar Active, and so on. I am trying to get the data to look like this, where column R stands for Rested and column A for Active:
ID L1 L2 Year Month R A
1234 89 65 2003 Jan 11 76
1234 89 65 2003 Feb 34 86
1234 89 65 2003 Mar 6 30
1234 89 65 2003 Apr 7 76
1234 89 65 2003 May 8 43
1234 89 65 2003 Jun 90 67
1234 89 65 2003 Jul 65 13
1234 89 65 2003 Aug 54 98
1234 89 65 2003 Sep 3 67
1234 89 65 2003 Oct 22 0
1234 89 65 2003 Nov 55 127
1234 89 65 2003 Dec 66 74
1234 45 76 2004 Jan 67 3
1234 45 76 2004 Feb 87 2
1234 45 76 2004 Mar 98 65
1234 45 76 2004 Apr 5 78
1234 45 76 2004 May 4 44
1234 45 76 2004 Jun 3 53
1234 45 76 2004 Jul 77 67
1234 45 76 2004 Aug 8 98
1234 45 76 2004 Sep 99 79
1234 45 76 2004 Oct 76 53
1234 45 76 2004 Nov 56 23
1234 45 76 2004 Dec 4 65
So what I am trying to do is turn each monthly column into rows, keeping them alongside the R and A values by creating a new Month column.
I tried various things, such as stack, melt, unlist and COALESCE, for example:
data_reshape <- reshape(df,direction="long", varying=list(c("JR", "FR", "MR", "AR", "MYR", "JR", "JLR", "AGR", "SR", "OR", "NR", "DR", "JA", "FA","MA", "AA", "MYA", "JA", "JLA","AGA", "SA", "OA","NA", "DA")), v.names="Precipitation", timevar="Month")
data_stacked <- stack(data, select = c("JR", "FR", "MR", "AR", "MYR", "JR", "JLR", "AGR", "SR", "OR", "NR", "DR", "JA", "FA","MA", "AA", "MYA", "JA", "JLA","AGA", "SA", "OA","NA", "DA"))
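But the results were not quite what I expected: they give the Jan values for all years, then the Feb values for all years, then the Mar values for all years, and so on. What I want is the months in their proper order within each year, for every ID present in the whole dataset.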
How can I achieve this in R?
Answer 0: (score: 5)
Here is a possible approach using data.table:
library(data.table) ## v >= 1.9.5
res <- melt(setDT(df),
            id.vars = 1:4,                              ## id variables
            measure.vars = list(5:16, 17:ncol(df)),     # a list of two groups of measure variables
            variable.name = "Month",                    # the name of the additional variable
            value.name = c("R", "A"))                   # the names of the grouped value columns
setorder(res, ID, -L1, L2, Year) ## Reordering the data to match the desired output
res[, Month := month.abb[Month]] ## You don't really need this part as you already have the months numbers
# ID L1 L2 Year Month R A
# 1: 1234 89 65 2003 Jan 11 76
# 2: 1234 89 65 2003 Feb 34 86
# 3: 1234 89 65 2003 Mar 6 30
# 4: 1234 89 65 2003 Apr 7 76
# 5: 1234 89 65 2003 May 8 43
# 6: 1234 89 65 2003 Jun 90 67
# 7: 1234 89 65 2003 Jul 65 13
# 8: 1234 89 65 2003 Aug 54 98
# 9: 1234 89 65 2003 Sep 3 67
# 10: 1234 89 65 2003 Oct 22 0
# 11: 1234 89 65 2003 Nov 55 127
# 12: 1234 89 65 2003 Dec 66 74
# 13: 1234 45 76 2004 Jan 67 3
# 14: 1234 45 76 2004 Feb 87 2
# 15: 1234 45 76 2004 Mar 98 65
# 16: 1234 45 76 2004 Apr 5 78
# 17: 1234 45 76 2004 May 4 44
# 18: 1234 45 76 2004 Jun 3 53
# 19: 1234 45 76 2004 Jul 77 67
# 20: 1234 45 76 2004 Aug 8 98
# 21: 1234 45 76 2004 Sep 99 79
# 22: 1234 45 76 2004 Oct 76 53
# 23: 1234 45 76 2004 Nov 56 23
# 24: 1234 45 76 2004 Dec 4 65
Installation instructions:
library(devtools)
install_github("Rdatatable/data.table", build_vignettes = FALSE)
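As a variant of the melt call above, the two groups of measure columns can also be selected by name rather than by position, using data.table's patterns() helper. This is only a sketch under stated assumptions: the wide column names (R_Jan ... R_Dec and A_Jan ... A_Dec) are hypothetical, since the original wide table is not shown, and the values are copied from the desired output in the question.

```r
library(data.table)

# Hypothetical wide layout: one row per ID/Year with 12 R and 12 A columns.
# The column names R_Jan..R_Dec and A_Jan..A_Dec are assumed for illustration.
r_vals <- rbind(c(11, 34, 6, 7, 8, 90, 65, 54, 3, 22, 55, 66),
                c(67, 87, 98, 5, 4, 3, 77, 8, 99, 76, 56, 4))
a_vals <- rbind(c(76, 86, 30, 76, 43, 67, 13, 98, 67, 0, 127, 74),
                c(3, 2, 65, 78, 44, 53, 67, 98, 79, 53, 23, 65))
colnames(r_vals) <- paste0("R_", month.abb)
colnames(a_vals) <- paste0("A_", month.abb)
wide <- as.data.table(data.frame(ID = 1234, L1 = c(89, 45), L2 = c(65, 76),
                                 Year = c(2003, 2004), r_vals, a_vals))

# Melt both groups at once; each regular expression picks one group of columns.
long <- melt(wide,
             id.vars = c("ID", "L1", "L2", "Year"),
             measure.vars = patterns("^R_", "^A_"),
             variable.name = "Month",
             value.name = c("R", "A"))

long[, Month := as.integer(Month)]   # position within each group, 1..12
setorder(long, ID, Year, Month)      # months in calendar order within each year
long[, Month := month.abb[Month]]    # month numbers to Jan, Feb, ...
print(long)
```

Selecting by pattern avoids hard-coding column positions, but it relies on the columns of both groups appearing in the same month order.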
Answer 1: (score: 5)
Here is a base R reshape approach:
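res <- reshape(mydf, direction = "long",
               varying = list(5:16, 17:28),   # columns holding the R values, then the A values
               v.names = c("R", "A"),
               times = month.name,            # month.abb would give Jan, Feb, ... as in the desired output
               timevar = "Month")
res[with(res, order(ID, -L1, L2, Year)), -8]  # reorder the rows; -8 drops the "id" column added by reshape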
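To make the reshape approach reproducible, here is a minimal self-contained sketch. The wide column names (R_Jan ... R_Dec, A_Jan ... A_Dec) are assumptions, because the original wide table is not shown; the values are copied from the desired output in the question. It also shows the ordering issue described in the question: reshape returns all Jan rows first, then all Feb rows, so an explicit sort is needed afterwards.

```r
# Hypothetical wide data: one row per ID/Year, assumed column names.
r_vals <- rbind(c(11, 34, 6, 7, 8, 90, 65, 54, 3, 22, 55, 66),
                c(67, 87, 98, 5, 4, 3, 77, 8, 99, 76, 56, 4))
a_vals <- rbind(c(76, 86, 30, 76, 43, 67, 13, 98, 67, 0, 127, 74),
                c(3, 2, 65, 78, 44, 53, 67, 98, 79, 53, 23, 65))
colnames(r_vals) <- paste0("R_", month.abb)
colnames(a_vals) <- paste0("A_", month.abb)
wide <- data.frame(ID = 1234, L1 = c(89, 45), L2 = c(65, 76),
                   Year = c(2003, 2004), r_vals, a_vals)

# Two groups of varying columns become the two long columns R and A.
long <- reshape(wide, direction = "long",
                varying = list(paste0("R_", month.abb), paste0("A_", month.abb)),
                v.names = c("R", "A"),
                timevar = "Month", times = month.abb)

# reshape stacks month by month, so sort by ID, Year and calendar month.
long <- long[order(long$ID, long$Year, match(long$Month, month.abb)), ]
long$id <- NULL          # drop the helper id column created by reshape
rownames(long) <- NULL
print(long)
```

Sorting on match(long$Month, month.abb) rather than on Month itself keeps the months in calendar order instead of alphabetical order.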
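Answer 2: (score: 3)
This is an inelegant solution, but I am posting it just to show how the problem can be solved with basic tools, without relying on more advanced functions when the task does not strictly need them. I believe that the more tools you have, the better you can tackle a problem. Here we go: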
# extract the 24 monthly columns and transpose: one column per original row
data <- t(as.matrix(df[, 5:28]))
# build the data.frame by cbinding the needed columns
res <- cbind(df[rep(1:nrow(df), each = 12), 1:4],  # repeats each row of the first 4 columns 12 times
             Month = month.abb,                    # the month column, recycled for every original row
             R = as.vector(data[1:12, ]),          # the R column, obtained from the first 12 rows of data
             A = as.vector(data[13:24, ]))         # the A column, obtained from the last 12 rows of data
rownames(res) <- NULL  # just to remove the row names
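The transpose in the code above matters because R flattens a matrix column by column. A small illustration on a toy matrix (not taken from the question's data):

```r
# Toy matrix: 2 rows (think "years") by 3 columns (think "months").
m <- matrix(1:6, nrow = 2, byrow = TRUE)

by_column <- as.vector(m)     # column-major: 1 4 2 5 3 6 (month by month)
by_row    <- as.vector(t(m))  # after transposing: 1 2 3 4 5 6 (row by row)

print(by_column)
print(by_row)
```

Flattening the transposed matrix therefore yields all months of the first row, then all months of the second row, which is the ordering the question asks for.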