How to convert column values into rows for each unique value in a data frame in R?

Asked: 2015-06-08 07:29:31

Tags: r

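I have a large data frame that contains 12 columns for each of two types of values, Rested and Active. I want to convert the monthly columns into rows, bringing all the month columns (Jan, Feb, Mar ...) under a single 'Month' column.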

The layout I am after looks like this:

 ID     L1  L2  Year    Month   R   A
1234    89  65  2003    Jan     11  76
1234    89  65  2003    Feb     34  86
1234    89  65  2003    Mar     6   30
1234    89  65  2003    Apr     7   76
1234    89  65  2003    May     8   43
1234    89  65  2003    Jun     90  67
1234    89  65  2003    Jul     65  13
1234    89  65  2003    Aug     54  98
1234    89  65  2003    Sep     3   67
1234    89  65  2003    Oct     22  0
1234    89  65  2003    Nov     55  127
1234    89  65  2003    Dec     66  74
1234    45  76  2004    Jan     67  3
1234    45  76  2004    Feb     87  2
1234    45  76  2004    Mar     98  65
1234    45  76  2004    Apr     5   78
1234    45  76  2004    May     4   44
1234    45  76  2004    Jun     3   53
1234    45  76  2004    Jul     77  67
1234    45  76  2004    Aug     8   98
1234    45  76  2004    Sep     99  79
1234    45  76  2004    Oct     76  53
1234    45  76  2004    Nov     56  23
1234    45  76  2004    Dec     4   65

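In the table above, the R column stands for Rested and the A column for Active. In my current data the month columns JR, FR, MR stand for Jan Rested, Feb Rested, Mar Rested, and JA, FA, MA stand for Jan Active, Feb Active, Mar Active, and so on.

So what I am trying to do is convert each monthly column into rows and keep them next to the R and A values by creating a new Month column.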

I have tried various things, such as stack, melt and unlist:

data_reshape <- reshape(df,direction="long", varying=list(c("JR", "FR", "MR", "AR", "MYR", "JR", "JLR", "AGR", "SR", "OR", "NR", "DR", "JA", "FA","MA", "AA", "MYA", "JA", "JLA","AGA", "SA", "OA","NA", "DA")), v.names="Precipitation", timevar="Month")

data_stacked <- stack(data, select = c("JR", "FR", "MR", "AR", "MYR", "JR", "JLR", "AGR", "SR", "OR", "NR", "DR", "JA", "FA","MA", "AA", "MYA", "JA", "JLA","AGA", "SA", "OA","NA", "DA"))


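But the results were not what I expected: they give the Jan values for all years, then the Feb values for all years, then the Mar values for all years, and so on. What I want is to have them in proper month order within each year, for every ID present in the whole dataset.

How can this be achieved in R?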

3 Answers:

Answer 0: (score: 5)

Here is a possible solution using the devel version of data.table:

library(data.table) ## v >= 1.9.5

res <- melt(setDT(df),
            id = 1:4, ## id variables
            measure = list(5:16, 17:ncol(df)), # a list of two groups of measure variables
            variable = "Month", # The name of the additional variable
            value = c("R", "A")) # The names of the grouped variables

setorder(res, ID, -L1, L2, Year) ## Reordering the data to match the desired output
res[, Month := month.abb[Month]] ## You don't really need this part as you already have the months numbers

#       ID L1 L2 Year Month  R   A
#  1: 1234 89 65 2003   Jan 11  76
#  2: 1234 89 65 2003   Feb 34  86
#  3: 1234 89 65 2003   Mar  6  30
#  4: 1234 89 65 2003   Apr  7  76
#  5: 1234 89 65 2003   May  8  43
#  6: 1234 89 65 2003   Jun 90  67
#  7: 1234 89 65 2003   Jul 65  13
#  8: 1234 89 65 2003   Aug 54  98
#  9: 1234 89 65 2003   Sep  3  67
# 10: 1234 89 65 2003   Oct 22   0
# 11: 1234 89 65 2003   Nov 55 127
# 12: 1234 89 65 2003   Dec 66  74
# 13: 1234 45 76 2004   Jan 67   3
# 14: 1234 45 76 2004   Feb 87   2
# 15: 1234 45 76 2004   Mar 98  65
# 16: 1234 45 76 2004   Apr  5  78
# 17: 1234 45 76 2004   May  4  44
# 18: 1234 45 76 2004   Jun  3  53
# 19: 1234 45 76 2004   Jul 77  67
# 20: 1234 45 76 2004   Aug  8  98
# 21: 1234 45 76 2004   Sep 99  79
# 22: 1234 45 76 2004   Oct 76  53
# 23: 1234 45 76 2004   Nov 56  23
# 24: 1234 45 76 2004   Dec  4  65

Installation instructions:

library(devtools)
install_github("Rdatatable/data.table", build_vignettes = FALSE)
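
As far as I know, melting into several value columns at once has since made it into the released data.table (1.9.6 onward, I believe), so the devel install should no longer be needed. Below is a minimal sketch of the same idea that selects the two groups by name with patterns() instead of by position. The wide data frame is rebuilt from the desired output in the question, and the column names Jan_R ... Dec_A are my own assumption (the real data uses names such as JR, FR, MR), so the regular expressions would need adapting.

```r
library(data.table)

# Wide input rebuilt from the desired output in the question.
# Column names Jan_R ... Dec_A are assumed for illustration only.
r_vals <- rbind(c(11, 34, 6, 7, 8, 90, 65, 54, 3, 22, 55, 66),
                c(67, 87, 98, 5, 4, 3, 77, 8, 99, 76, 56, 4))
a_vals <- rbind(c(76, 86, 30, 76, 43, 67, 13, 98, 67, 0, 127, 74),
                c(3, 2, 65, 78, 44, 53, 67, 98, 79, 53, 23, 65))
colnames(r_vals) <- paste0(month.abb, "_R")
colnames(a_vals) <- paste0(month.abb, "_A")
df <- data.frame(ID = 1234, L1 = c(89, 45), L2 = c(65, 76),
                 Year = c(2003, 2004), r_vals, a_vals)

dt <- as.data.table(df)

# One regular expression per group of measure columns:
# the first fills the R column, the second fills the A column.
res <- melt(dt,
            id.vars = c("ID", "L1", "L2", "Year"),
            measure.vars = patterns("_R$", "_A$"),
            variable.name = "Month",
            value.name = c("R", "A"))

# Month comes back as a factor with levels "1".."12" (position within each group)
res[, Month := month.abb[as.integer(Month)]]

# Group the rows by ID and Year; ties keep the month order produced by melt
setorder(res, ID, Year)
print(res)
```

Matching by name rather than by 5:16 and 17:ncol(df) keeps the call working if columns are added or reordered, provided the suffixes stay consistent.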

Answer 1: (score: 5)

Here is a base R reshape approach:

res <- reshape(mydf, direction="long", varying=list(5:16, 17:28), v.names=c("R", "A"), times = month.name, timevar = "Month")
res[with(res, order(ID, -L1, L2, Year)), -8]
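
To make the call above easier to try, here is a small self-contained sketch on a hypothetical three-month subset of the question's data (the data frame and its column names are assumed, not taken from the asker's file). Note that times = month.name labels the rows January, February, ...; the sketch uses month.abb instead so the labels match the Jan, Feb, ... shown in the desired output.

```r
# Hypothetical three-month subset of the question's data (column names assumed)
mydf <- data.frame(ID = 1234, L1 = c(89, 45), L2 = c(65, 76),
                   Year = c(2003, 2004),
                   JR = c(11, 67), FR = c(34, 87), MR = c(6, 98),
                   JA = c(76, 3),  FA = c(86, 2),  MA = c(30, 65))

# Each element of 'varying' is one group of columns that collapses
# into a single long column named by the matching entry of 'v.names'.
res <- reshape(mydf, direction = "long",
               varying = list(5:7, 8:10),
               v.names = c("R", "A"),
               times = month.abb[1:3],
               timevar = "Month")

# reshape() returns all Jan rows, then all Feb rows, and so on, plus an 'id' column.
# order() is stable, so sorting by ID and Year keeps the months in order within a year.
res <- res[order(res$ID, res$Year), names(res) != "id"]
rownames(res) <- NULL
print(res)
```

With the full data the same call would use varying = list(5:16, 17:28) and times = month.abb.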

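Answer 2: (score: 3)

This is not an elegant solution, but I am posting it just to show how the problem can be solved with basic tools, without relying on advanced functions when the task does not necessarily need them. I think the more tools you have, the better you can tackle a problem. Here we go: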

 #extract the data part
 data<-t(as.matrix(df[,5:28]))
 #build the data.frame cbinding the needed columns
 res<-cbind(df[rep(1:nrow(df),each=12),1:4],  #this repeats the first 4 columns 12 times each
       Month=month.abb, #the month column
       R=as.vector(data[1:12,]), # the R column, obtained from the first 12 rows of data
       A=as.vector(data[13:24,])) #as above
 rownames(res)<-NULL #just to remove the row names
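
The step that does the real work above is the transpose: R flattens a matrix column by column, so after t() each column holds one original row, and as.vector() walks through the months of the first row before moving on to the second. A scaled-down sketch on a hypothetical three-month data frame (names and layout assumed) shows this:

```r
# Hypothetical three-month subset of the question's data (column names assumed)
df <- data.frame(ID = 1234, L1 = c(89, 45), L2 = c(65, 76),
                 Year = c(2003, 2004),
                 JR = c(11, 67), FR = c(34, 87), MR = c(6, 98),
                 JA = c(76, 3),  FA = c(86, 2),  MA = c(30, 65))

# After transposing, rows are the month columns and columns are the original rows
data <- t(as.matrix(df[, 5:10]))

# Column-major flattening: all months of row 1, then all months of row 2
r_long <- as.vector(data[1:3, ])
a_long <- as.vector(data[4:6, ])
print(r_long)

res <- cbind(df[rep(seq_len(nrow(df)), each = 3), 1:4],  # each original row repeated 3 times
             Month = month.abb[1:3],                     # recycled once per original row
             R = r_long,
             A = a_long)
rownames(res) <- NULL
print(res)
```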