Transposing columns and rows in R

Time: 2015-03-13 08:17:09

Tags: r

The input is a CSV file with the following contents.

Athlete Age Country Year    Closing Ceremony Date   Sport   Gold    Silver  Bronze
Michael Phelps  23  United States   2008    8/24/2008   Swimming    8   0   0
Michael Phelps  19  United States   2004    8/29/2004   Swimming    6   0   2
Michael Phelps  27  United States   2012    8/12/2012   Swimming    4   2   0
Natalie Coughlin    25  United States   2008    8/24/2008   Swimming    1   2   3

This is the expected output I need:

Athlete Age Country Year    Closing Ceremony Date   Sport   Metal Type  Metal Count
Michael Phelps  23  United States   2008    8/24/2008   Swimming    Gold    8
Michael Phelps  19  United States   2004    8/29/2004   Swimming    Gold    6
Michael Phelps  19  United States   2004    8/29/2004   Swimming    Bronze  2
Michael Phelps  27  United States   2012    8/12/2012   Swimming    Gold    4
Michael Phelps  27  United States   2012    8/12/2012   Swimming    Silver  2
Natalie Coughlin    25  United States   2008    8/24/2008   Swimming    Gold    1
Natalie Coughlin    25  United States   2008    8/24/2008   Swimming    Silver  2
Natalie Coughlin    25  United States   2008    8/24/2008   Swimming    Bronze  3

Can anyone help?

I tried using melt, but the output is not what I expected. I used this command:

> g <- melt(data)

and got this:

           Athlete       Country Closing.Ceremony.Date    Sport variable value
1    Michael Phelps United States             8/24/2008 Swimming      Age    23
2    Michael Phelps United States             8/29/2004 Swimming      Age    19
3    Michael Phelps United States             8/12/2012 Swimming      Age    27
4  Natalie Coughlin United States             8/24/2008 Swimming      Age    25
5    Michael Phelps United States             8/24/2008 Swimming     Year  2008
6    Michael Phelps United States             8/29/2004 Swimming     Year  2004
7    Michael Phelps United States             8/12/2012 Swimming     Year  2012
8  Natalie Coughlin United States             8/24/2008 Swimming     Year  2008
9    Michael Phelps United States             8/24/2008 Swimming     Gold     8
10   Michael Phelps United States             8/29/2004 Swimming     Gold     6
11   Michael Phelps United States             8/12/2012 Swimming     Gold     4
12 Natalie Coughlin United States             8/24/2008 Swimming     Gold     1
13   Michael Phelps United States             8/24/2008 Swimming   Silver     0
14   Michael Phelps United States             8/29/2004 Swimming   Silver     0
15   Michael Phelps United States             8/12/2012 Swimming   Silver     2
16 Natalie Coughlin United States             8/24/2008 Swimming   Silver     2
17   Michael Phelps United States             8/24/2008 Swimming   Bronze     0
18   Michael Phelps United States             8/29/2004 Swimming   Bronze     2
19   Michael Phelps United States             8/12/2012 Swimming   Bronze     0
20 Natalie Coughlin United States             8/24/2008 Swimming   Bronze     3

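But this is not my expected output. Thanks in advance.

4 answers:

Answer 0 (score: 1)

We can use melt and specify the 'id' columns. Based on the expected output, the columns to convert from 'wide' to 'long' form are 7:9, i.e. the Gold, Silver, and Bronze columns, which leaves columns 1:6 as the id columns. After converting to long form, remove the rows where 'value' is 0 with subset.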

library(reshape2)
subset(melt(data, id.vars = 1:6), value != 0)
#               Athlete Age       Country Year Closing.Ceremony.Date    Sport
#1    Michael Phelps  23 United States 2008             8/24/2008 Swimming
#2    Michael Phelps  19 United States 2004             8/29/2004 Swimming
#3    Michael Phelps  27 United States 2012             8/12/2012 Swimming
#4  Natalie Coughlin  25 United States 2008             8/24/2008 Swimming
#7    Michael Phelps  27 United States 2012             8/12/2012 Swimming
#8  Natalie Coughlin  25 United States 2008             8/24/2008 Swimming
#10   Michael Phelps  19 United States 2004             8/29/2004 Swimming
#12 Natalie Coughlin  25 United States 2008             8/24/2008 Swimming
#   variable value
#1      Gold     8
#2      Gold     6
#3      Gold     4
#4      Gold     1
#7    Silver     2
#8    Silver     2
#10   Bronze     2
#12   Bronze     3

You can also use arguments such as variable.name and value.name in melt to change the default 'variable'/'value' column names shown above.
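As a minimal sketch of that renaming, assuming reshape2 is installed: the data frame `medals` and the column names `MedalType` and `MedalCount` below are illustrative choices, not part of the original answer.

```r
library(reshape2)

# Small illustrative sample in the same shape as the question's data
# (only two id columns are kept for brevity).
medals <- data.frame(
  Athlete = c("Michael Phelps", "Natalie Coughlin"),
  Year    = c(2012L, 2008L),
  Gold    = c(4L, 1L),
  Silver  = c(2L, 2L),
  Bronze  = c(0L, 3L)
)

# variable.name / value.name replace the default "variable" / "value" headers.
long <- melt(medals,
             id.vars       = c("Athlete", "Year"),
             variable.name = "MedalType",
             value.name    = "MedalCount")

# Drop the medal types an athlete did not win.
long <- subset(long, MedalCount > 0)
print(long)
```

Naming the id columns instead of using positions such as 1:6 keeps the call working if the column order of the CSV changes.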

Alternatively, gather from tidyr does the same job.

library(dplyr)
library(tidyr)
gather(data, Type, MedalCount, 7:9) %>%
  filter(MedalCount > 0)
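In newer tidyr releases (1.0.0 and later, which is an assumption about the installed version), pivot_longer covers the same reshaping as gather. The following is only a sketch on an illustrative sample; the names `medals`, `MedalType`, and `MedalCount` are hypothetical.

```r
library(dplyr)
library(tidyr)

# Small illustrative sample in the same shape as the question's data.
medals <- data.frame(
  Athlete = c("Michael Phelps", "Natalie Coughlin"),
  Year    = c(2012L, 2008L),
  Gold    = c(4L, 1L),
  Silver  = c(2L, 2L),
  Bronze  = c(0L, 3L)
)

# Move the three medal columns into name/value pairs, then drop zero counts.
long <- medals %>%
  pivot_longer(cols      = c(Gold, Silver, Bronze),
               names_to  = "MedalType",
               values_to = "MedalCount") %>%
  filter(MedalCount > 0)
print(long)
```

pivot_longer keeps the rows that come from one input row next to each other, so the result is grouped by athlete and year, as in the question's expected output.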

Data

data <- structure(list(Athlete = c("Michael Phelps", "Michael Phelps", 
"Michael Phelps", "Natalie Coughlin"), Age = c(23L, 19L, 27L, 
25L), Country = c("United States", "United States", "United States", 
"United States"), Year = c(2008L, 2004L, 2012L, 2008L),
Closing.Ceremony.Date = c("8/24/2008", 
"8/29/2004", "8/12/2012", "8/24/2008"), Sport = c("Swimming", 
"Swimming", "Swimming", "Swimming"), Gold = c(8L, 6L, 4L, 1L), 
Silver = c(0L, 0L, 2L, 2L), Bronze = c(0L, 2L, 0L, 3L)),
.Names = c("Athlete", 
"Age", "Country", "Year", "Closing.Ceremony.Date", "Sport", "Gold", 
"Silver", "Bronze"), class = "data.frame", row.names = c(NA, -4L))

Answer 1 (score: 1)

Try

library(reshape2)
# df is the input data frame (the same data as `data` in the previous answer)
subset(melt(df, id.vars = 1:6, measure.vars = c("Gold", "Silver", "Bronze"), variable.name = "Metal", value.name = "Count"), Count > 0)
# This returns the same eight rows as the expected output (repeated below), with the last two
# columns named Metal and Count and the rows grouped by medal type rather than by athlete:
# Athlete Age Country Year    Closing Ceremony Date   Sport   Metal Type  Metal Count
# Michael Phelps  23  United States   2008    8/24/2008   Swimming    Gold    8
# Michael Phelps  19  United States   2004    8/29/2004   Swimming    Gold    6
# Michael Phelps  19  United States   2004    8/29/2004   Swimming    Bronze  2
# Michael Phelps  27  United States   2012    8/12/2012   Swimming    Gold    4
# Michael Phelps  27  United States   2012    8/12/2012   Swimming    Silver  2
# Natalie Coughlin    25  United States   2008    8/24/2008   Swimming    Gold    1
# Natalie Coughlin    25  United States   2008    8/24/2008   Swimming    Silver  2
# Natalie Coughlin    25  United States   2008    8/24/2008   Swimming    Bronze  3

Answer 2 (score: 1)

For variety, here is a concise approach in base R. Conceptually it is the same as the melt approach (using @akrun's sample data):

subset(cbind(data[1:6], stack(data[-c(1:6)])), values > 0)
#             Athlete Age       Country Year Closing.Ceremony.Date    Sport
# 1    Michael Phelps  23 United States 2008             8/24/2008 Swimming
# 2    Michael Phelps  19 United States 2004             8/29/2004 Swimming
# 3    Michael Phelps  27 United States 2012             8/12/2012 Swimming
# 4  Natalie Coughlin  25 United States 2008             8/24/2008 Swimming
# 7    Michael Phelps  27 United States 2012             8/12/2012 Swimming
# 8  Natalie Coughlin  25 United States 2008             8/24/2008 Swimming
# 10   Michael Phelps  19 United States 2004             8/29/2004 Swimming
# 12 Natalie Coughlin  25 United States 2008             8/24/2008 Swimming
#    values    ind
# 1       8   Gold
# 2       6   Gold
# 3       4   Gold
# 4       1   Gold
# 7       2 Silver
# 8       2 Silver
# 10      2 Bronze
# 12      3 Bronze
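To see what each piece of that one-liner contributes, here is a small sketch in base R. The two-row sample and the names `ids`, `medal_cols`, `stacked`, and `long` are illustrative, not from the answer.

```r
# Id columns and medal columns for two rows of illustrative data.
ids <- data.frame(Athlete = c("Michael Phelps", "Michael Phelps"),
                  Year    = c(2008L, 2004L))
medal_cols <- data.frame(Gold = c(8L, 6L), Silver = c(0L, 0L), Bronze = c(0L, 2L))

# stack() concatenates the columns into one `values` column and records
# the source column name of each value in `ind`.
stacked <- stack(medal_cols)

# cbind() recycles the 2 id rows to match the 6 stacked rows.
long <- cbind(ids, stacked)

# Keep only the medals that were actually won.
result <- subset(long, values > 0)
print(result)
```

The recycling in cbind only lines up correctly because stack() emits the values column by column, each in the original row order.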

Answer 3 (score: 0)

Try:

r> input <- data.frame(Athlete=c('Michael Phelps','Michael Phelps','Michael Phelps','Natalie Coughlin'), Age=c(23L,19L,27L,25L), Country=c('United States','United States','United States','United States'), Year=c(2008L,2004L,2012L,2008L), ClosingCeremonyDate=c(as.Date('2008-8-24'),as.Date('2004-8-29'),as.Date('2012-8-12'),as.Date('2008-8-24')), Sport=c('Swimming','Swimming','Swimming','Swimming'), Gold=c(8L,6L,4L,1L), Silver=c(0L,0L,2L,2L), Bronze=c(0L,2L,0L,3L) );
r> output <- subset(do.call(rbind, lapply(c('Bronze','Silver','Gold'), function(m) cbind(input[,-which(colnames(input)%in%c('Bronze','Silver','Gold'))], MedalType=m, MedalCount=input[,which(colnames(input)==m)] ) ) ), MedalCount>0 );
r> print(output, row.names=F );
          Athlete Age       Country Year ClosingCeremonyDate    Sport MedalType MedalCount
   Michael Phelps  19 United States 2004          2004-08-29 Swimming    Bronze          2
 Natalie Coughlin  25 United States 2008          2008-08-24 Swimming    Bronze          3
   Michael Phelps  27 United States 2012          2012-08-12 Swimming    Silver          2
 Natalie Coughlin  25 United States 2008          2008-08-24 Swimming    Silver          2
   Michael Phelps  23 United States 2008          2008-08-24 Swimming      Gold          8
   Michael Phelps  19 United States 2004          2004-08-29 Swimming      Gold          6
   Michael Phelps  27 United States 2012          2012-08-12 Swimming      Gold          4
 Natalie Coughlin  25 United States 2008          2008-08-24 Swimming      Gold          1

If you want a specific ordering, you can add a call to order().
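One possible ordering, sketched in base R on a small hand-built sample with the same column names as the output above: sort by athlete, then year, then medal rank. The sort keys and the Gold, Silver, Bronze ranking are assumptions about what "specific ordering" might mean here.

```r
# Hand-built long-format sample (illustrative rows only).
output <- data.frame(
  Athlete    = c("Natalie Coughlin", "Michael Phelps", "Michael Phelps", "Michael Phelps"),
  Year       = c(2008L, 2012L, 2004L, 2004L),
  MedalType  = c("Silver", "Gold", "Bronze", "Gold"),
  MedalCount = c(2L, 4L, 2L, 6L)
)

# A factor with explicit levels makes Gold sort before Silver before Bronze,
# instead of the alphabetical Bronze, Gold, Silver.
medal_rank <- factor(output$MedalType, levels = c("Gold", "Silver", "Bronze"))

# order() returns the row permutation for the given sort keys.
sorted <- output[order(output$Athlete, output$Year, medal_rank), ]
print(sorted, row.names = FALSE)
```

Note that the question's expected output keeps the athletes' rows in input order (2008, 2004, 2012), so sorting by Year would not reproduce it exactly; a row index column saved before reshaping could serve as the sort key instead.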