The input is a CSV file with the following contents.
Athlete Age Country Year Closing Ceremony Date Sport Gold Silver Bronze
Michael Phelps 23 United States 2008 8/24/2008 Swimming 8 0 0
Michael Phelps 19 United States 2004 8/29/2004 Swimming 6 0 2
Michael Phelps 27 United States 2012 8/12/2012 Swimming 4 2 0
Natalie Coughlin 25 United States 2008 8/24/2008 Swimming 1 2 3
The expected output I need is:
Athlete Age Country Year Closing Ceremony Date Sport Metal Type Metal Count
Michael Phelps 23 United States 2008 8/24/2008 Swimming Gold 8
Michael Phelps 19 United States 2004 8/29/2004 Swimming Gold 6
Michael Phelps 19 United States 2004 8/29/2004 Swimming Bronze 2
Michael Phelps 27 United States 2012 8/12/2012 Swimming Gold 4
Michael Phelps 27 United States 2012 8/12/2012 Swimming Silver 2
Natalie Coughlin 25 United States 2008 8/24/2008 Swimming Gold 1
Natalie Coughlin 25 United States 2008 8/24/2008 Swimming Silver 2
Natalie Coughlin 25 United States 2008 8/24/2008 Swimming Bronze 3
Can anyone help?
I tried using melt, but the output is not what I expected. I used this command:
> g <- melt(data)
and got this:
Athlete Country Closing.Ceremony.Date Sport variable value
1 Michael Phelps United States 8/24/2008 Swimming Age 23
2 Michael Phelps United States 8/29/2004 Swimming Age 19
3 Michael Phelps United States 8/12/2012 Swimming Age 27
4 Natalie Coughlin United States 8/24/2008 Swimming Age 25
5 Michael Phelps United States 8/24/2008 Swimming Year 2008
6 Michael Phelps United States 8/29/2004 Swimming Year 2004
7 Michael Phelps United States 8/12/2012 Swimming Year 2012
8 Natalie Coughlin United States 8/24/2008 Swimming Year 2008
9 Michael Phelps United States 8/24/2008 Swimming Gold 8
10 Michael Phelps United States 8/29/2004 Swimming Gold 6
11 Michael Phelps United States 8/12/2012 Swimming Gold 4
12 Natalie Coughlin United States 8/24/2008 Swimming Gold 1
13 Michael Phelps United States 8/24/2008 Swimming Silver 0
14 Michael Phelps United States 8/29/2004 Swimming Silver 0
15 Michael Phelps United States 8/12/2012 Swimming Silver 2
16 Natalie Coughlin United States 8/24/2008 Swimming Silver 2
17 Michael Phelps United States 8/24/2008 Swimming Bronze 0
18 Michael Phelps United States 8/29/2004 Swimming Bronze 2
19 Michael Phelps United States 8/12/2012 Swimming Bronze 0
20 Natalie Coughlin United States 8/24/2008 Swimming Bronze 3
But this is not the output I expected. Thanks in advance.
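Answer 0 (score: 1)
We can use melt and specify the 'id' columns. Based on the expected output, the columns to be converted from 'wide' to 'long' form are 7:9, i.e. the Gold, Silver and Bronze columns. After converting to long form, remove the rows where 'value' is 0 with subset: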
library(reshape2)
subset(melt(data, id.vars = 1:6), value != 0)
# Athlete Age Country Year Closing.Ceremony.Date Sport
#1 Michael Phelps 23 United States 2008 8/24/2008 Swimming
#2 Michael Phelps 19 United States 2004 8/29/2004 Swimming
#3 Michael Phelps 27 United States 2012 8/12/2012 Swimming
#4 Natalie Coughlin 25 United States 2008 8/24/2008 Swimming
#7 Michael Phelps 27 United States 2012 8/12/2012 Swimming
#8 Natalie Coughlin 25 United States 2008 8/24/2008 Swimming
#10 Michael Phelps 19 United States 2004 8/29/2004 Swimming
#12 Natalie Coughlin 25 United States 2008 8/24/2008 Swimming
# variable value
#1 Gold 8
#2 Gold 6
#3 Gold 4
#4 Gold 1
#7 Silver 2
#8 Silver 2
#10 Bronze 2
#12 Bronze 3
You can also use arguments such as variable.name and value.name in melt to change the default variable/value column names shown above.
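As a rough illustration of renaming the melted columns, the sketch below passes variable.name and value.name to reshape2's melt. The two-row data frame and the names MedalType and MedalCount are assumptions made for this example, not part of the original answer.

```r
library(reshape2)

# Small stand-in for the sample data (two rows only, to keep the sketch short).
data <- data.frame(
  Athlete = c("Michael Phelps", "Natalie Coughlin"),
  Year    = c(2012L, 2008L),
  Gold    = c(4L, 1L),
  Silver  = c(2L, 2L),
  Bronze  = c(0L, 3L),
  stringsAsFactors = FALSE
)

# Name the id and measure columns explicitly and rename the two output columns.
long <- melt(
  data,
  id.vars       = c("Athlete", "Year"),
  measure.vars  = c("Gold", "Silver", "Bronze"),
  variable.name = "MedalType",   # hypothetical name, replaces "variable"
  value.name    = "MedalCount"   # hypothetical name, replaces "value"
)

# Drop the medal types with a zero count.
result <- subset(long, MedalCount > 0)
print(result, row.names = FALSE)
```

Referring to columns by name rather than by position (1:6) keeps the call working if the column order of the CSV changes.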
Alternatively, gather from tidyr can do the same job.
library(dplyr)
library(tidyr)
gather(data, Type, MedalCount, 7:9) %>%
filter(MedalCount>0)
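In tidyr 1.0.0 and later, gather is superseded by pivot_longer. The following is a minimal sketch of the same reshape with that function, assuming a recent tidyr is installed; the two-row data frame is a stand-in invented for this example.

```r
library(dplyr)
library(tidyr)

# Small stand-in for the sample data (two rows only).
data <- data.frame(
  Athlete = c("Michael Phelps", "Natalie Coughlin"),
  Year    = c(2012L, 2008L),
  Gold    = c(4L, 1L),
  Silver  = c(2L, 2L),
  Bronze  = c(0L, 3L),
  stringsAsFactors = FALSE
)

# Move the three medal columns into a (Type, MedalCount) pair of columns,
# then keep only the rows with at least one medal.
result <- data %>%
  pivot_longer(
    cols      = c(Gold, Silver, Bronze),
    names_to  = "Type",
    values_to = "MedalCount"
  ) %>%
  filter(MedalCount > 0)

print(result)
```

pivot_longer keeps the rows of each original record together, so the result is grouped by athlete and year, which is the ordering shown in the question's expected output.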
# Sample data used in the examples above
data <- structure(list(Athlete = c("Michael Phelps", "Michael Phelps",
"Michael Phelps", "Natalie Coughlin"), Age = c(23L, 19L, 27L,
25L), Country = c("United States", "United States", "United States",
"United States"), Year = c(2008L, 2004L, 2012L, 2008L),
Closing.Ceremony.Date = c("8/24/2008",
"8/29/2004", "8/12/2012", "8/24/2008"), Sport = c("Swimming",
"Swimming", "Swimming", "Swimming"), Gold = c(8L, 6L, 4L, 1L),
Silver = c(0L, 0L, 2L, 2L), Bronze = c(0L, 2L, 0L, 3L)),
.Names = c("Athlete",
"Age", "Country", "Year", "Closing.Ceremony.Date", "Sport", "Gold",
"Silver", "Bronze"), class = "data.frame", row.names = c(NA, -4L))
Answer 1 (score: 1)
Try:
subset(melt(df, id.vars = 1:6, measure.vars = c("Gold", "Silver", "Bronze"), variable.name = "Metal", value.name = "Count"), Count > 0)
# Athlete Age Country Year Closing Ceremony Date Sport Metal Type Metal Count
# Michael Phelps 23 United States 2008 8/24/2008 Swimming Gold 8
# Michael Phelps 19 United States 2004 8/29/2004 Swimming Gold 6
# Michael Phelps 19 United States 2004 8/29/2004 Swimming Bronze 2
# Michael Phelps 27 United States 2012 8/12/2012 Swimming Gold 4
# Michael Phelps 27 United States 2012 8/12/2012 Swimming Silver 2
# Natalie Coughlin 25 United States 2008 8/24/2008 Swimming Gold 1
# Natalie Coughlin 25 United States 2008 8/24/2008 Swimming Silver 2
# Natalie Coughlin 25 United States 2008 8/24/2008 Swimming Bronze 3
Answer 2 (score: 1)
For variety, here is a concise approach in base R. Conceptually it is the same as the approach using melt (using @akrun's sample data):
subset(cbind(data[1:6], stack(data[-c(1:6)])), values > 0)
# Athlete Age Country Year Closing.Ceremony.Date Sport
# 1 Michael Phelps 23 United States 2008 8/24/2008 Swimming
# 2 Michael Phelps 19 United States 2004 8/29/2004 Swimming
# 3 Michael Phelps 27 United States 2012 8/12/2012 Swimming
# 4 Natalie Coughlin 25 United States 2008 8/24/2008 Swimming
# 7 Michael Phelps 27 United States 2012 8/12/2012 Swimming
# 8 Natalie Coughlin 25 United States 2008 8/24/2008 Swimming
# 10 Michael Phelps 19 United States 2004 8/29/2004 Swimming
# 12 Natalie Coughlin 25 United States 2008 8/24/2008 Swimming
# values ind
# 1 8 Gold
# 2 6 Gold
# 3 4 Gold
# 4 1 Gold
# 7 2 Silver
# 8 2 Silver
# 10 2 Bronze
# 12 3 Bronze
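stack names its output columns values and ind. If the column names and order from the question are wanted, one possible base R variation is sketched below; the two-row data frame and the names MedalType and MedalCount are assumptions for this example.

```r
# Small stand-in for the sample data (two rows only).
data <- data.frame(
  Athlete = c("Michael Phelps", "Natalie Coughlin"),
  Year    = c(2012L, 2008L),
  Gold    = c(4L, 1L),
  Silver  = c(2L, 2L),
  Bronze  = c(0L, 3L),
  stringsAsFactors = FALSE
)

medals <- c("Gold", "Silver", "Bronze")
ids    <- setdiff(names(data), medals)   # every column that is not a medal count

# stack() returns `values` then `ind`; put the type first and rename both.
stacked <- setNames(
  stack(data[medals])[c("ind", "values")],
  c("MedalType", "MedalCount")           # hypothetical column names
)

# cbind() recycles the id columns to match the stacked rows.
result <- subset(cbind(data[ids], stacked), MedalCount > 0)
print(result, row.names = FALSE)
```

Selecting the columns by name avoids depending on the medal columns being in positions 7 to 9.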
Answer 3 (score: 0)
Try:
r> input <- data.frame(Athlete=c('Michael Phelps','Michael Phelps','Michael Phelps','Natalie Coughlin'), Age=c(23L,19L,27L,25L), Country=c('United States','United States','United States','United States'), Year=c(2008L,2004L,2012L,2008L), ClosingCeremonyDate=c(as.Date('2008-8-24'),as.Date('2004-8-29'),as.Date('2012-8-12'),as.Date('2008-8-24')), Sport=c('Swimming','Swimming','Swimming','Swimming'), Gold=c(8L,6L,4L,1L), Silver=c(0L,0L,2L,2L), Bronze=c(0L,2L,0L,3L) );
r> output <- subset(do.call(rbind, lapply(c('Bronze','Silver','Gold'), function(m) cbind(input[,-which(colnames(input)%in%c('Bronze','Silver','Gold'))], MedalType=m, MedalCount=input[,which(colnames(input)==m)] ) ) ), MedalCount>0 );
r> print(output, row.names=F );
Athlete Age Country Year ClosingCeremonyDate Sport MedalType MedalCount
Michael Phelps 19 United States 2004 2004-08-29 Swimming Bronze 2
Natalie Coughlin 25 United States 2008 2008-08-24 Swimming Bronze 3
Michael Phelps 27 United States 2012 2012-08-12 Swimming Silver 2
Natalie Coughlin 25 United States 2008 2008-08-24 Swimming Silver 2
Michael Phelps 23 United States 2008 2008-08-24 Swimming Gold 8
Michael Phelps 19 United States 2004 2004-08-29 Swimming Gold 6
Michael Phelps 27 United States 2012 2012-08-12 Swimming Gold 4
Natalie Coughlin 25 United States 2008 2008-08-24 Swimming Gold 1
If you want a specific ordering, you can add a call to order().
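As one possible way to do that, the sketch below sorts a long-format result by athlete, then year, then medal type in Gold, Silver, Bronze order. The small output data frame is a stand-in invented for this example, and medal_rank is a hypothetical helper variable.

```r
# Stand-in for a long-format result such as `output` above.
output <- data.frame(
  Athlete    = c("Michael Phelps", "Natalie Coughlin",
                 "Michael Phelps", "Natalie Coughlin"),
  Year       = c(2004L, 2008L, 2012L, 2008L),
  MedalType  = c("Bronze", "Bronze", "Gold", "Gold"),
  MedalCount = c(2L, 3L, 4L, 1L),
  stringsAsFactors = FALSE
)

# Rank the medal types explicitly so Gold sorts before Silver before Bronze;
# a plain alphabetical sort would put Bronze first.
medal_rank <- match(output$MedalType, c("Gold", "Silver", "Bronze"))

# order() accepts several keys; later keys break ties in earlier ones.
sorted <- output[order(output$Athlete, output$Year, medal_rank), ]
print(sorted, row.names = FALSE)
```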