R: create a table from three columns

Time: 2018-01-12 14:43:26

Tags: r aggregate

I hope I can explain my problem clearly. I have a data frame like this:

sample = data.frame("Room" = c("A1", "B2","A1","A3","A2"), "Name"=c("Peter","Tom","Peter","Anna","Peter"), "Class"=c("E","E","F","D","E"), "FY"=c(1,2,3,4,6))

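Now I want to create a new data frame that keeps "Room" and "Name" as columns and adds one column for each unique value of "Class". The values in those columns should be the sums of "FY".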

If there were no "Room" column, I would do this:

test=as.data.frame(unclass(with(sample, tapply(FY, list(Name, Class), FUN=sum))))

But how can I do this with two ID columns?

This is the output I want:

output = data.frame(Room = c("A1", "B2", "A3", "A2"),
                    Name = c("Peter", "Tom", "Anna", "Peter"),
                    E    = c(1, 2, NA, 6),
                    F    = c(3, NA, NA, NA),
                    D    = c(NA, NA, 4, NA))

3 answers:

Answer 0 (score: 2)

A simple dcast from reshape2 should do the trick:

library(reshape2)
dcast(sample, Room + Name ~ Class, value.var = "FY")

#  Room  Name  D  E  F
#1   A1 Peter NA  1  3
#2   A2 Peter NA  6 NA
#3   A3  Anna  4 NA NA
#4   B2   Tom NA  2 NA
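In the sample data no Room/Name/Class combination occurs twice, so dcast has nothing to add up. If such duplicates do occur, dcast called without fun.aggregate falls back to counting rows (length, with a message) instead of summing. Below is a minimal sketch that sums explicitly. The extra A1/Peter/E row is hypothetical and is only there to give dcast something to sum. fill = NA_real_ keeps empty cells as NA rather than the 0 that sum() returns for an empty group.

```r
library(reshape2)

sample <- data.frame(Room  = c("A1", "B2", "A1", "A3", "A2"),
                     Name  = c("Peter", "Tom", "Peter", "Anna", "Peter"),
                     Class = c("E", "E", "F", "D", "E"),
                     FY    = c(1, 2, 3, 4, 6))

# Hypothetical extra row: a second A1/Peter/E entry, so there is something to sum
sample2 <- rbind(sample,
                 data.frame(Room = "A1", Name = "Peter", Class = "E", FY = 10))

# fun.aggregate = sum adds up FY for duplicate Room/Name/Class rows;
# fill = NA_real_ (a double NA, matching sum()'s result type) marks empty cells
wide <- dcast(sample2, Room + Name ~ Class, value.var = "FY",
              fun.aggregate = sum, fill = NA_real_)
print(wide)
```

With the extra row, the A1/Peter entry under E should be 1 + 10 = 11, while the other cells match the output above.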

Answer 1 (score: 1)

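For this particular example, where each Room/Name/Class combination occurs only once, I believe you can get away with the following: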

library(tidyverse)

sample %>%
  spread(Class, FY)

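However, going by your description (the values should be sums of FY), this version should also cover the more general case where a Room/Name/Class combination occurs more than once: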

sample %>%
  group_by(Room, Name, Class) %>% 
  summarise(FY = sum(FY)) %>% 
  spread(Class, FY)
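In newer versions of tidyr, spread() is superseded by pivot_wider(). Its values_fn argument can do the summing in the same step, so the group_by/summarise pass is not needed. The sketch below assumes a tidyr version recent enough that values_fn accepts a plain function. It reuses the same hypothetical duplicate A1/Peter/E row to show the summing.

```r
library(tidyr)  # also re-exports the %>% pipe

sample <- data.frame(Room  = c("A1", "B2", "A1", "A3", "A2"),
                     Name  = c("Peter", "Tom", "Peter", "Anna", "Peter"),
                     Class = c("E", "E", "F", "D", "E"),
                     FY    = c(1, 2, 3, 4, 6))

# Hypothetical extra row so that one Room/Name/Class cell has two FY values
sample2 <- rbind(sample,
                 data.frame(Room = "A1", Name = "Peter", Class = "E", FY = 10))

# Room and Name become the ID columns by default; duplicate cells are summed,
# and cells with no data are left as NA (the default values_fill)
wide <- sample2 %>%
  pivot_wider(names_from = Class, values_from = FY, values_fn = sum)
print(wide)
```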

Answer 2 (score: 0)

Using reshape:

reshape(sample, idvar = c("Name", "Room"), timevar = "Class", direction = "wide")

Output:

  Room  Name FY.E FY.F FY.D
1   A1 Peter    1    3   NA
2   B2   Tom    2   NA   NA
4   A3  Anna   NA   NA    4
5   A2 Peter    6   NA   NA

You can rename the columns afterwards if needed.
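The reshape call above does not sum anything. If a Room/Name/Class combination appeared twice, reshape would keep only the first FY value and warn. A base-R sketch follows that aggregates first, then widens, then strips the "FY." prefix to get the plain column names. It again uses a hypothetical duplicate row.

```r
sample <- data.frame(Room  = c("A1", "B2", "A1", "A3", "A2"),
                     Name  = c("Peter", "Tom", "Peter", "Anna", "Peter"),
                     Class = c("E", "E", "F", "D", "E"),
                     FY    = c(1, 2, 3, 4, 6))

# Hypothetical extra row so that one Room/Name/Class cell has two FY values
sample2 <- rbind(sample,
                 data.frame(Room = "A1", Name = "Peter", Class = "E", FY = 10))

# 1. Sum FY within each Room/Name/Class combination
agg <- aggregate(FY ~ Room + Name + Class, data = sample2, FUN = sum)

# 2. Spread Class into columns named FY.<Class>, one row per Room/Name pair
wide <- reshape(agg, idvar = c("Room", "Name"), timevar = "Class",
                direction = "wide")

# 3. Drop the "FY." prefix that reshape adds, and reset the row names
names(wide) <- sub("^FY\\.", "", names(wide))
rownames(wide) <- NULL
print(wide)
```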