Completing a data.frame in R so every group has a row for every date

Date: 2014-07-31 08:28:17

Tags: r

I have a dataset as follows:

Date <- rep(c("Jan", "Feb"), 3)[1:5]
Group <- c(rep(letters[1:2],each=2),"c")
value <- sample(1:10,5)
data <- data.frame(Date, Group, value)

> data
  Date Group value
1  Jan     a     2
2  Feb     a     7
3  Jan     b     3
4  Feb     b     9
5  Jan     c     1

As you can see, group c has no row for Date = Feb. How can I produce the following dataset?

> DATA
  Date Group value
1  Jan     a     2
2  Feb     a     7
3  Jan     b     3
4  Feb     b     9
5  Jan     c     1
6  Feb     c     0 

I added the last row so that the value for group c in Feb is 0.

Thanks.

3 answers:

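Answer 0 (score: 3)

Using base R, you can use xtabs wrapped in as.data.frame. The result column is named Freq rather than value, and the values below differ from the question because the data was generated with sample() without set.seed():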

as.data.frame(xtabs(formula = value ~ Date + Group, data = data))
#  Date Group Freq
#1  Feb     a    8
#2  Jan     a    6
#3  Feb     b    4
#4  Jan     b    1
#5  Feb     c    0
#6  Jan     c   10
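To make the xtabs approach reproducible, here is a minimal sketch that types in the values printed in the question instead of re-sampling them, and renames the default Freq column back to value. The name `res` is only illustrative and is not part of the original answer.

```r
# Data typed in from the question's printed output (not re-sampled)
data <- data.frame(
  Date  = c("Jan", "Feb", "Jan", "Feb", "Jan"),
  Group = c("a", "a", "b", "b", "c"),
  value = c(2, 7, 3, 9, 1)
)

# xtabs sums value within each Date/Group cell; empty cells become 0
res <- as.data.frame(xtabs(value ~ Date + Group, data = data))

# Rename the default Freq column back to value
names(res)[names(res) == "Freq"] <- "value"
res
```

Note that xtabs sums value whenever a Date/Group pair occurs more than once, so this sketch assumes each pair appears at most once in the input.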

Answer 1 (score: 2)

Using merge:

#get all combinations of 2 columns
all.comb <- expand.grid(unique(data$Date),unique(data$Group))
colnames(all.comb) <- c("Date","Group")

#merge with all.x=TRUE to keep nonmatched rows
res <- merge(all.comb,data,all.x=TRUE)

#convert NA to 0
res$value[is.na(res$value)] <- 0

#result
res
# Date Group value
# 1  Feb     a     3
# 2  Feb     b     4
# 3  Feb     c     0
# 4  Jan     a     5
# 5  Jan     b     7
# 6  Jan     c    10
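merge sorts its result by the key columns, so Feb comes before Jan alphabetically. If the row order from the question matters (grouped by Group, with Jan before Feb), one possible sketch is shown here. It assumes the values printed in the question and a hard-coded calendar order `c("Jan", "Feb")`, which is an illustration rather than part of the original answer.

```r
# Data typed in from the question's printed output (not re-sampled)
data <- data.frame(
  Date  = c("Jan", "Feb", "Jan", "Feb", "Jan"),
  Group = c("a", "a", "b", "b", "c"),
  value = c(2, 7, 3, 9, 1)
)

# Every Date/Group combination, with columns named directly in expand.grid
all.comb <- expand.grid(Date = unique(data$Date),
                        Group = unique(data$Group),
                        stringsAsFactors = FALSE)

# Keep all combinations; those missing from data get value = NA, then 0
res <- merge(all.comb, data, all.x = TRUE)
res$value[is.na(res$value)] <- 0

# Order by Group, then by Date in calendar order (Jan before Feb)
res <- res[order(res$Group, match(res$Date, c("Jan", "Feb"))), ]
rownames(res) <- NULL
res
```

With real month data covering a full year, the built-in `month.abb` vector could replace the hard-coded order.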

Answer 2 (score: 1)

Using reshape2:

library(reshape2)
# values differ from the question as there was no set.seed()
melt(dcast(data, Date ~ Group, value.var = "value", fill = 0), id.vars = "Date")
#   Date variable value
#1  Feb        a     1
#2  Jan        a    10
#3  Feb        b     7
#4  Jan        b     4
#5  Feb        c     0
#6  Jan        c     5

Or using dplyr and tidyr:

library(dplyr)
library(tidyr)
data %>%
  spread(Group, value, fill = 0) %>%
  gather(Group, value, a:c)
#  Date Group value
#1  Feb     a     1
#2  Jan     a    10
#3  Feb     b     7
#4  Jan     b     4
#5  Feb     c     0
#6  Jan     c     5
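As an alternative not mentioned in the original answers, tidyr also provides complete(), which fills in missing combinations in one step. spread() and gather() are superseded in current tidyr releases, so this may be the more convenient route there. A minimal sketch, assuming a tidyr version that includes complete() and using the values printed in the question:

```r
library(tidyr)

# Data typed in from the question's printed output (not re-sampled)
data <- data.frame(
  Date  = c("Jan", "Feb", "Jan", "Feb", "Jan"),
  Group = c("a", "a", "b", "b", "c"),
  value = c(2, 7, 3, 9, 1)
)

# complete() adds a row for every missing Date/Group combination;
# fill gives the value to use instead of NA in those new rows
res <- complete(data, Date, Group, fill = list(value = 0))
res
```

Unlike the spread/gather round trip, this does not require naming the group columns (a:c), so it keeps working if more groups are added.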