R: reshape with null values when a combination doesn't exist

Date: 2018-03-07 07:42:46

Tags: r reshape tidyr reshape2

I have some data that I melt and dcast using the reshape2 package, as below.

library(reshape2)

dat <- data.frame(Name    = c("Alice", "Alice", "Alice", "Alice", "Bob", "Bob", "Bob"),
                  Month   = c(1, 1, 1, 2, 1, 2, 2),
                  Product = c("Car", "Bike", "Car", "Car", "Car", "Bike", "Bike"),
                  Price   = c(1000, 150, 300, 500, 2000, 200, 100))
#    Name Month Product Price
# 1 Alice     1     Car  1000
# 2 Alice     1    Bike   150
# 3 Alice     1     Car   300
# 4 Alice     2     Car   500
# 5   Bob     1     Car  2000
# 6   Bob     2    Bike   200
# 7   Bob     2    Bike   100

dat_melt <- melt(dat, id = c("Name", "Month", "Product"))
#    Name Month Product variable value
# 1 Alice     1     Car    Price  1000
# 2 Alice     1    Bike    Price   150
# 3 Alice     1     Car    Price   300
# 4 Alice     2     Car    Price   500
# 5   Bob     1     Car    Price  2000
# 6   Bob     2    Bike    Price   200
# 7   Bob     2    Bike    Price   100

dat_spread <- dcast(dat_melt, Name + Month ~ Product + variable,
                    value.var = "value", fun = sum)
#    Name Month Bike_Price Car_Price
# 1 Alice     1        150      1300
# 2 Alice     2          0       500
# 3   Bob     1          0      2000
# 4   Bob     2        300         0

How can I produce this output so that cases where a Name-Month-Product combination doesn't exist (e.g. Alice, 2, Bike) return NULL or NA instead of 0? Note that the solution should also work for cases where Price is genuinely 0, so a recode such as dat_spread$Bike_Price[dat_spread$Bike_Price == 0] <- NA is not acceptable.
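To make that constraint concrete, here is a base-R sketch that is not from the original thread and uses neither reshape2 nor dplyr. The data frame dat0 is a hypothetical variant of dat with one extra zero-price row, and the row key column is an illustrative helper. It contrasts a zero-filled sum table followed by the rejected recode with base tapply, which leaves cells that have no rows at its default of NA.

```r
# Hypothetical variant of the question's data: one extra row in which Bob
# gets a Bike for free in month 1, so a genuine Price of 0 now exists.
dat0 <- data.frame(
  Name    = c("Alice", "Alice", "Alice", "Alice", "Bob", "Bob", "Bob", "Bob"),
  Month   = c(1, 1, 1, 2, 1, 2, 2, 1),
  Product = c("Car", "Bike", "Car", "Car", "Car", "Bike", "Bike", "Bike"),
  Price   = c(1000, 150, 300, 500, 2000, 200, 100, 0)
)
# Combine Name and Month into a single row key, e.g. "Alice_1".
dat0$key <- paste(dat0$Name, dat0$Month, sep = "_")

# Zero-filled table of sums (like dcast with fun = sum), then the rejected
# recode: every 0 becomes NA, including Bob's genuine free Bike.
naive <- unclass(xtabs(Price ~ key + Product, data = dat0))
naive[naive == 0] <- NA

# tapply() applies sum() only to combinations that actually occur and
# leaves the remaining cells at NA, so the genuine 0 survives.
safe <- tapply(dat0$Price, list(dat0$key, dat0$Product), sum)

naive["Bob_1", "Bike"]   # NA, although Bob really paid 0
safe["Bob_1", "Bike"]    # 0
safe["Alice_2", "Bike"]  # NA, no such purchase
```

The point is only the distinction the question draws: after a zero-filled aggregation, "no rows" and "rows summing to 0" can no longer be told apart.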

I've tried using an anonymous function in dcast to no avail, e.g.

library(dplyr)

dcast(dat_melt, Name + Month ~ Product + variable,
      value.var = "value",
      fun.aggregate = function(x) if_else(is.na(x), NULL, sum(x)))
# Error: `false` must be type NULL, not double

dcast(dat_melt, Name + Month ~ Product + variable,
      value.var = "value",
      fun.aggregate = function(x) if_else(is.na(x), 3.14, sum(x)))  # then update after
# Error in vapply(indices, fun, .default) : values must be length 0,
#  but FUN(X[[1]]) result is length 1

Note that dplyr is not required, so if you have a solution that doesn't use it (e.g. one using only reshape2 functions), that would be great too.
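A hedged base-R look at why those attempts misfire. The second error message ("values must be length 0") suggests that dcast evaluates fun.aggregate on a zero-length vector when no rows fall into a cell; that is an inference from the error, not something stated in the thread. Under that assumption, sum returns 0 there, which is where the zeros come from, and is.na() has nothing to test. The helper sum_or_na is hypothetical and has not been checked here as a dcast fun.aggregate; it only shows that branching on length() rather than is.na() keeps empty cells and genuine zero totals apart.

```r
# What the aggregator sees for a cell with no rows (assumed: a zero-length vector).
empty <- numeric(0)
sum(empty)    # 0, indistinguishable from a genuine total of 0
is.na(empty)  # logical(0): an is.na() test has nothing to act on

# Hypothetical helper: return a double NA for empty input, otherwise the sum.
sum_or_na <- function(x) if (length(x) == 0) NA_real_ else sum(x)

sum_or_na(empty)    # NA
sum_or_na(c(0, 0))  # 0, a real zero is kept
sum_or_na(150)      # 150
```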

2 answers:

Answer 0 (score: 2)

You can specify the value that dcast uses for missing combinations with the fill argument:

dcast(dat_melt, Name + Month ~ Product + variable,
      value.var = "value", fun = sum, fill = NA_real_)
#>    Name Month Bike_Price Car_Price
#> 1 Alice     1        150      1300
#> 2 Alice     2         NA       500
#> 3   Bob     1         NA      2000
#> 4   Bob     2        300        NA

Created on 2018-03-07 by the reprex package (v0.2.0).

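(Note that dcast calls vapply, which is very picky about types. So specifying just fill = NA isn't good enough, because typeof(NA) == "logical" while your values are numeric: you have to explicitly use the "double" NA, NA_real_.)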
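The type rule in that note can be reproduced with base R alone. This sketch assumes, as the note says, that the aggregation goes through vapply with the fill value as its FUN.VALUE template; cells and make_agg are hypothetical stand-ins, not reshape2 internals. vapply may promote a logical result to double, but it will not demote a double result to logical, so a logical NA template fails as soon as a non-empty cell returns a numeric sum.

```r
# Hypothetical stand-in: one numeric vector per output cell, one of them empty.
cells <- list(c(1000, 300), numeric(0), 150)

# Hypothetical helper: sum each cell, returning `fill` for an empty one.
make_agg <- function(fill) function(x) if (length(x) == 0) fill else sum(x)

typeof(NA)        # "logical"
typeof(NA_real_)  # "double"

# Logical NA as the template: the first cell returns a double (1300),
# which vapply refuses to demote to logical, so this call errors.
res_logical <- tryCatch(
  vapply(cells, make_agg(NA), FUN.VALUE = NA),
  error = function(e) conditionMessage(e)
)

# Double NA as the template: every result is a length-1 double.
res_double <- vapply(cells, make_agg(NA_real_), FUN.VALUE = NA_real_)
res_double  # c(1300, NA, 150)
```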

Answer 1 (score: 1)

As an alternative, you can also do all of the reshaping with dplyr + tidyr:

library(dplyr)
library(tidyr)

dat %>%
    group_by(Name, Month, Product) %>%
    summarise(Price = sum(Price)) %>%
    spread(Product, Price)
## A tibble: 4 x 4
## Groups:   Name, Month [4]
#  Name  Month  Bike   Car
#  <fct> <dbl> <dbl> <dbl>
#1 Alice    1.  150. 1300.
#2 Alice    2.   NA   500.
#3 Bob      1.   NA  2000.
#4 Bob      2.  300.   NA

dcast类似,spread有一个fill参数,默认为fill=NA