Insert a transformation as new rows instead of a column

Time: 2016-03-04 12:36:47

Tags: r dplyr

Sample data:

Date <- as.Date(c('1-01-2008', '2-01-2008', '3-01-2008',
                  '1-01-2008', '2-01-2008', '3-01-2008',
                  '1-01-2008', '2-01-2008', '3-01-2008',
                  '1-01-2008', '2-01-2008', '3-01-2008'), format = "%m-%d-%Y")
Country <- c('US', 'US','US', 'JP', 'JP', 'JP', 'US', 'US','US', 'JP', 'JP', 'JP') 
Category <- c('Apple', 'Apple', 'Apple', 'Apple', 'Apple', 'Apple', 'foo', 'foo','foo', 'foo','foo', 'foo') 
Value <- c(runif(12, -0.5, 10))
df <- data.frame(Date, Country, Category, Value)

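What I want to do is subtract the values of Apple and foo within each month and country (so for US and 2008-01-01, the value would be -1.2357797). However, I want the result inserted as rows, with a category name such as "difference".

I figured out how to do this with dplyr / mutate, but only by inserting a whole new column, in which case the table no longer makes sense (the category does not fit, and I will later convert this into a list):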

library(dplyr)
# Adds the difference as a new column, repeated on every row of the group
df <- df %>%
    group_by(Country, Date) %>%
    mutate(
      diff = Value[Category=="Apple"] - Value[Category=="foo"])

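Edit: Important: I want to add multiple transformations this way, not just the difference mentioned in the example.

Edit 2: Thanks for all the helpful replies. I will use the tidyr / dplyr approach suggested by @akrun to insert more transformations: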

library(tidyr)
library(dplyr)
df <- spread(df, Category, Value) %>%
    mutate(diff = Apple - foo, xyz = Apple + foo) %>% 
    # Apple:xyz already covers Apple, foo, diff and xyz
    gather(Category, Value, Apple:xyz)

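1 Answer:

Answer 0: (score: 3)

We can use either data.table or dplyr. With data.table, convert the 'data.frame' to a 'data.table' (setDT(df)). Grouped by 'Date' and 'Country', take the difference between the 'Value' entries whose 'Category' is 'Apple' and 'foo', and create a 'Category' column with the value 'diff'. This gives a new summarised dataset ('dfN'), which we can combine with the original dataset using rbind or rbindlist. If needed, we can order the result by 'Date' and 'Country'.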

library(data.table)
dfN <- setDT(df)[, 
    list(Category="diff",
         Value=Value[Category=="Apple"]- Value[Category=="foo"]), 
               by = .(Date, Country)]
rbindlist(list(df, dfN))[order(Date,Country)]
#       Date Country Category      Value
# 1: 2008-01-01      JP    Apple  9.8861949
# 2: 2008-01-01      JP      foo  6.8009149
# 3: 2008-01-01      JP     diff  3.0852799
# 4: 2008-01-01      US    Apple -0.3047560
# 5: 2008-01-01      US      foo  9.1748432
# 6: 2008-01-01      US     diff -9.4795991
# 7: 2008-02-01      JP    Apple  8.7836616
# 8: 2008-02-01      JP      foo  5.4775849
# 9: 2008-02-01      JP     diff  3.3060767
#10: 2008-02-01      US    Apple  1.6155057
#11: 2008-02-01      US      foo  3.6720346
#12: 2008-02-01      US     diff -2.0565289
#13: 2008-03-01      JP    Apple  1.9879906
#14: 2008-03-01      JP      foo  7.1387297
#15: 2008-03-01      JP     diff -5.1507391
#16: 2008-03-01      US    Apple  1.1435151
#17: 2008-03-01      US      foo  0.6596238
#18: 2008-03-01      US     diff  0.4838913
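
Since the question asks for several transformations, the same data.table idea can be extended by returning more than one row per group. The following is only a minimal sketch: the object names `dat`, `extra` and `res`, the fixed seed, and the second transformation `sum` are assumptions made for illustration and are not part of the original post.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
dat <- data.table(
  Date     = as.Date(rep(c("2008-01-01", "2008-02-01", "2008-03-01"), 4)),
  Country  = rep(c("US", "JP", "US", "JP"), each = 3),
  Category = rep(c("Apple", "foo"), each = 6),
  Value    = runif(12, -0.5, 10)
)

# For each (Date, Country) group, return one row per derived category
extra <- dat[, {
  a <- Value[Category == "Apple"]
  f <- Value[Category == "foo"]
  list(Category = c("diff", "sum"), Value = c(a - f, a + f))
}, by = .(Date, Country)]

# Stack the derived rows under the original ones and sort
res <- rbindlist(list(dat, extra), use.names = TRUE)[order(Date, Country)]
res
```

Each group contributes two extra rows here, so adding another transformation only means adding one more name and one more expression inside the `list(...)`.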

Another option is to use dcast/melt from data.table:
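# Reshape to wide, add the derived column, then reshape back to long
melt(dcast(setDT(df), Date + Country ~ Category, 
    value.var = 'Value')[, diff := Apple - foo], 
      id.vars = c('Date', 'Country'),
      variable.name = 'Category', value.name = 'Value')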

Or, if we use gather/spread from tidyr (similar to the melt/dcast option),

library(tidyr)
library(dplyr)
spread(df, Category, Value) %>%
       mutate(diff=Apple- foo) %>% 
       gather(Category, Value, Apple:diff)
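
In tidyr 1.0.0 and later, `spread()` and `gather()` are superseded by `pivot_wider()` and `pivot_longer()`. A rough equivalent of the snippet above, extended to two transformations, might look like the sketch below. The names `dat` and `res`, the fixed seed, and the `xyz` column are illustrative assumptions.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
dat <- data.frame(
  Date     = as.Date(rep(c("2008-01-01", "2008-02-01", "2008-03-01"), 4)),
  Country  = rep(c("US", "JP", "US", "JP"), each = 3),
  Category = rep(c("Apple", "foo"), each = 6),
  Value    = runif(12, -0.5, 10),
  stringsAsFactors = FALSE
)

res <- dat %>%
  # one column per category: Date, Country, Apple, foo
  pivot_wider(names_from = Category, values_from = Value) %>%
  # derived quantities are added as ordinary columns
  mutate(diff = Apple - foo, xyz = Apple + foo) %>%
  # back to long format, so each derived quantity becomes its own rows
  pivot_longer(cols = Apple:xyz, names_to = "Category", values_to = "Value")
res
```

This requires tidyr >= 1.0.0; with older versions the `spread()`/`gather()` form above is the one to use.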

Or, with dplyr, we use the same technique, but with bind_rows instead of rbindlist:

 library(dplyr)
 df %>%
     group_by(Country, Date) %>%
     summarise(Value =  Value[Category=="Apple"] - 
                        Value[Category=="foo"],
                Category= "diff") %>%
     bind_rows(df, .) %>%
     arrange(Date, Country)
#        Date Country Category      Value
#       (date)  (fctr)    (chr)      (dbl)
#1  2008-01-01      JP    Apple  9.8861949
#2  2008-01-01      JP      foo  6.8009149
#3  2008-01-01      JP     diff  3.0852799
#4  2008-01-01      US    Apple -0.3047560
#5  2008-01-01      US      foo  9.1748432
#6  2008-01-01      US     diff -9.4795991
#7  2008-02-01      JP    Apple  8.7836616
#8  2008-02-01      JP      foo  5.4775849
#9  2008-02-01      JP     diff  3.3060767
#10 2008-02-01      US    Apple  1.6155057
#11 2008-02-01      US      foo  3.6720346
#12 2008-02-01      US     diff -2.0565289
#13 2008-03-01      JP    Apple  1.9879906
#14 2008-03-01      JP      foo  7.1387297
#15 2008-03-01      JP     diff -5.1507391
#16 2008-03-01      US    Apple  1.1435151
#17 2008-03-01      US      foo  0.6596238
#18 2008-03-01      US     diff  0.4838913
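
To add several transformations with the bind_rows technique, one possibility is to wrap the summarise step in a small helper and call it once per transformation. This is a sketch only: `add_rows()` is a hypothetical helper, not a dplyr function, and `dat`, `res`, the fixed seed and the `xyz` transformation are assumptions for illustration. It relies on the `.groups` argument of `summarise()`, which needs dplyr >= 1.0.0.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
dat <- data.frame(
  Date     = as.Date(rep(c("2008-01-01", "2008-02-01", "2008-03-01"), 4)),
  Country  = rep(c("US", "JP", "US", "JP"), each = 3),
  Category = rep(c("Apple", "foo"), each = 6),
  Value    = runif(12, -0.5, 10),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one summary row per (Date, Country) group,
# labelled `name` and computed as fun(Apple value, foo value)
add_rows <- function(data, name, fun) {
  data %>%
    group_by(Date, Country) %>%
    summarise(Value = fun(Value[Category == "Apple"],
                          Value[Category == "foo"]),
              .groups = "drop") %>%
    mutate(Category = name)
}

res <- bind_rows(dat,
                 add_rows(dat, "diff", `-`),
                 add_rows(dat, "xyz", `+`)) %>%
  arrange(Date, Country)
res
```

Any two-argument function can be passed as `fun`, for example `function(a, f) a / f` for a ratio.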