Sample data:
Date <- as.Date(c('1-01-2008','2-01-2008', '3-01-2008', '1-01-2008','2-01-2008', '3-01-2008','1-01-2008','2-01-2008', '3-01-2008', '1-01-2008','2-01-2008', '3-01-2008'), format = "%m-%d-%Y")
Country <- c('US', 'US','US', 'JP', 'JP', 'JP', 'US', 'US','US', 'JP', 'JP', 'JP')
Category <- c('Apple', 'Apple', 'Apple', 'Apple', 'Apple', 'Apple', 'foo', 'foo','foo', 'foo','foo', 'foo')
Value <- c(runif(12, -0.5, 10))
df <- data.frame(Date, Country, Category, Value)
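What I want to do is subtract the foo value from the Apple value within each month and country (so for US and 2008-01-01 the value would be -1.2357797). However, I want the result inserted as rows, with a category name such as "diff".
I figured out how to do this with dplyr / mutate, but only by inserting a completely new column, in which case the table no longer makes sense (because the category doesn't fit; I will later be converting it to a list):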
library(dplyr)
df <- df %>%
  group_by(Country, Date) %>%
  mutate(
    diff = Value[Category=="Apple"] - Value[Category=="foo"])
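Edit: Important: I want to add several transformations this way, not just the difference mentioned in the example.
Edit 2: Thanks for all the helpful replies. I went with the tidyr / dplyr approach suggested by @akrun and will use it to insert more transformations: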
library(tidyr)
library(dplyr)
df <- spread(df, Category, Value) %>%
  mutate(diff = Apple - foo, xyz = Apple + foo) %>%
  gather(Category, Value, Apple:xyz)
Answer 0 (score: 3)
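We can use data.table or dplyr. With data.table, convert the 'data.frame' to a 'data.table' (setDT(df)). Grouped by 'Date' and 'Country', take the difference between the 'Value' entries that correspond to 'Apple' and 'foo' in the 'Category' column, and also create 'Category' with the value 'diff'. This gives a new summarised dataset ('dfN'), which we can bind to the original dataset with rbindlist. If needed, we can order the result by 'Date' and 'Country'.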
library(data.table)
dfN <- setDT(df)[,
         list(Category = "diff",
              Value = Value[Category=="Apple"] - Value[Category=="foo"]),
         by = .(Date, Country)]
rbindlist(list(df, dfN))[order(Date,Country)]
# Date Country Category Value
# 1: 2008-01-01 JP Apple 9.8861949
# 2: 2008-01-01 JP foo 6.8009149
# 3: 2008-01-01 JP diff 3.0852799
# 4: 2008-01-01 US Apple -0.3047560
# 5: 2008-01-01 US foo 9.1748432
# 6: 2008-01-01 US diff -9.4795991
# 7: 2008-02-01 JP Apple 8.7836616
# 8: 2008-02-01 JP foo 5.4775849
# 9: 2008-02-01 JP diff 3.3060767
#10: 2008-02-01 US Apple 1.6155057
#11: 2008-02-01 US foo 3.6720346
#12: 2008-02-01 US diff -2.0565289
#13: 2008-03-01 JP Apple 1.9879906
#14: 2008-03-01 JP foo 7.1387297
#15: 2008-03-01 JP diff -5.1507391
#16: 2008-03-01 US Apple 1.1435151
#17: 2008-03-01 US foo 0.6596238
#18: 2008-03-01 US diff 0.4838913
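Since the question asks for several transformations rather than just the difference, the same grouped call can return more than one derived row per group. The following is only a minimal sketch: the fixed seed, the object names `dt`, `derived` and `result`, and the reuse of the `xyz` label from the question's second edit are assumptions made for illustration.

```r
library(data.table)

# Fixed seed so the sketch is reproducible; the original post did not set one.
set.seed(1)
dt <- data.table(
  Date     = rep(as.Date(c("2008-01-01", "2008-02-01", "2008-03-01")), times = 4),
  Country  = rep(rep(c("US", "JP"), each = 3), times = 2),
  Category = rep(c("Apple", "foo"), each = 6),
  Value    = runif(12, -0.5, 10)
)

# One grouped call returns two derived rows per (Date, Country) group.
derived <- dt[, {
  a <- Value[Category == "Apple"]
  f <- Value[Category == "foo"]
  list(Category = c("diff", "xyz"), Value = c(a - f, a + f))
}, by = .(Date, Country)]

# Stack the derived rows under the original ones and sort.
result <- rbindlist(list(dt, derived), use.names = TRUE)[order(Date, Country)]
print(result)
```

Each further transformation is then one more label in `Category` and one more expression in `Value`, kept in the same order.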
Or another option is dcast/melt from data.table:
melt(dcast(setDT(df), Date+Country~Category,
value.var='Value')[, diff:= Apple-foo],
id.var=c('Date', 'Country'))
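Note that melt names its output columns 'variable' and 'value' by default, so the long result does not have the original 'Category' and 'Value' headers unless they are requested. A small sketch of restoring them through the `variable.name` and `value.name` arguments, using a made-up four-row table (`dt`, `wide` and `long` are illustrative names, and the values are invented):

```r
library(data.table)

# Tiny hand-written dataset (values invented for illustration).
dt <- data.table(
  Date     = as.Date("2008-01-01"),
  Country  = c("US", "JP", "US", "JP"),
  Category = c("Apple", "Apple", "foo", "foo"),
  Value    = c(1, 5, 3, 2)
)

# Long -> wide: one column per category.
wide <- dcast(dt, Date + Country ~ Category, value.var = "Value")
wide[, diff := Apple - foo]

# Wide -> long, naming the output columns like the original ones.
long <- melt(wide,
             id.vars       = c("Date", "Country"),
             variable.name = "Category",
             value.name    = "Value")
print(long)
```

With its default settings melt returns 'Category' as a factor here; wrap it in as.character() if a character column is needed.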
If we are using tidyr, there is the gather/spread option (similar to melt/dcast):
library(tidyr)
library(dplyr)
spread(df, Category, Value) %>%
  mutate(diff = Apple - foo) %>%
  gather(Category, Value, Apple:diff)
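In current tidyr releases, spread() and gather() are superseded by pivot_wider() and pivot_longer(). The same reshape can be sketched with those functions as below; the four-row data frame and its values are invented for illustration, and `xyz` reuses the label from the question's second edit.

```r
library(dplyr)
library(tidyr)

# Tiny hand-written dataset (values invented for illustration).
df <- data.frame(
  Date     = as.Date("2008-01-01"),
  Country  = c("US", "JP", "US", "JP"),
  Category = c("Apple", "Apple", "foo", "foo"),
  Value    = c(1, 5, 3, 2)
)

result <- df %>%
  # Long -> wide: one column per category.
  pivot_wider(names_from = Category, values_from = Value) %>%
  # Add as many derived columns as needed.
  mutate(diff = Apple - foo, xyz = Apple + foo) %>%
  # Wide -> long: every non-id column becomes a Category row again.
  pivot_longer(cols = -c(Date, Country),
               names_to = "Category", values_to = "Value")
print(result)
```

Selecting with `-c(Date, Country)` rather than a range such as `Apple:xyz` means newly added transformations are picked up without editing the last step.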
Or with dplyr, we use the same technique, but instead of rbindlist we use bind_rows.
library(dplyr)
df %>%
  group_by(Country, Date) %>%
  summarise(Value = Value[Category=="Apple"] -
                    Value[Category=="foo"],
            Category = "diff") %>%
  bind_rows(df, .) %>%
  arrange(Date, Country)
# Date Country Category Value
# (date) (fctr) (chr) (dbl)
#1 2008-01-01 JP Apple 9.8861949
#2 2008-01-01 JP foo 6.8009149
#3 2008-01-01 JP diff 3.0852799
#4 2008-01-01 US Apple -0.3047560
#5 2008-01-01 US foo 9.1748432
#6 2008-01-01 US diff -9.4795991
#7 2008-02-01 JP Apple 8.7836616
#8 2008-02-01 JP foo 5.4775849
#9 2008-02-01 JP diff 3.3060767
#10 2008-02-01 US Apple 1.6155057
#11 2008-02-01 US foo 3.6720346
#12 2008-02-01 US diff -2.0565289
#13 2008-03-01 JP Apple 1.9879906
#14 2008-03-01 JP foo 7.1387297
#15 2008-03-01 JP diff -5.1507391
#16 2008-03-01 US Apple 1.1435151
#17 2008-03-01 US foo 0.6596238
#18 2008-03-01 US diff 0.4838913
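The subsetting `Value[Category=="Apple"]` assumes that every Date/Country group contains exactly one 'Apple' row and one 'foo' row, which holds for the sample data. If real data might have gaps, one possible guard is a small helper that returns NA for a missing category. This is only a sketch: `pick` is a hypothetical helper, the three-row data frame is invented, and the `.groups` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical helper: the Value for one category, or NA if the group
# does not contain exactly one row for it.
pick <- function(value, category, name) {
  hit <- value[category == name]
  if (length(hit) == 1) hit else NA_real_
}

# Invented data in which JP has no 'foo' row.
df <- data.frame(
  Date     = as.Date("2008-01-01"),
  Country  = c("US", "JP", "US"),
  Category = c("Apple", "Apple", "foo"),
  Value    = c(1, 5, 3)
)

derived <- df %>%
  group_by(Country, Date) %>%
  summarise(
    Value    = pick(Value, Category, "Apple") - pick(Value, Category, "foo"),
    Category = "diff",
    .groups  = "drop"
  )

# Append the derived rows to the original data and sort.
result <- bind_rows(df, derived) %>% arrange(Date, Country)
print(result)
```

Here the JP group gets a 'diff' row with NA, so the gap stays visible in the output.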