Reshaping multiple values at once

Asked: 2014-12-02 10:27:19

Tags: r dataframe reshape reshape2 tidyr

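I have a long-format data set that I would like to make wide, and I'm curious whether there is a way to do this all in one step using the reshape2 or tidyr packages in R.

The data frame df looks like this: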

id  type    transactions    amount
20  income       20          100
20  expense      25          95
30  income       50          300
30  expense      45          250

I would like to get to this:

id  income_transactions expense_transactions    income_amount   expense_amount
20       20                           25                 100             95
30       50                           45                 300             250

I know I can get part of the way there with reshape2, for example:

dcast(df, id ~  type, value.var="transactions")

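But is there a way to reshape the entire df in one shot, handling both the "transactions" and "amount" variables at the same time? Ideally with new, more appropriate column names?

2 answers:

Answer 0 (score: 28)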

In "reshape2", you can use recast (though in my experience, this isn't a widely known function).

library(reshape2)
recast(mydf, id ~ variable + type, id.var = c("id", "type"))
#   id transactions_expense transactions_income amount_expense amount_income
# 1 20                   25                  20             95           100
# 2 30                   45                  50            250           300
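Since recast combines a melt and a dcast in a single call, it may help to see the two steps written out. The following is a minimal sketch using "reshape2" only; the data frame mydf is rebuilt here from the sample data in the question, and the names long and wide are illustrative.

```r
library(reshape2)

# Sample data from the question, rebuilt so the sketch is self-contained
mydf <- data.frame(
  id = c(20, 20, 30, 30),
  type = c("income", "expense", "income", "expense"),
  transactions = c(20, 25, 50, 45),
  amount = c(100, 95, 300, 250),
  stringsAsFactors = FALSE
)

# Step 1: stack "transactions" and "amount" into variable/value pairs
long <- melt(mydf, id.vars = c("id", "type"))

# Step 2: spread every variable/type combination into its own column
wide <- dcast(long, id ~ variable + type, value.var = "value")
wide
```

The intermediate long object has one row per id, type and variable, which is the shape dcast needs in order to build the combined column names.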

You can also use base R's reshape:

reshape(mydf, direction = "wide", idvar = "id", timevar = "type")
#   id transactions.income amount.income transactions.expense amount.expense
# 1 20                  20           100                   25             95
# 3 30                  50           300                   45            250
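The question also asks for names of the form income_transactions rather than transactions.income. One possible way to get there with base R alone, offered as a sketch rather than the only option, is to set sep = "_" in reshape and then swap the two halves of each name with a regular expression. The data frame mydf is rebuilt from the sample data in the question.

```r
# Sample data from the question, rebuilt so the sketch is self-contained
mydf <- data.frame(
  id = c(20, 20, 30, 30),
  type = c("income", "expense", "income", "expense"),
  transactions = c(20, 25, 50, 45),
  amount = c(100, 95, 300, 250),
  stringsAsFactors = FALSE
)

# sep = "_" yields names such as transactions_income instead of transactions.income
wide <- reshape(mydf, direction = "wide", idvar = "id", timevar = "type", sep = "_")

# Swap "variable_type" into "type_variable"; "id" has no underscore and is left alone
names(wide) <- sub("^(.*)_(.*)$", "\\2_\\1", names(wide))
wide
```

The regular expression assumes each reshaped name contains exactly one underscore, which holds for this data but would need adjusting if the original column names contained underscores themselves.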

Or you can melt and dcast, like this (here with "data.table"):

library(data.table)
library(reshape2)
dcast.data.table(melt(as.data.table(mydf), id.vars = c("id", "type")), 
                 id ~ variable + type, value.var = "value")
#    id transactions_expense transactions_income amount_expense amount_income
# 1: 20                   25                  20             95           100
# 2: 30                   45                  50            250           300

In later versions of "data.table" (1.9.8), you will be able to do this directly with dcast.data.table. If I understand correctly, what @Arun is trying to implement is reshaping without first having to melt the data, which is what currently happens with recast; recast is essentially a wrapper for a melt + dcast sequence of operations.

And, for thoroughness, here is the tidyr approach:

library(dplyr)
library(tidyr)
mydf %>% 
  gather(var, val, transactions:amount) %>% 
  unite(var2, type, var) %>% 
  spread(var2, val)
#   id expense_amount expense_transactions income_amount income_transactions
# 1 20             95                   25           100                  20
# 2 30            250                   45           300                  50
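Newer tidyr releases (1.0.0 and later) provide pivot_wider, which accepts several value columns at once and can build the column names from a template. The sketch below assumes such a tidyr version is installed; mydf is rebuilt from the sample data in the question, and the names_glue template is one way to get the type_variable naming the question asks for.

```r
library(tidyr)

# Sample data from the question, rebuilt so the sketch is self-contained
mydf <- data.frame(
  id = c(20, 20, 30, 30),
  type = c("income", "expense", "income", "expense"),
  transactions = c(20, 25, 50, 45),
  amount = c(100, 95, 300, 250),
  stringsAsFactors = FALSE
)

# Widen both value columns in one call; {.value} stands for the value column name
wide <- pivot_wider(
  mydf,
  names_from = type,
  values_from = c(transactions, amount),
  names_glue = "{type}_{.value}"
)
wide
```

This avoids the intermediate gather and unite steps, at the cost of requiring a more recent tidyr than the one available when this thread was written.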

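Answer 1 (score: 5)

With data.table v1.9.6+, we can cast multiple value.var columns simultaneously (and also use multiple aggregation functions in fun.aggregate). For details, see ?dcast and its Examples section.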

require(data.table) # v1.9.6+
dcast(dt, id ~ type, value.var=names(dt)[3:4])
#    id transactions_expense transactions_income amount_expense amount_income
# 1: 20                   25                  20             95           100
# 2: 30                   45                  50            250           300
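The snippet above assumes a data.table named dt already exists. A self-contained sketch, with dt built from the sample data in the question and the value columns given by name instead of by position, might look like this:

```r
library(data.table) # v1.9.6+ for multiple value.var columns

# Sample data from the question as a data.table
dt <- data.table(
  id = c(20, 20, 30, 30),
  type = c("income", "expense", "income", "expense"),
  transactions = c(20, 25, 50, 45),
  amount = c(100, 95, 300, 250)
)

# Cast both value columns at once; naming them avoids relying on column order
wide <- dcast(dt, id ~ type, value.var = c("transactions", "amount"))
wide
```

Naming the columns explicitly keeps the call working if columns are later added to or reordered in dt, which names(dt)[3:4] would not.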