R: Difference based on another column

Time: 2016-10-24 12:37:03

Tags: r

I have a dataset as follows:

Group    Type   Income
 A        X       1000
 A        Y       500
 B        Y       2000
 B        X       1500
 C        X       700
 D        Y       600

I need the following output:

Group    Diff
  A       500
  B      -500
  C       700
  D      -600

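One approach I can think of is to split the data by Type into X and Y, add an income of 0 for any group that has no X or no Y, and then merge the pieces so that each group has one column named IncomeX and another named IncomeY. The difference is then IncomeX minus IncomeY.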
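For reference, the split-and-merge approach described above might look roughly like this in base R. It is only a sketch: the data frame name `dat` and the intermediate names `x`, `y` and `wide` are illustrative assumptions.

```r
# Example data from the question (the name `dat` is an assumption)
dat <- data.frame(
  Group  = c("A", "A", "B", "B", "C", "D"),
  Type   = c("X", "Y", "Y", "X", "X", "Y"),
  Income = c(1000, 500, 2000, 1500, 700, 600)
)

# Split by type and rename the income column in each piece
x <- setNames(dat[dat$Type == "X", c("Group", "Income")], c("Group", "IncomeX"))
y <- setNames(dat[dat$Type == "Y", c("Group", "Income")], c("Group", "IncomeY"))

# Full outer join, so groups that lack X or Y are kept (with NA)
wide <- merge(x, y, by = "Group", all = TRUE)

# Treat a missing type as income 0, then subtract the two columns
wide[is.na(wide)] <- 0
wide$Diff <- wide$IncomeX - wide$IncomeY
result <- wide[, c("Group", "Diff")]
```
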

Is there a simpler way?

3 answers:

Answer 0: (score: 4)

I would do it like this (using the dplyr and reshape2 packages):

library("dplyr")
library("reshape2")

t <- read.table(text = "Group Type Income
A X 1000
A Y 500
B Y 2000
B X 1500
C X 700
D Y 600", header = TRUE)

t %>%
  dcast(Group ~ Type, value.var = "Income", fill = 0) %>%
  mutate(Diff = X - Y) %>%
  select(Group, Diff)
#   Group Diff
# 1     A  500
# 2     B -500
# 3     C  700
# 4     D -600

dcast changes the format of the table from long to wide (one column per Type, with 0 filled in for missing combinations), and mutate creates the new column.
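The long-to-wide reshape used in this answer can also be approximated without extra packages. The sketch below uses base R's `xtabs`, which sums Income per Group/Type cell and gives 0 for combinations that do not occur. The names `dat`, `wide` and `result` are assumptions for illustration, and if a group had several rows of the same Type they would be summed.

```r
# Example data from the question (the name `dat` is an assumption)
dat <- data.frame(
  Group  = c("A", "A", "B", "B", "C", "D"),
  Type   = c("X", "Y", "Y", "X", "X", "Y"),
  Income = c(1000, 500, 2000, 1500, 700, 600)
)

# Cross-tabulate Income by Group and Type; absent combinations become 0
wide <- xtabs(Income ~ Group + Type, data = dat)

# Subtract the Y column from the X column and rebuild a data frame
result <- data.frame(
  Group = rownames(wide),
  Diff  = as.vector(wide[, "X"] - wide[, "Y"])
)
```
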

Answer 1: (score: 3)

Try this in base R:

aggregate(Diff~Group, 
          with(df, data.frame(Group=Group, Diff=ifelse(Type=="X", 1, -1)*Income)), sum)

# Group Diff
#1     A    500
#2     B   -500
#3     C    700
#4     D   -600

Data

df <- structure(list(Group = structure(c(1L, 1L, 2L, 2L, 3L, 4L), .Label = c("A", 
"B", "C", "D"), class = "factor"), Type = structure(c(1L, 2L, 
2L, 1L, 1L, 2L), .Label = c("X", "Y"), class = "factor"), Income = c(1000L, 
500L, 2000L, 1500L, 700L, 600L)), .Names = c("Group", "Type", 
"Income"), class = "data.frame", row.names = c(NA, -6L))

Answer 2: (score: 1)

We can use data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1)); for rows where 'Type' is 'Y', convert 'Income' to negative values; then, grouped by 'Group', get the sum of 'Income'.

library(data.table)
setDT(df1)[Type == "Y", Income := -1 * Income][, .(Diff = sum(Income)), by = Group]
#   Group Diff
#1:     A  500
#2:     B -500
#3:     C  700
#4:     D -600

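Or with tidyr/dplyr (starting from the original 'df1', because the data.table code above negates 'Income' in place):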

library(dplyr)
library(tidyr)
spread(df1, Type, Income, fill = 0) %>%
  transmute(Group, Diff = X - Y)
#    Group Diff
#1     A  500
#2     B -500
#3     C  700
#4     D -600
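In tidyr 1.0.0 and later, `spread()` is superseded by `pivot_wider()`. A rough equivalent of the snippet above is sketched below, assuming tidyr 1.0.0 or newer and that `df1` holds the original, unmodified data from the question (it is rebuilt here so the example is self-contained).

```r
library(dplyr)
library(tidyr)

# Original data from the question, rebuilt under the name `df1`
df1 <- data.frame(
  Group  = c("A", "A", "B", "B", "C", "D"),
  Type   = c("X", "Y", "Y", "X", "X", "Y"),
  Income = c(1000, 500, 2000, 1500, 700, 600)
)

# Widen to one column per Type, filling missing combinations with 0,
# then keep Group and the X - Y difference
result <- df1 %>%
  pivot_wider(names_from = Type, values_from = Income,
              values_fill = list(Income = 0)) %>%
  transmute(Group, Diff = X - Y)
```
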