I have a dataset like this:
Group Type Income
A X 1000
A Y 500
B Y 2000
B X 1500
C X 700
D Y 600
I need the following output:
Group Diff
A 500
B -500
C 700
D -600
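One approach I can think of is to split the data by Type into X and Y, add an Income of 0 for any group that is missing X or Y, and then merge the two pieces so that each group has one column named IncomeX and another named IncomeY. The result is IncomeX minus IncomeY.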
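A minimal base R sketch of that approach, assuming the data is in a data frame called df (the names x, y, wide and result are only illustrative):

```r
df <- read.table(text = "Group Type Income
A X 1000
A Y 500
B Y 2000
B X 1500
C X 700
D Y 600", header = TRUE)

# Split by Type, keeping Group and Income and renaming the income column.
x <- setNames(df[df$Type == "X", c("Group", "Income")], c("Group", "IncomeX"))
y <- setNames(df[df$Type == "Y", c("Group", "Income")], c("Group", "IncomeY"))

# Full outer join, so groups that lack X or Y are kept (with NA).
wide <- merge(x, y, by = "Group", all = TRUE)

# Treat a missing type as an income of 0, then subtract.
wide$IncomeX[is.na(wide$IncomeX)] <- 0
wide$IncomeY[is.na(wide$IncomeY)] <- 0
wide$Diff <- wide$IncomeX - wide$IncomeY

result <- wide[, c("Group", "Diff")]
result
```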
Is there a simpler way?
Answer 0 (score: 4)
I would do it like this (using the dplyr and reshape2 packages):
library("dplyr")
library("reshape2")
t <- read.table(text = "Group Type Income
A X 1000
A Y 500
B Y 2000
B X 1500
C X 700
D Y 600", header = TRUE)
t %>%
dcast(Group ~ Type, value.var = "Income", fill = 0) %>%
mutate(Diff = X - Y) %>%
select(Group, Diff)
# Group Diff
# 1 A 500
# 2 B -500
# 3 C 700
# 4 D -600
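dcast reshapes the table into wide format (one column per Type, with missing combinations filled with 0), and mutate creates the new Diff column.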
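The same "go wide, then subtract" idea can also be sketched without extra packages. The following is only an illustration using base R's xtabs, which sums Income per Group/Type cell and leaves 0 where a combination is absent. The names dat, wide and result are hypothetical.

```r
dat <- read.table(text = "Group Type Income
A X 1000
A Y 500
B Y 2000
B X 1500
C X 700
D Y 600", header = TRUE)

# Cross-tabulate: rows are groups, columns are types, cells are summed Income.
# Group/Type combinations that do not occur come out as 0.
wide <- xtabs(Income ~ Group + Type, data = dat)

# Subtract the Y column from the X column; the result is named by group.
d <- wide[, "X"] - wide[, "Y"]

result <- data.frame(Group = names(d), Diff = as.vector(d))
result
```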
Answer 1 (score: 3)
Try this in base R:
aggregate(Diff~Group,
with(df, data.frame(Group=Group, Diff=ifelse(Type=="X", 1, -1)*Income)), sum)
# Group Diff
#1 A 500
#2 B -500
#3 C 700
#4 D -600
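To see why this works, it can help to look at the signed incomes before they are summed: X rows keep their sign, Y rows are negated, and the per-group sum is then X minus Y. A small sketch of that intermediate step, with the data rebuilt inline so it runs on its own (signed and diffs are illustrative names, and tapply is used here in place of aggregate):

```r
df <- read.table(text = "Group Type Income
A X 1000
A Y 500
B Y 2000
B X 1500
C X 700
D Y 600", header = TRUE)

# +Income for Type X, -Income for Type Y.
signed <- ifelse(df$Type == "X", 1, -1) * df$Income

# Summing the signed values within each group gives X - Y;
# a group with only one type simply contributes that one signed value.
diffs <- tapply(signed, df$Group, sum)
diffs
```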
Data
df <- structure(list(Group = structure(c(1L, 1L, 2L, 2L, 3L, 4L), .Label = c("A",
"B", "C", "D"), class = "factor"), Type = structure(c(1L, 2L,
2L, 1L, 1L, 2L), .Label = c("X", "Y"), class = "factor"), Income = c(1000L,
500L, 2000L, 1500L, 700L, 600L)), .Names = c("Group", "Type",
"Income"), class = "data.frame", row.names = c(NA, -6L))
Answer 2 (score: 1)
We can use data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1)), negate 'Income' for the rows where 'Type' is 'Y', then, grouped by 'Group', take the sum of 'Income'.
library(data.table)
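# Note: setDT() and := both work by reference, so df1 itself is modified
# (Income is negated in place for the Type "Y" rows).
setDT(df1)[Type == "Y", Income := -1 * Income][, .(Diff = sum(Income)), Group]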
# Group Diff
#1: A 500
#2: B -500
#3: C 700
#4: D -600
Or with tidyr/dplyr:
library(dplyr)
library(tidyr)
spread(df1, Type, Income, fill = 0) %>%
transmute(Group, Diff = X- Y)
# Group Diff
#1 A 500
#2 B -500
#3 C 700
#4 D -600
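In current tidyr, spread is superseded by pivot_wider. A sketch of the equivalent call is below. It assumes a reasonably recent tidyr in which values_fill accepts a single value, and it rebuilds df1 inline so the block runs on its own (result is an illustrative name).

```r
library(dplyr)
library(tidyr)

df1 <- read.table(text = "Group Type Income
A X 1000
A Y 500
B Y 2000
B X 1500
C X 700
D Y 600", header = TRUE)

result <- df1 %>%
  # One column per Type; missing Group/Type combinations become 0.
  pivot_wider(names_from = Type, values_from = Income, values_fill = 0) %>%
  # Keep Group and compute the difference.
  transmute(Group, Diff = X - Y)

result
```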