Aggregating while subtracting values in R

Time: 2015-08-22 23:16:49

Tags: r

Suppose I have two data frames in R (let's call them df1 and df2), for example:

> df1
state num1
   AL 22
   AK 49
   AZ 48
   AR 25

> df2
state num2
   AK 2
   AZ 3
   AR 4
   CA 5

How can I aggregate these data frames while subtracting the values, to get a result like this:

state num3
   AL 22
   AK 47
   AZ 45
   AR 21
   CA -5

Note: the key values in the two data frames are not identical, and the data frames can have different numbers of rows.

3 answers:

Answer 0: (score: 3)

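There may be an easier way, but here is one possibility. We can merge() the two data frames with all = TRUE, so that states present in only one of them are kept, and then subtract the columns after replacing the resulting NA values with zero.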

m <- merge(df1, df2, all = TRUE)
cbind(m[1], num3 = with(replace(m, is.na(m), 0L), num1 - num2))
#   state num3
# 1    AK   47
# 2    AL   22
# 3    AR   21
# 4    AZ   45
# 5    CA   -5

Data:

df1 <- structure(list(state = structure(c(2L, 1L, 4L, 3L), .Label = c("AK", 
"AL", "AR", "AZ"), class = "factor"), num1 = c(22L, 49L, 48L, 
25L)), .Names = c("state", "num1"), row.names = c(NA, 4L), class = "data.frame")

df2 <- structure(list(state = structure(c(1L, 3L, 2L, 4L), .Label = c("AK", 
"AR", "AZ", "CA"), class = "factor"), num2 = 2:5), .Names = c("state", 
"num2"), row.names = 2:5, class = "data.frame")
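
To make the role of the NA replacement visible, the same merge-then-subtract idea can be written out step by step. This is only a sketch: it rebuilds the question's data with plain character keys instead of factors (an assumption made for readability), and `result` is a name introduced here for illustration.

```r
# Rebuild the question's data with character keys (assumption: the
# answer's own data stores `state` as a factor).
df1 <- data.frame(state = c("AL", "AK", "AZ", "AR"),
                  num1  = c(22L, 49L, 48L, 25L),
                  stringsAsFactors = FALSE)
df2 <- data.frame(state = c("AK", "AZ", "AR", "CA"),
                  num2  = c(2L, 3L, 4L, 5L),
                  stringsAsFactors = FALSE)

# Full outer join on the shared column `state`; rows come back sorted by key.
m <- merge(df1, df2, all = TRUE)

# A state that appears in only one frame has NA in the other frame's
# column (AL has no num2, CA has no num1). Treat those as zero.
m$num1[is.na(m$num1)] <- 0L
m$num2[is.na(m$num2)] <- 0L

# Subtract column-wise and keep only the key and the difference.
result <- data.frame(state = m$state,
                     num3  = m$num1 - m$num2,
                     stringsAsFactors = FALSE)
print(result)
```

Without the two replacement lines, the differences for AL and CA would be NA, because any arithmetic with NA yields NA.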

Answer 1: (score: 2)

Here is one approach with dplyr. Combine the two data frames with full_join(), then replace NA with 0. The subtraction is done in mutate(), and finally select() picks the columns you need.

DATA

mydf1 <- structure(list(state = structure(c(2L, 1L, 4L, 3L), .Label = c("AK", 
"AL", "AR", "AZ"), class = "factor"), num1 = c(22L, 49L, 48L, 
25L)), .Names = c("state", "num1"), class = "data.frame", row.names = c(NA, 
-4L))

mydf2 <- structure(list(state = structure(c(1L, 3L, 2L, 4L), .Label = c("AK", 
"AR", "AZ", "CA"), class = "factor"), num2 = 2:5), .Names = c("state", 
"num2"), class = "data.frame", row.names = c(NA, -4L))

CODE

library(dplyr)

# across() needs dplyr >= 1.0.0; it stands in for the deprecated mutate_each()/funs()
full_join(mydf1, mydf2, by = "state") %>%
  mutate(across(c(num1, num2), ~ replace(.x, is.na(.x), 0L))) %>%
  mutate(num3 = num1 - num2) %>%
  select(state, num3)

#  state num3
#1    AL   22
#2    AK   47
#3    AZ   45
#4    AR   21
#5    CA   -5
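
A more compact variant, which is not part of the original answer, folds the NA handling into the subtraction with dplyr's coalesce(). This is a sketch that assumes dplyr is installed and uses character keys rather than factors; `res` is a name introduced here for illustration.

```r
library(dplyr)

# Same data as the question, with character keys (an assumption).
mydf1 <- data.frame(state = c("AL", "AK", "AZ", "AR"),
                    num1  = c(22L, 49L, 48L, 25L),
                    stringsAsFactors = FALSE)
mydf2 <- data.frame(state = c("AK", "AZ", "AR", "CA"),
                    num2  = c(2L, 3L, 4L, 5L),
                    stringsAsFactors = FALSE)

res <- full_join(mydf1, mydf2, by = "state") %>%
  # coalesce() returns its first non-missing argument element-wise,
  # so a missing num1 or num2 is treated as 0 in the subtraction.
  mutate(num3 = coalesce(num1, 0L) - coalesce(num2, 0L)) %>%
  select(state, num3)

print(res)
```

Unlike merge(), full_join() does not sort by key: rows keep the order of the first data frame, followed by the unmatched rows of the second.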

Answer 2: (score: 0)

Instead of merging the data frames, stack their rows. First we flip the sign of column num2, then we sum the results by state:

Base R:

aggregate(num1 ~ state, 
          data = rbind(df1, setNames(data.frame(df2[1], -df2[2]), names(df1))), 
          FUN = sum)

Output:

  state num1
1    AK   47
2    AL   22
3    AR   21
4    AZ   45
5    CA   -5
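
The one-liner above can be unpacked into its individual steps, which also makes it easy to rename the result column to num3 as the question asks. This is a sketch with character keys (an assumption); `neg_df2`, `stacked` and `out` are names introduced here for illustration.

```r
df1 <- data.frame(state = c("AL", "AK", "AZ", "AR"),
                  num1  = c(22L, 49L, 48L, 25L),
                  stringsAsFactors = FALSE)
df2 <- data.frame(state = c("AK", "AZ", "AR", "CA"),
                  num2  = c(2L, 3L, 4L, 5L),
                  stringsAsFactors = FALSE)

# Step 1: negate num2 and give it df1's column names so the frames can be stacked.
neg_df2 <- data.frame(state = df2$state,
                      num1  = -df2$num2,
                      stringsAsFactors = FALSE)

# Step 2: stack the rows; each state now has one or two rows.
stacked <- rbind(df1, neg_df2)

# Step 3: sum per state. A state found only in df2 keeps its negated value.
out <- aggregate(num1 ~ state, data = stacked, FUN = sum)

# Step 4: rename the summed column to match the question's desired output.
names(out)[2] <- "num3"
print(out)
```

Because every value is summed, no NA handling is needed here: a state missing from one frame simply contributes a single row.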

dplyr:

library(dplyr)
rbind(df1, setNames(data.frame(df2[1], -df2[2]), names(df1))) %>% 
  group_by(state) %>% 
  summarise(sum = sum(num1))