If I have two data frames in R (let's call them df1 and df2), for example
> df1
state num1
AL 22
AK 49
AZ 48
AR 25
and
> df2
state num2
AK 2
AZ 3
AR 4
CA 5
how can I combine these data frames while subtracting the values, to produce something like
state num3
AL 22
AK 47
AZ 45
AR 21
CA -5
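Note: the key values in the two data frames are not identical, and the data frames have different numbers of rows.
Answer 0 (score: 3)
There may be an easier way, but here is one possibility. We can merge() the two data frames with all = TRUE, so that states present in only one of them are kept, and then subtract the columns after replacing the resulting NA values with zero.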
m <- merge(df1, df2, all = TRUE)
cbind(m[1], num3 = with(replace(m, is.na(m), 0L), num1 - num2))
# state num3
# 1 AK 47
# 2 AL 22
# 3 AR 21
# 4 AZ 45
# 5 CA -5
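To make the intermediate step visible, here is a minimal sketch of the same idea in smaller steps. It rebuilds the question's data with character keys (stringsAsFactors = FALSE is a choice made for this sketch, not part of the original answer) and replaces NA column by column; the names m and res are only illustrative.

```r
# Same values as in the question, with plain character keys for this sketch
df1 <- data.frame(state = c("AL", "AK", "AZ", "AR"),
                  num1 = c(22L, 49L, 48L, 25L),
                  stringsAsFactors = FALSE)
df2 <- data.frame(state = c("AK", "AZ", "AR", "CA"),
                  num2 = 2:5,
                  stringsAsFactors = FALSE)

# all = TRUE keeps states that appear in only one data frame (a full outer join)
m <- merge(df1, df2, by = "state", all = TRUE)
print(m)  # num2 is NA for AL, num1 is NA for CA

# Replace NA only in the numeric columns, then subtract
m$num1[is.na(m$num1)] <- 0L
m$num2[is.na(m$num2)] <- 0L
res <- data.frame(state = m$state, num3 = m$num1 - m$num2)
print(res)
```

Without the NA replacement, the subtraction would return NA for AL and CA, since any arithmetic with NA yields NA.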
Data:
df1 <- structure(list(state = structure(c(2L, 1L, 4L, 3L), .Label = c("AK",
"AL", "AR", "AZ"), class = "factor"), num1 = c(22L, 49L, 48L,
25L)), .Names = c("state", "num1"), row.names = c(NA, 4L), class = "data.frame")
df2 <- structure(list(state = structure(c(1L, 3L, 2L, 4L), .Label = c("AK",
"AR", "AZ", "CA"), class = "factor"), num2 = 2:5), .Names = c("state",
"num2"), row.names = 2:5, class = "data.frame")
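Answer 1 (score: 2)
One approach with dplyr is as follows. Combine the two data frames with full_join(). Then replace NA with 0. Next, do the subtraction, which happens in the mutate() step. Finally, use select() to keep only the columns you need.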
DATA
mydf1 <- structure(list(state = structure(c(2L, 1L, 4L, 3L), .Label = c("AK",
"AL", "AR", "AZ"), class = "factor"), num1 = c(22L, 49L, 48L,
25L)), .Names = c("state", "num1"), class = "data.frame", row.names = c(NA,
-4L))
mydf2 <- structure(list(state = structure(c(1L, 3L, 2L, 4L), .Label = c("AK",
"AR", "AZ", "CA"), class = "factor"), num2 = 2:5), .Names = c("state",
"num2"), class = "data.frame", row.names = c(NA, -4L))
CODE
library(dplyr)
# across() stands in for the deprecated mutate_each(funs(...)); needs dplyr >= 1.0.0
full_join(mydf1, mydf2, by = "state") %>%
  mutate(across(c(num1, num2), ~ replace(.x, is.na(.x), 0L))) %>%
  mutate(num3 = num1 - num2) %>%
  select(state, num3)
# state num3
#1 AL 22
#2 AK 47
#3 AZ 45
#4 AR 21
#5 CA -5
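As a possible variation (not part of the original answer), the NA replacement and the subtraction can be folded into a single mutate() with dplyr's coalesce(), which returns the first non-missing value. The sketch below rebuilds the data with character keys, which is an assumption made here to keep the example self-contained; res is an illustrative name.

```r
library(dplyr)

# Same values as mydf1/mydf2 above, with plain character keys for this sketch
mydf1 <- data.frame(state = c("AL", "AK", "AZ", "AR"),
                    num1 = c(22L, 49L, 48L, 25L),
                    stringsAsFactors = FALSE)
mydf2 <- data.frame(state = c("AK", "AZ", "AR", "CA"),
                    num2 = 2:5,
                    stringsAsFactors = FALSE)

res <- full_join(mydf1, mydf2, by = "state") %>%
  # coalesce() falls back to 0L wherever the join produced NA
  mutate(num3 = coalesce(num1, 0L) - coalesce(num2, 0L)) %>%
  select(state, num3)
print(res)
```

The row order follows full_join(): the rows of mydf1 first, then the unmatched rows of mydf2.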
Answer 2 (score: 0)
Rather than merging the data frames, stack their rows. First we flip the sign of column num2, then we sum the result by state:
Base R:
aggregate(num1 ~ state,
data = rbind(df1, setNames(data.frame(df2[1], -df2[2]), names(df1))),
FUN = sum)
Output:
state num1
1 AK 47
2 AL 22
3 AR 21
4 AZ 45
5 CA -5
dplyr:
library(dplyr)
rbind(df1, setNames(data.frame(df2[1], -df2[2]), names(df1))) %>%
group_by(state) %>%
summarise(sum = sum(num1))
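For readers who want to see the stacked intermediate, the following is a small sketch of the same stack-then-sum idea in base R. It names the value column num3 up front so the result matches the column name asked for in the question; the names stacked and res, and the use of character keys, are assumptions of this sketch.

```r
# Same values as in the question, with plain character keys for this sketch
df1 <- data.frame(state = c("AL", "AK", "AZ", "AR"),
                  num1 = c(22L, 49L, 48L, 25L),
                  stringsAsFactors = FALSE)
df2 <- data.frame(state = c("AK", "AZ", "AR", "CA"),
                  num2 = 2:5,
                  stringsAsFactors = FALSE)

# Stack both frames under a common column name, negating df2's values
stacked <- rbind(
  data.frame(state = df1$state, num3 = df1$num1),
  data.frame(state = df2$state, num3 = -df2$num2)
)
print(stacked)  # 8 rows; AK, AZ and AR each appear twice

# Summing within each state performs the subtraction
res <- aggregate(num3 ~ state, data = stacked, FUN = sum)
print(res)
```

Because every state appears at least once in the stacked data, no NA handling is needed with this approach.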