I have two data frames:
> df1
Long Short
EURUSD 47295 16057
GBPUSD 17385 6861
USDJPY 7146 9369
USDCHF 2704 5162
USDCAD 4705 11947
AUDUSD 13041 6654
NZDUSD 7184 4000
> df2
Long Short
EURUSD 318 408
GBPUSD 181 276
USDJPY 217 203
USDCHF 97 57
USDCAD 178 121
AUDUSD 142 202
NZDUSD 95 138
I need the final data frame to contain the element-wise sums, like this:
> Final
Long Short
EURUSD 47613 16465
... ... ...
NZDUSD 7279 4138
The merge/join approaches I tried did not work. Any help is appreciated.
Answer 0 (score: 1)
If your data does not use row names (my personal preference, though not always under your control), here are three methods.
Your data:
df1 <- read.table(text = "Symbol Long Short
EURUSD 47295 16057
GBPUSD 17385 6861
USDJPY 7146 9369
USDCHF 2704 5162
USDCAD 4705 11947
AUDUSD 13041 6654
NZDUSD 7184 4000", header = TRUE, stringsAsFactors = FALSE)
df2 <- read.table(text = "Symbol Long Short
EURUSD 318 408
GBPUSD 181 276
USDJPY 217 203
USDCHF 97 57
USDCAD 178 121
AUDUSD 142 202
NZDUSD 95 138", header = TRUE, stringsAsFactors = FALSE)
A single helper function used by methods 2 and 3:
psum <- function(..., na.rm = FALSE) rowSums(sapply(list(...), c), na.rm = na.rm)
(This is analogous to pmin and family, and is needed so that an NA is not debilitating ...)
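To make the NA point concrete, here is a minimal sketch comparing plain + with psum. The vectors x and y are illustrative values I made up, not data from the question.

```r
# Same helper as above: element-wise ("parallel") sum of several vectors.
psum <- function(..., na.rm = FALSE) rowSums(sapply(list(...), c), na.rm = na.rm)

# Illustrative inputs (assumed for this sketch, not from the question).
x <- c(1, NA, 3)
y <- c(10, 20, NA)

plus_result  <- x + y                     # NA wherever either input is NA
psum_default <- psum(x, y)                # behaves like x + y
psum_narm    <- psum(x, y, na.rm = TRUE)  # a missing value contributes 0

print(plus_result)
print(psum_narm)
```

Note that the helper expects vectors of equal length greater than one; with length-one inputs sapply returns a plain vector and rowSums would fail.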
cbind
This comes from @Leo P.'s comment, and it relies on the two data.frames always having exactly the same row order:
cbind(df1[,1,drop=FALSE], df1[,2:3] + df2[,2:3])
# Symbol Long Short
# 1 EURUSD 47613 16465
# 2 GBPUSD 17566 7137
# 3 USDJPY 7363 9572
# 4 USDCHF 2801 5219
# 5 USDCAD 4883 12068
# 6 AUDUSD 13183 6856
# 7 NZDUSD 7279 4138
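Since the cbind approach silently gives wrong sums if the rows are ordered differently, it may be worth aligning the second frame first. The following is only a sketch of that idea (it is not part of the comment the method comes from); the frames a and b and the variable idx are hypothetical names with a subset of the question's values.

```r
# Two small frames with the same symbols in a different order (illustrative).
a <- data.frame(Symbol = c("EURUSD", "GBPUSD", "USDJPY"),
                Long = c(47295, 17385, 7146),
                Short = c(16057, 6861, 9369),
                stringsAsFactors = FALSE)
b <- data.frame(Symbol = c("USDJPY", "EURUSD", "GBPUSD"),
                Long = c(217, 318, 181),
                Short = c(203, 408, 276),
                stringsAsFactors = FALSE)

# For each row of a, find the position of the same symbol in b.
idx <- match(a$Symbol, b$Symbol)
stopifnot(!anyNA(idx))          # fail loudly if a symbol is missing from b

# Reorder b to match a, then add the numeric columns as before.
b_aligned <- b[idx, ]
rownames(b_aligned) <- NULL
aligned <- cbind(a["Symbol"],
                 a[c("Long", "Short")] + b_aligned[c("Long", "Short")])
print(aligned)
```

This still requires every symbol to be present in both frames; the next two methods do not.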
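merge
This method does not depend on the rows being in the same order, or even being present in both frames. To demonstrate that, I'll remove a row from one of the data frames: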
df2 <- df2[-3,]
Rename the second frame's columns so that we can merge the two and retain both sets of values:
colnames(df2) <- c("Symbol", "Long2", "Short2")
The actual work:
within(merge(df1, df2, by = "Symbol", all = TRUE), {
  Long <- psum(Long, Long2, na.rm = TRUE)
  Short <- psum(Short, Short2, na.rm = TRUE)
  # cleanup, remove unneeded columns
  Long2 <- Short2 <- NULL
})
# Symbol Long Short
# 1 AUDUSD 13183 6856
# 2 EURUSD 47613 16465
# 3 GBPUSD 17566 7137
# 4 NZDUSD 7279 4138
# 5 USDCAD 4883 12068
# 6 USDCHF 2801 5219
# 7 USDJPY 7146 9369
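As an aside, a different base-R route to the same totals is to stack the two frames and sum by symbol, which avoids renaming columns. This is a sketch of an alternative technique, not the merge method above; first, second, stacked, and totals are hypothetical names, and both frames are assumed to still have their original Long and Short column names.

```r
# Small illustrative frames; the second one is missing USDJPY.
first <- data.frame(Symbol = c("EURUSD", "GBPUSD", "USDJPY"),
                    Long = c(47295, 17385, 7146),
                    Short = c(16057, 6861, 9369),
                    stringsAsFactors = FALSE)
second <- data.frame(Symbol = c("EURUSD", "GBPUSD"),
                     Long = c(318, 181),
                     Short = c(408, 276),
                     stringsAsFactors = FALSE)

# Stack the rows, then sum Long and Short within each Symbol.
stacked <- rbind(first, second)
totals <- aggregate(cbind(Long, Short) ~ Symbol, data = stacked, FUN = sum)
print(totals)
```

One caveat: the formula interface of aggregate drops rows containing NA by default, so this treats missing values differently from psum(..., na.rm = TRUE).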
dplyr
Starting from a fresh df1 and df2 (with the original column names intact), I'll again remove a row:
df2 <- df2[-3,]
The work:
library(dplyr)
full_join(df1, rename(df2, Long2 = Long, Short2 = Short), by = "Symbol") %>%
  mutate(
    Long = psum(Long, Long2, na.rm = TRUE),
    Short = psum(Short, Short2, na.rm = TRUE)
  ) %>%
  select(-Long2, -Short2)
# Symbol Long Short
# 1 EURUSD 47613 16465
# 2 GBPUSD 17566 7137
# 3 USDJPY 7146 9369
# 4 USDCHF 2801 5219
# 5 USDCAD 4883 12068
# 6 AUDUSD 13183 6856
# 7 NZDUSD 7279 4138
The data in your question is not representative. Based on your comments, what you really have appears to be:
str(df1)
# 'data.frame': 7 obs. of 2 variables:
# $ Long : Factor w/ 7 levels "2704","4705",..: 7 6 3 1 2 5 4
# $ Short: Factor w/ 7 levels "4000","5162",..: 7 4 5 2 6 3 1
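In the future, it would be much clearer if you provided your data in an unambiguous, directly consumable form, such as the output of dput: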
# dput(df1) ... possibly with options(deparse.max.lines=NULL) beforehand
structure(list(
Long = structure(c(7L, 6L, 3L, 1L, 2L, 5L, 4L), .Label = c("2704", "4705", "7146", "7184", "13041", "17385", "47295"), class = "factor"),
Short = structure(c(7L, 4L, 5L, 2L, 6L, 3L, 1L), .Label = c("4000", "5162", "6654", "6861", "9369", "11947", "16057"), class = "factor")),
.Names = c("Long", "Short"),
row.names = c("EURUSD", "GBPUSD", "USDJPY", "USDCHF", "USDCAD", "AUDUSD", "NZDUSD"),
class = "data.frame")
To get from this df1 to the form used in the methods above, do the following:
# convert from nascent factors to numbers
df1[] <- lapply(df1[], function(a) as.numeric(as.character(a)))
# bring the row names into a column
df1$Symbol <- rownames(df1)
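The columns will be in a different order, but that is easily fixed if it matters. You can optionally remove the row names with rownames(df1) <- NULL. The same conversion needs to be applied to df2.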
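Putting the conversion steps together, a minimal end-to-end sketch might look like the following. The frame raw is a hypothetical three-row stand-in for your factor-based df1; codes and clean are names introduced only for this illustration.

```r
# A small stand-in for the real data: factor columns plus row names.
raw <- data.frame(
  Long  = factor(c("47295", "17385", "7146")),
  Short = factor(c("16057", "6861", "9369")),
  row.names = c("EURUSD", "GBPUSD", "USDJPY")
)

# Pitfall: as.numeric() on a factor returns the internal level codes,
# not the numbers shown when printing.
codes <- as.numeric(raw$Long)

# Convert via character so the printed values are recovered.
clean <- raw
clean[] <- lapply(clean, function(a) as.numeric(as.character(a)))

# Bring the row names into a column, put it first, and drop the row names.
clean$Symbol <- rownames(clean)
clean <- clean[c("Symbol", "Long", "Short")]
rownames(clean) <- NULL
str(clean)
```

After running the same steps on df2, any of the three methods above should apply unchanged.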