Adding and merging two data frames in R

Time: 2017-08-02 13:13:15

Tags: r dataframe

I have two data frames:

> df1
       Long Short
EURUSD 47295 16057
GBPUSD 17385  6861
USDJPY  7146  9369
USDCHF  2704  5162
USDCAD  4705 11947
AUDUSD 13041  6654
NZDUSD  7184  4000

> df2
       Long Short
EURUSD  318    408
GBPUSD  181    276
USDJPY  217    203
USDCHF   97     57
USDCAD  178    121
AUDUSD  142    202
NZDUSD   95    138

I need the final data frame to hold the element-wise sums, like this:

> Final
        Long Short
EURUSD 47613 16465
...      ...   ...
NZDUSD  7279  4138

The merge/join approaches I tried did not work. I would appreciate any help.

1 answer:

Answer 0 (score: 1)

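If your data had the symbols in a column rather than as row names (my personal preference, though not always under your control), here are three methods.

Your data: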

df1 <- read.table(text = "Symbol Long Short
EURUSD 47295 16057
GBPUSD 17385  6861
USDJPY  7146  9369
USDCHF  2704  5162
USDCAD  4705 11947
AUDUSD 13041  6654
NZDUSD  7184  4000", header = TRUE, stringsAsFactors = FALSE)

df2 <- read.table(text = "Symbol Long Short
EURUSD  318    408
GBPUSD  181    276
USDJPY  217    203
USDCHF   97     57
USDCAD  178    121
AUDUSD  142    202
NZDUSD   95    138", header = TRUE, stringsAsFactors = FALSE)

A single helper function, used by methods 2 and 3:

psum <- function(..., na.rm = FALSE) rowSums(sapply(list(...), c), na.rm = na.rm)

(This is analogous to pmin and family, and is needed so that NA values do not break the sums.)
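To make the role of na.rm concrete, here is a small sketch of how psum behaves on two short vectors. The vectors a and b are made up for illustration and are not part of the question's data.

```r
# helper from above: element-wise ("parallel") sum across several vectors
psum <- function(..., na.rm = FALSE) rowSums(sapply(list(...), c), na.rm = na.rm)

# hypothetical vectors, not from the question
a <- c(1, 2, NA)
b <- c(10, NA, NA)

res_keep <- psum(a, b)                # NA wherever any input is NA
res_drop <- psum(a, b, na.rm = TRUE)  # NAs are skipped; a position that is NA everywhere sums to 0
```

Note the last case: with na.rm = TRUE, a symbol that is missing from every input would silently come out as 0 rather than NA, which may or may not be what you want.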

Method 1: cbind

This is from @Leo P.'s comment. It relies on both data.frames always having exactly the same rows in exactly the same order:

cbind(df1[,1,drop=FALSE], df1[,2:3] + df2[,2:3])
#   Symbol  Long Short
# 1 EURUSD 47613 16465
# 2 GBPUSD 17566  7137
# 3 USDJPY  7363  9572
# 4 USDCHF  2801  5219
# 5 USDCAD  4883 12068
# 6 AUDUSD 13183  6856
# 7 NZDUSD  7279  4138
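If the row order cannot be guaranteed but both frames hold the same set of symbols, one possible safeguard is to align the second frame to the first with match() before adding. This is only a sketch: the two small frames below are shortened stand-ins for df1 and df2, with the second one deliberately shuffled, and df2_aligned and result are names introduced here for illustration.

```r
# shortened stand-ins for df1 and df2; df2 is deliberately in a different row order
df1 <- data.frame(Symbol = c("EURUSD", "GBPUSD", "USDJPY"),
                  Long   = c(47295, 17385, 7146),
                  Short  = c(16057, 6861, 9369),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Symbol = c("USDJPY", "EURUSD", "GBPUSD"),
                  Long   = c(217, 318, 181),
                  Short  = c(203, 408, 276),
                  stringsAsFactors = FALSE)

# this approach is only safe when both frames hold the same symbols, each exactly once
stopifnot(setequal(df1$Symbol, df2$Symbol),
          !anyDuplicated(df1$Symbol),
          !anyDuplicated(df2$Symbol))

# reorder df2 so its rows line up with df1, then add as in method 1
df2_aligned <- df2[match(df1$Symbol, df2$Symbol), ]
result <- cbind(df1[, 1, drop = FALSE], df1[, 2:3] + df2_aligned[, 2:3])
```

When a symbol can be missing from one frame, the stopifnot() check fails and one of the merge-based methods is the better fit.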

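Method 2: base R merge

This method does not rely on the rows being in the same order, or even on every row being present in both frames. To demonstrate that it works, I'll remove a row from one of the data frames: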

df2 <- df2[-3,]

Rename the second frame's columns so that we can merge the two frames and keep both sets of values:

colnames(df2) <- c("Symbol", "Long2", "Short2")

The actual work:

within(merge(df1, df2, by = "Symbol", all = TRUE), {
  Long <- psum(Long, Long2, na.rm = TRUE)
  Short <- psum(Short, Short2, na.rm = TRUE)
  # cleanup, remove unneeded columns
  Long2 <- Short2 <- NULL
})
#   Symbol  Long Short
# 1 AUDUSD 13183  6856
# 2 EURUSD 47613 16465
# 3 GBPUSD 17566  7137
# 4 NZDUSD  7279  4138
# 5 USDCAD  4883 12068
# 6 USDCHF  2801  5219
# 7 USDJPY  7146  9369
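A different base R route, not part of the methods in this answer, is to stack the two frames and sum by symbol with aggregate(). The sketch below assumes both frames still carry the original column names (Long, Short), so it would have to run before the renaming step; the frames are shortened stand-ins, and stacked and totals are names introduced for illustration.

```r
# shortened stand-ins; USDJPY is deliberately missing from df2
df1 <- data.frame(Symbol = c("EURUSD", "GBPUSD", "USDJPY"),
                  Long   = c(47295, 17385, 7146),
                  Short  = c(16057, 6861, 9369),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Symbol = c("EURUSD", "GBPUSD"),
                  Long   = c(318, 181),
                  Short  = c(408, 276),
                  stringsAsFactors = FALSE)

# stack the rows of both frames, then sum Long and Short within each symbol
stacked <- rbind(df1, df2)
totals  <- aggregate(cbind(Long, Short) ~ Symbol, data = stacked, FUN = sum)
```

A symbol that appears in only one frame simply keeps its own values. Be aware that the formula interface of aggregate() drops rows containing NA by default, so this sketch is not a drop-in replacement when the inputs themselves contain missing values.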

Method 3: dplyr join

Starting again with a fresh df1 and df2 (with the original column names intact), I once more remove a row:

df2 <- df2[-3,]

The work:

library(dplyr)
full_join(df1, rename(df2, Long2 = Long, Short2 = Short), by = "Symbol") %>%
  mutate(
    Long = psum(Long, Long2, na.rm = TRUE),
    Short = psum(Short, Short2, na.rm = TRUE)
  ) %>%
  select(-Long2, -Short2)
#   Symbol  Long Short
# 1 EURUSD 47613 16465
# 2 GBPUSD 17566  7137
# 3 USDJPY  7146  9369
# 4 USDCHF  2801  5219
# 5 USDCAD  4883 12068
# 6 AUDUSD 13183  6856
# 7 NZDUSD  7279  4138

Edit

The data in your question is not representative. Based on your comments, what you really have appears to be:

str(df1)
# 'data.frame': 7 obs. of  2 variables:
#  $ Long : Factor w/ 7 levels "2704","4705",..: 7 6 3 1 2 5 4
#  $ Short: Factor w/ 7 levels "4000","5162",..: 7 4 5 2 6 3 1

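In the future, it would be clearer if you provided your data in an unambiguous, directly usable form, such as the output of dput: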

# dput(df1) ... possibly with options(deparse.max.lines=NULL) beforehand
structure(list(
  Long = structure(c(7L, 6L, 3L, 1L, 2L, 5L, 4L), .Label = c("2704", "4705", "7146", "7184", "13041", "17385", "47295"), class = "factor"),
  Short = structure(c(7L, 4L, 5L, 2L, 6L, 3L, 1L), .Label = c("4000", "5162", "6654", "6861", "9369", "11947", "16057"), class = "factor")),
  .Names = c("Long", "Short"),
  row.names = c("EURUSD", "GBPUSD", "USDJPY", "USDCHF", "USDCAD", "AUDUSD", "NZDUSD"),
  class = "data.frame")

To get from your df1 to the form I used above, do the following:

# convert from nascent factors to numbers
df1[] <- lapply(df1[], function(a) as.numeric(as.character(a)))
# bring the row names into a column
df1$Symbol <- rownames(df1)

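The columns will be in a different order, but that is easy to fix if it matters. You can optionally remove the row names with rownames(df1) <- NULL. The same steps need to be applied to df2.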
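Since the same conversion is needed for both frames, it may help to wrap it in a function. The sketch below uses a shortened, hypothetical stand-in for the factor-based frame, and fix_frame is a name introduced here, not an existing function. It also shows why the as.character() step is there: calling as.numeric() directly on a factor returns the internal level codes, not the numbers you see printed.

```r
# hypothetical, shortened stand-in for the factor-based frame described above
df1 <- data.frame(Long  = factor(c("47295", "17385", "7146")),
                  Short = factor(c("16057", "6861", "9369")),
                  row.names = c("EURUSD", "GBPUSD", "USDJPY"))

# pitfall: this yields the level codes, not the displayed values
codes <- as.numeric(df1$Long)

# hypothetical helper: factor columns -> numbers, row names -> leading Symbol column
fix_frame <- function(df) {
  df[] <- lapply(df, function(a) as.numeric(as.character(a)))
  df$Symbol <- rownames(df)
  rownames(df) <- NULL
  df[, c("Symbol", setdiff(names(df), "Symbol"))]  # put Symbol first
}

df1_fixed <- fix_frame(df1)
```

Running fix_frame() on both df1 and df2 should leave them in the shape the three methods above expect.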