Operating on columns with the same name in two data sets in R

Time: 2017-10-23 04:21:40

Tags: r

I have two data sets as follows:

df1 <- data.frame(a =c(1), b=c(4), c=c(1))

df2 <- data.frame (b =c(4), c=c(1), a=c(4))

I want to operate on columns that share the same name. For example, for column a I want to compute (a in df1 - a in df2) / (a in df1):

(1-4) /1 = -3

So my ideal output is:

 a b c
-3 0 0

I tried writing a function like the one below, but I am not sure how to proceed. Any tips are much appreciated.

my_func <- function(x,y) {
  for (i in names(x)) 
    if ((i %in% names(y))) {
      df3 [i,] <- (x[i,] - y[i,]) / x[i,]
    }
}

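Update

Ideally, it would be great if missing columns could be taken into account. For example, if a column is missing from df2, I would like 0 for it; if the column is missing from df1, the output can be "no value".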

2 answers:

Answer 0: (score: 1)

Taking help from @Headpoint's answer, you can do this directly (no loop needed):

df1 <- data.frame(a =c(1), b=c(4), c=c(1))
df1 <- df1[, order(names(df1))]

df2 <- data.frame (b =c(4), c=c(1), a=c(4))
df2 <- df2[, order(names(df2))]

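# intersect() keeps only the columns present in both data frames
common_col_names <- intersect(colnames(df1), colnames(df2))

# relative difference, as asked in the question: (df1 - df2) / df1
(df1[, common_col_names] - df2[, common_col_names]) / df1[, common_col_names]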

   a b c
1 -3 0 0
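
For reference, the same idea can be wrapped in a small helper. This is only a sketch: the name `rel_diff` is hypothetical, and it assumes both data frames have the same number of rows and only numeric columns. Because both sides are indexed by the same vector of names, the columns line up without sorting them first.

```r
# Hypothetical helper: relative difference (x - y) / x over the shared columns.
rel_diff <- function(x, y) {
  common <- intersect(names(x), names(y))  # columns present in both
  # drop = FALSE keeps a data frame even when only one column is shared
  (x[, common, drop = FALSE] - y[, common, drop = FALSE]) /
    x[, common, drop = FALSE]
}

df1 <- data.frame(a = 1, b = 4, c = 1)
df2 <- data.frame(b = 4, c = 1, a = 4)
result <- rel_diff(df1, df2)
result
```

The result keeps the column order of the first argument, so here the columns come out as a, b, c.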

Answer 1: (score: 0)

Is this what you want?

res <- NULL
for (str in colnames(df1))
  res <- c(res, (df1[str] - df2[str]) / df1[str] )

res
#$a
#[1] -3

#$b
#[1] 0

#$c
#[1] 0

If you want the result as a numeric vector:

out <- as.numeric(res)
names(out) <- names(res)
out
# a  b  c 
#-3  0  0 
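
The loop and the conversion to a numeric vector can also be combined in one step. The following is a minimal sketch with base R's `vapply`, assuming each data frame has a single row (so every column yields exactly one number) and that every column of df1 also exists in df2.

```r
df1 <- data.frame(a = 1, b = 4, c = 1)
df2 <- data.frame(b = 4, c = 1, a = 4)

# df[[nm]] extracts a column as a plain numeric vector;
# vapply() uses the column names as the names of the result
out <- vapply(
  colnames(df1),
  function(nm) (df1[[nm]] - df2[[nm]]) / df1[[nm]],
  numeric(1)
)
out
```

Declaring `numeric(1)` makes `vapply` raise an error if a column ever yields something other than a single number, which is a little safer than growing a list with `c()`.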

If the columns do not match:

col_nam1 <- colnames(df1)
col_nam2 <- colnames(df2)
all_col_names <- unique(c(col_nam1, col_nam2))
res <- NULL
for (str in all_col_names)
  if ((str %in% col_nam1) && (str %in%col_nam2))
    res <- c(res, (df1[str] - df2[str]) / df1[str])

Inspired by @Hardik gupta, without a loop:

common_names <- sort(intersect(col_nam2, col_nam1))
(df1[, common_names] - df2[, common_names]) / df1[, common_names]

If a column of df1 is not in df2, the result for it is zero:

df1 <- data.frame(a =c(1), b=c(4), c=c(1), f = 4)
df2 <- data.frame (b =c(4), c=c(1), a=c(4), g = 5)

col_nam1 <- colnames(df1)
col_nam2 <- colnames(df2)

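common_names <- intersect(col_nam2, col_nam1)
# every column of df1 appears in the result; unique() accepts only one
# vector of values, so sorting col_nam1 directly is what is needed here
col_names <- sort(col_nam1)
res <- numeric(length(col_names))  # zeros, kept for columns missing from df2
names(res) <- col_names

# unlist() keeps res a plain named numeric vector (one row per data frame)
res[common_names] <- unlist((df1[, common_names] - df2[, common_names]) / 
                                       df1[, common_names])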
out <- as.numeric(res)
names(out) <- names(res)
out
 a  b  c  f 
-3  0  0  0
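
The update in the question also asks about columns that exist only in df2. Below is a sketch of one way to cover both cases. It assumes that "no value" can be represented by `NA`, that each data frame has one row, and that the variable names `common`, `only_df1`, `only_df2` and `all_names` are illustrative only.

```r
df1 <- data.frame(a = 1, b = 4, c = 1, f = 4)
df2 <- data.frame(b = 4, c = 1, a = 4, g = 5)

common   <- intersect(names(df1), names(df2))  # in both: a, b, c
only_df1 <- setdiff(names(df1), names(df2))    # missing from df2: f
only_df2 <- setdiff(names(df2), names(df1))    # missing from df1: g

all_names <- sort(c(common, only_df1, only_df2))
# start from NA so that columns found only in df2 stay "no value"
out <- setNames(rep(NA_real_, length(all_names)), all_names)
out[common] <- unlist(
  (df1[, common, drop = FALSE] - df2[, common, drop = FALSE]) /
    df1[, common, drop = FALSE]
)
out[only_df1] <- 0  # columns missing from df2 get 0
out
```

With these inputs, f is set to 0 and g remains `NA`, while a, b and c hold the relative differences.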