Drop the values of variables when they do not match another variable

Asked: 2014-11-06 20:08:09

Tags: r merge dataframe

I have a data frame like this:

Family   Component   x1   m_x1   x2   m_x2   x3   m_x3   y1   m_y1   y2   m_y2   y3   m_y3
a1       1           1    100    2    300    0    0      2    250    0    0      0    0
a1       2           1    100    2    300    0    0      2    250    0    0      0    0
a1       3           1    100    2    300    0    0      2    250    0    0      0    0
a2       1           2    150    0    0      0    0      0    0      0    0      0    0
a2       2           2    150    0    0      0    0      0    0      0    0      0    0
a3       1           1    4000   3    150    4    130    2    150    3    400    0    0
a3       2           1    4000   3    150    4    130    2    150    3    400    0    0
a3       3           1    4000   3    150    4    130    2    150    3    400    0    0
a3       4           1    4000   3    150    4    130    2    150    3    400    0    0
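For anyone who wants to run the answers, here is a minimal sketch that rebuilds the table above as a data frame. The name dat is an assumption taken from the first answer; the other answers call the same object df and df1, so assign it to those names as needed.

```r
# Rebuild the example table; read.table() skips the leading blank line
dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Family Component x1 m_x1 x2 m_x2 x3 m_x3 y1 m_y1 y2 m_y2 y3 m_y3
a1 1 1 100  2 300 0 0   2 250 0 0   0 0
a1 2 1 100  2 300 0 0   2 250 0 0   0 0
a1 3 1 100  2 300 0 0   2 250 0 0   0 0
a2 1 2 150  0 0   0 0   0 0   0 0   0 0
a2 2 2 150  0 0   0 0   0 0   0 0   0 0
a3 1 1 4000 3 150 4 130 2 150 3 400 0 0
a3 2 1 4000 3 150 4 130 2 150 3 400 0 0
a3 3 1 4000 3 150 4 130 2 150 3 400 0 0
a3 4 1 4000 3 150 4 130 2 150 3 400 0 0
")

# Same data under the names used by the other answers
df  <- dat
df1 <- dat
```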

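Family is the grouping variable. Within each Family, if the value of Component does not match the value in x1, x2, x3, y1, y2 or y3, I want the value of that variable and of the one right after it (x1 and m_x1, x2 and m_x2, ...) to be dropped. In the result I am looking for, only the pairs whose value matches Component are kept.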

What function should I use? I tried merge but could not get it to work.

3 answers:

Answer 0 (score: 2)

Here is a simple approach:

# find nonmatching entries
idx <- dat[-(1:2)][c(TRUE, FALSE)] != dat$Component

# full index
idx_full <- idx[ , rep(seq(ncol(idx)), each = 2)]

# replace values with 0
dat[-(1:2)][idx_full] <- 0

dat
#   Family Component x1 m_x1 x2 m_x2 x3 m_x3 y1 m_y1 y2 m_y2 y3 m_y3
# 1     a1         1  1  100  0    0  0    0  0    0  0    0  0    0
# 2     a1         2  0    0  2  300  0    0  2  250  0    0  0    0
# 3     a1         3  0    0  0    0  0    0  0    0  0    0  0    0
# 4     a2         1  0    0  0    0  0    0  0    0  0    0  0    0
# 5     a2         2  2  150  0    0  0    0  0    0  0    0  0    0
# 6     a3         1  1 4000  0    0  0    0  0    0  0    0  0    0
# 7     a3         2  0    0  0    0  0    0  2  150  0    0  0    0
# 8     a3         3  0    0  3  150  0    0  0    0  3  400  0    0
# 9     a3         4  0    0  0    0  4  130  0    0  0    0  0    0

where dat is the name of the data frame.
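To see why the index has to be widened, here is the same idea on a tiny two-row frame. This is only a sketch: toy and its values are made up for illustration, and it assumes, like the code above, that every value column is immediately followed by its m_ column.

```r
# Hypothetical toy data: one x/m_x pair and one y/m_y pair
toy <- data.frame(Family = c("a1", "a1"),
                  Component = c(1, 2),
                  x1 = c(1, 1), m_x1 = c(100, 100),
                  y1 = c(2, 2), m_y1 = c(250, 250))

# Every other column after the first two is a value column (x1, y1).
# Comparing with Component gives a 2 x 2 logical matrix,
# TRUE where the value differs from Component.
idx <- toy[-(1:2)][c(TRUE, FALSE)] != toy$Component

# Repeat each column of idx so it also covers the neighbouring m_ column
idx_full <- idx[, rep(seq(ncol(idx)), each = 2)]

# Zero out every non-matching value together with its m_ partner
toy[-(1:2)][idx_full] <- 0
toy
```

After this, row 1 keeps x1 = 1 and m_x1 = 100 and row 2 keeps y1 = 2 and m_y1 = 250; everything else in the value columns is 0.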

Answer 1 (score: 1)

You could try:

cols <- as.vector(t(outer(c("x","y"), 1:3, 
                     function(...) paste(...,sep=""))))
df[, 3:ncol(df)] <- do.call(cbind, lapply(cols, function(x) df[, 
                              c(x,paste(sep="","m_",x))]*(df[[x]]==df$Component)))
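The trick in the code above is that multiplying by a logical vector treats TRUE as 1 and FALSE as 0, so rows that do not match are zeroed in both columns of a pair. A minimal sketch on made-up values (vals, component, keep and masked are names I chose for illustration):

```r
# Hypothetical pair of columns and a matching key
vals <- data.frame(x1 = c(1, 1), m_x1 = c(100, 100))
component <- c(1, 2)

# TRUE where the value equals the key, FALSE otherwise
keep <- vals$x1 == component

# The logical vector is recycled down each column: kept rows are
# multiplied by 1, the others by 0
masked <- vals * keep
masked
```

Note that this only works for numeric columns, since it relies on arithmetic.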

Answer 2 (score: 1)

If the columns are not always in the same order, you could also do:

 n1 <- unique(gsub(".+\\_", "", colnames(df1)[-(1:2)]))

 df1[,-(1:2)] <- do.call(cbind,lapply(n1, function(x) {
                      indx <- grep(x, names(df1))
                      m1 <- as.matrix(df1[indx])
                      m1[m1[,1]!=df1$Component] <- 0
                      as.data.frame(m1) }))
  df1
  #   Family Component x1 m_x1 x2 m_x2 x3 m_x3 y1 m_y1 y2 m_y2 y3 m_y3
  #1     a1         1  1  100  0    0  0    0  0    0  0    0  0    0
  #2     a1         2  0    0  2  300  0    0  2  250  0    0  0    0
  #3     a1         3  0    0  0    0  0    0  0    0  0    0  0    0
  #4     a2         1  0    0  0    0  0    0  0    0  0    0  0    0
  #5     a2         2  2  150  0    0  0    0  0    0  0    0  0    0
  #6     a3         1  1 4000  0    0  0    0  0    0  0    0  0    0
  #7     a3         2  0    0  0    0  0    0  2  150  0    0  0    0
  #8     a3         3  0    0  3  150  0    0  0    0  3  400  0    0
  #9     a3         4  0    0  0    0  4  130  0    0  0    0  0    0
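If column order really cannot be relied on, another option is to pair the columns purely by name. The following is a sketch, not a drop-in replacement for the code above: zero_unmatched and its key and prefix arguments are hypothetical names, and it assumes each m_ column has a partner whose name is the same without the prefix.

```r
# Zero out each value column and its m_ partner wherever the value
# differs from the key column; columns are paired by name, not position
zero_unmatched <- function(d, key = "Component", prefix = "m_") {
  m_cols <- names(d)[startsWith(names(d), prefix)]
  for (m in m_cols) {
    v <- substring(m, nchar(prefix) + 1)     # "m_x1" -> "x1"
    if (!v %in% names(d)) next               # skip unpaired columns
    miss <- which(d[[v]] != d[[key]])        # which() also drops NA comparisons
    d[miss, c(v, m)] <- 0
  }
  d
}

# Usage on a small frame with deliberately shuffled columns
shuffled <- data.frame(Family = c("a1", "a1"),
                       Component = c(1, 2),
                       m_x1 = c(100, 100), x1 = c(1, 1),
                       x2 = c(2, 2), m_x2 = c(300, 300))
res <- zero_unmatched(shuffled)
res
```

The original column order is preserved because the values are overwritten in place rather than rebuilt with cbind.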