I have a data frame like this:
Family Component x1 m_x1 x2 m_x2 x3 m_x3 y1 m_y1 y2 m_y2 y3 m_y3
a1 1 1 100 2 300 0 0 2 250 0 0 0 0
a1 2 1 100 2 300 0 0 2 250 0 0 0 0
a1 3 1 100 2 300 0 0 2 250 0 0 0 0
a2 1 2 150 0 0 0 0 0 0 0 0 0 0
a2 2 2 150 0 0 0 0 0 0 0 0 0 0
a3 1 1 4000 3 150 4 130 2 150 3 400 0 0
a3 2 1 4000 3 150 4 130 2 150 3 400 0 0
a3 3 1 4000 3 150 4 130 2 150 3 400 0 0
a3 4 1 4000 3 150 4 130 2 150 3 400 0 0
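Family is the grouping variable. Within each Family, if the value of Component does not match the value in x1, x2, x3, y1, y2, or y3, I want the value of that variable and of the column right after it (m_x1, m_x2, ...) to be dropped.
What function should I use? I tried merge but could not get it to work.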
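For anyone who wants to reproduce the problem, the table above could be read into R roughly as follows. This is only a sketch: the name `dat` is borrowed from the first answer, and `m_y3` is assumed to be 0 in every row.

```r
# Example data from the question; m_y3 is taken to be 0 in every row.
# The name `dat` is an assumption borrowed from the first answer.
dat <- read.table(header = TRUE, stringsAsFactors = FALSE,
                  text = "Family Component x1 m_x1 x2 m_x2 x3 m_x3 y1 m_y1 y2 m_y2 y3 m_y3
a1 1 1 100 2 300 0 0 2 250 0 0 0 0
a1 2 1 100 2 300 0 0 2 250 0 0 0 0
a1 3 1 100 2 300 0 0 2 250 0 0 0 0
a2 1 2 150 0 0 0 0 0 0 0 0 0 0
a2 2 2 150 0 0 0 0 0 0 0 0 0 0
a3 1 1 4000 3 150 4 130 2 150 3 400 0 0
a3 2 1 4000 3 150 4 130 2 150 3 400 0 0
a3 3 1 4000 3 150 4 130 2 150 3 400 0 0
a3 4 1 4000 3 150 4 130 2 150 3 400 0 0")

# Inspect the column types (Family is character, the rest are numeric)
str(dat)
```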
Answer 0 (score: 2)
Here is a simple approach:
# find nonmatching entries
idx <- dat[-(1:2)][c(TRUE, FALSE)] != dat$Component
# full index
idx_full <- idx[ , rep(seq(ncol(idx)), each = 2)]
# replace values with 0
dat[-(1:2)][idx_full] <- 0
dat
# Family Component x1 m_x1 x2 m_x2 x3 m_x3 y1 m_y1 y2 m_y2 y3 m_y3
# 1 a1 1 1 100 0 0 0 0 0 0 0 0 0 0
# 2 a1 2 0 0 2 300 0 0 2 250 0 0 0 0
# 3 a1 3 0 0 0 0 0 0 0 0 0 0 0 0
# 4 a2 1 0 0 0 0 0 0 0 0 0 0 0 0
# 5 a2 2 2 150 0 0 0 0 0 0 0 0 0 0
# 6 a3 1 1 4000 0 0 0 0 0 0 0 0 0 0
# 7 a3 2 0 0 0 0 0 0 2 150 0 0 0 0
# 8 a3 3 0 0 3 150 0 0 0 0 3 400 0 0
# 9 a3 4 0 0 0 0 4 130 0 0 0 0 0 0
where dat is the name of the data frame.
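The two indexing steps in this answer may be easier to follow on a toy example. The sketch below uses made-up names and values (a two-row `idx` with columns `x1` and `x2`), not the data from the question.

```r
# A logical index shorter than the vector is recycled, so c(TRUE, FALSE)
# picks every other element: here the value columns, skipping the m_ columns.
nms <- c("x1", "m_x1", "x2", "m_x2")
value_cols <- nms[c(TRUE, FALSE)]

# Toy mismatch matrix: one column per value column (made-up values)
idx <- matrix(c(FALSE, TRUE,
                TRUE,  FALSE), nrow = 2, byrow = TRUE,
              dimnames = list(NULL, c("x1", "x2")))

# Repeat each column index twice so that every value column and its
# m_ partner share the same TRUE/FALSE flag.
col_index <- rep(seq(ncol(idx)), each = 2)
idx_full <- idx[, col_index]

value_cols
col_index
idx_full
```

The expanded matrix then has the same shape as the block of columns it is used to index, which is what allows the single assignment `dat[-(1:2)][idx_full] <- 0`.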
Answer 1 (score: 1)
You could try:
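# value column names in data frame order: x1, x2, x3, y1, y2, y3
cols <- as.vector(t(outer(c("x", "y"), 1:3, paste0)))
# multiply each value/m_ pair by the match indicator (TRUE = 1, FALSE = 0)
df[, 3:ncol(df)] <- do.call(cbind, lapply(cols, function(x)
  df[, c(x, paste0("m_", x))] * (df[[x]] == df$Component)))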
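This answer relies on a logical vector acting as 1/0 in arithmetic, so multiplying a pair of columns by the match indicator keeps matching rows and zeroes the rest. A minimal sketch on a single pair, with made-up values and the illustrative names `vals`, `Component`, and `keep`:

```r
# One value column and its m_ partner (made-up values)
vals <- data.frame(x1 = c(1, 1, 1), m_x1 = c(100, 100, 100))
Component <- c(1, 2, 3)

# TRUE where the value matches Component, FALSE otherwise
keep <- vals$x1 == Component

# The length-nrow logical vector is applied down each column,
# so non-matching rows become 0 in both columns.
res <- vals * keep
res
```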
Answer 2 (score: 1)
If the columns are not always in the same order, you could also do:
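# unique column stems (x1, x2, ..., y3) after stripping the "m_" prefix
n1 <- unique(gsub(".+\\_", "", colnames(df1)[-(1:2)]))
df1[,-(1:2)] <- do.call(cbind, lapply(n1, function(x) {
  indx <- grep(x, names(df1))           # the value column and its m_ partner
  m1 <- as.matrix(df1[indx])
  m1[m1[,1] != df1$Component, ] <- 0    # zero whole rows where the value does not match
  as.data.frame(m1) }))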
df1
# Family Component x1 m_x1 x2 m_x2 x3 m_x3 y1 m_y1 y2 m_y2 y3 m_y3
#1 a1 1 1 100 0 0 0 0 0 0 0 0 0 0
#2 a1 2 0 0 2 300 0 0 2 250 0 0 0 0
#3 a1 3 0 0 0 0 0 0 0 0 0 0 0 0
#4 a2 1 0 0 0 0 0 0 0 0 0 0 0 0
#5 a2 2 2 150 0 0 0 0 0 0 0 0 0 0
#6 a3 1 1 4000 0 0 0 0 0 0 0 0 0 0
#7 a3 2 0 0 0 0 0 0 2 150 0 0 0 0
#8 a3 3 0 0 3 150 0 0 0 0 3 400 0 0
#9 a3 4 0 0 0 0 4 130 0 0 0 0 0 0
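One caveat with `grep(x, names(df1))` is that it matches by pattern, so a stem such as x1 would also pick up a hypothetical column named x10. A possible variation, sketched below, pairs columns by exact name instead and modifies them in place, so the column order does not matter. The helper `zero_nonmatching` and its arguments `key` and `prefix` are hypothetical names introduced for this sketch, and the small data frame is made up.

```r
# Hypothetical helper (not from the answers): zero out each value column and
# its m_ partner wherever the value differs from the key column, pairing
# columns by exact name rather than by pattern.
zero_nonmatching <- function(d, key = "Component", prefix = "m_") {
  m_cols <- grep(paste0("^", prefix), names(d), value = TRUE)
  for (m in m_cols) {
    v <- sub(paste0("^", prefix), "", m)  # value column paired with m
    if (!v %in% names(d)) next            # skip m_ columns without a partner
    drop <- d[[v]] != d[[key]]            # rows where the value does not match
    d[drop, c(v, m)] <- 0
  }
  d
}

# Made-up example with the x1 pair in reversed order
df_demo <- data.frame(Family = c("a1", "a1"), Component = c(1, 2),
                      m_x1 = c(100, 100), x1 = c(1, 1),
                      x2 = c(2, 2), m_x2 = c(300, 300))
out <- zero_nonmatching(df_demo)
out
```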