I have the following data.frame, in which several X columns (X1, X2, X3 ... XN) are blank:
df1 <- data.frame( name = c("A","B","C"),
X1 = c("","", ""),
Y1 = c("aa","bb","cc"),
Z1 = c("AA","BB","CC"),
X2 = c("","", ""),
Y2 = c("dd","",""),
Z2 = c("AA","",""),
X3 = c("","", ""),
Y3 = c("","","ee"),
Z3 = c("","","CC"))
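A second data.frame contains the values that should be assigned to the X columns, based on the combination of values observed in the Y and Z columns: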
df2 <- data.frame( Y = c("aa","bb","cc","dd","ee"),
Z = c("AA","BB","CC","AA","CC"),
X = c(1, 2, 3, 4, 5))
How can I fill in the X values in df1 using the information in df2, so that I get df3?
df3 <- data.frame( name = c("A","B","C"),
X1 = c("1","2", "3"),
Y1 = c("aa","bb","cc"),
Z1 = c("AA","BB","CC"),
X2 = c("4","", ""),
Y2 = c("dd","",""),
Z2 = c("AA","",""),
X3 = c("","", "5"),
Y3 = c("","","ee"),
Z3 = c("","","CC"))
Note that in my real database, each name may, but does not necessarily, have values in several groups of columns (e.g. X1, Y1, Z1 ... X10, Y10, Z10).
Answer (score: 2)
This strategy reshapes your data from wide to long format, does the merge, and then reshapes everything back to wide format.
# go from wide to long
x1 <- reshape(df1,
varying=Map(function(x) paste0(x, 1:3), c("X","Y","Z")),
v.names=c("X","Y","Z"),
idvar="name",
timevar="time",
direction="long")
# drop the empty X column and look up X by the (Y, Z) pair
x2 <- merge(subset(x1, select=-X), df2, by=c("Y","Z"), all.x=TRUE)
# replace NA values with blanks
x2[is.na(x2$X),"X"] <- ""
# go back to wide
x3 <- reshape(x2,idvar="name",direction="wide", sep="")
Then x3 looks like this:
  name Y1 Z1 X1 Y2 Z2 X2 Y3 Z3 X3
1    A aa AA  1 dd AA  4
2    B bb BB  2
3    C cc CC  3          ee CC  5
The columns come out in a slightly different order than in df1, but you can easily fix that afterwards if needed.
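One way to do that fix is to index the reshaped result by names(df1) and sort the rows by the id column. The sketch below wraps this in a hypothetical helper, match_layout() (not part of base R), and assumes the id column is called name, as in the question.

```r
# Hypothetical helper (not part of base R): reorder the columns of a
# reshaped data frame to match a template (e.g. x3 to match df1) and
# sort its rows by the id column.
match_layout <- function(wide, template, id = "name") {
  # Rows sorted by id; columns selected in the template's order
  out <- wide[order(wide[[id]]), names(template)]
  # Reset row names to 1, 2, 3, ... after the reordering
  rownames(out) <- NULL
  out
}
```

For example, x3 <- match_layout(x3, df1) puts the columns back in the order name, X1, Y1, Z1, X2, ... and the rows in A, B, C order.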
Note that there is one place where I hardcoded 1:3. If you have more repeated column groups, adjust that vector accordingly.
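To avoid hardcoding 1:3, the suffixes can be read from the column names of df1 instead. The sketch below wraps the same reshape, merge, reshape steps in a hypothetical function, fill_x(). It assumes that the columns are named X<k>, Y<k>, Z<k> with integer suffixes, that every X<k> has matching Y<k> and Z<k> columns, that the id column is called name, and R >= 4.0 (so strings stay character rather than factor). Passing times = idx keeps the column names correct even when the suffixes are not contiguous.

```r
# Hypothetical helper: fills the X<k> columns of df1 from the (Y, Z) -> X
# lookup table df2, for any number of X<k>/Y<k>/Z<k> column groups.
# Assumes integer suffixes and an id column called "name".
fill_x <- function(df1, df2) {
  # Suffixes of the X columns, e.g. 1, 2, ..., 10, instead of a hardcoded 1:3
  x_cols <- grep("^X[0-9]+$", names(df1), value = TRUE)
  idx <- sort(as.integer(sub("^X", "", x_cols)))

  # Wide -> long: one row per (name, suffix) with columns X, Y, Z
  long <- reshape(df1,
                  varying = Map(function(v) paste0(v, idx), c("X", "Y", "Z")),
                  v.names = c("X", "Y", "Z"),
                  times = idx,
                  idvar = "name",
                  timevar = "time",
                  direction = "long")

  # Drop the empty X column and look up X in df2 by the (Y, Z) pair
  merged <- merge(long[setdiff(names(long), "X")], df2,
                  by = c("Y", "Z"), all.x = TRUE)
  # Unmatched pairs (e.g. blank Y and Z) get "" instead of NA
  merged$X <- ifelse(is.na(merged$X), "", as.character(merged$X))

  # Long -> wide again, then restore df1's column order and sort rows by name
  wide <- reshape(merged, idvar = "name", timevar = "time",
                  direction = "wide", sep = "")
  wide <- wide[order(wide$name), names(df1)]
  rownames(wide) <- NULL
  wide
}
```

With the df1 and df2 from the question, fill_x(df1, df2) should give the same columns and values as df3.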