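I have a data frame named mydf. It contains three groups of columns, denoted app, ora, and pin. I want to compare all column values pairwise (app vs ora, ora vs pin, and pin vs app) and obtain concordance/match statistics. I would also like the overall concordance among the three variables, and a plot representing the data. What is the best way to do this in R?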
mydf<-structure(c("0/0", "0/1", "0/0", "0/0", "0/0", "0/0", "0/0",
"0/0", "0/1", "0/0", "0/1", "0/0", "0/0", "0/0", "0/0", "0/0",
"0/0", "0/1"), .Dim = c(3L, 6L), .Dimnames = list(c("1", "2",
"4"), c("app:x", "ora:x", "pin:x", "app:y", "ora:y", "pin:y")))
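Answer 0 (score: 2)
Well, here is one approach as a starter (there is probably a lot of room for optimization, since I'm not familiar with the data.table package):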
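library(data.table)       # provides melt(), setkey(), and :=
library(splitstackshape)  # provides cSplit()
# Split each "a/b" value into two columns, melt to long format, then split names like "app:x_1" on ":"
dt <- cSplit(melt(cSplit(mydf, 1:6, "/")[, rowname:=rownames(mydf)], id.vars = c("rowname")), 2, ":")[]
setkey(dt, rowname, variable_2)
# Self-join on rowname and variable_2, then drop rows that pair a group with itself
dt <- dt[dt, allow.cartesian=TRUE][variable_1!=i.variable_1]
# Keep only one row per unordered pair of groups
idx <- which(!duplicated(cbind(dt$rowname,dt$variable_2, t(apply(dt[, .(variable_1, i.variable_1)], 1, function(x) sort(x))))))
dt <- dt[idx, .(rowname, variable_2, variable_1, i.variable_1, isEqual=value==i.value)]
dt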
#     rowname variable_2 variable_1 i.variable_1 isEqual
#  1:       1        x_1        ora          app    TRUE
#  2:       1        x_1        pin          app    TRUE
#  3:       1        x_1        pin          ora    TRUE
#  4:       1        x_2        ora          app    TRUE
#  5:       1        x_2        pin          app    TRUE
#  6:       1        x_2        pin          ora    TRUE
#  7:       1        y_1        ora          app    TRUE
#  8:       1        y_1        pin          app    TRUE
#  9:       1        y_1        pin          ora    TRUE
# 10:       1        y_2        ora          app    TRUE
# 11:       1        y_2        pin          app    TRUE
# 12:       1        y_2        pin          ora    TRUE
# 13:       2        x_1        ora          app    TRUE
# 14:       2        x_1        pin          app    TRUE
# 15:       2        x_1        pin          ora    TRUE
# 16:       2        x_2        ora          app   FALSE
# 17:       2        x_2        pin          app   FALSE
# ...
library(ggplot2)
ggplot(dt, aes(variable_1, i.variable_1, fill=isEqual)) +
  geom_tile() +
  facet_grid(rowname~variable_2)
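The answer above flags each pairwise comparison but stops short of the summary statistics the question asks for. A minimal base R sketch of one way to get them follows. It is not the answerer's method, and it assumes that "concordance" means the proportion of cells where the full genotype strings (for example "0/1") are identical, rather than comparing the two halves separately as the data.table code does. The names `concordance`, `pairwise`, and `overall` are made up for illustration.

```r
# Example data copied from the question.
mydf <- structure(c("0/0", "0/1", "0/0", "0/0", "0/0", "0/0", "0/0",
  "0/0", "0/1", "0/0", "0/1", "0/0", "0/0", "0/0", "0/0", "0/0",
  "0/0", "0/1"), .Dim = c(3L, 6L), .Dimnames = list(c("1", "2",
  "4"), c("app:x", "ora:x", "pin:x", "app:y", "ora:y", "pin:y")))

# Split column names such as "app:x" into a group ("app") and a suffix ("x").
parts  <- strsplit(colnames(mydf), ":", fixed = TRUE)
group  <- vapply(parts, `[`, character(1), 1)
suffix <- vapply(parts, `[`, character(1), 2)

# Hypothetical helper: proportion of cells with identical values in two
# groups, with columns matched by suffix.
concordance <- function(g1, g2) {
  a <- mydf[, group == g1, drop = FALSE]
  b <- mydf[, group == g2, drop = FALSE]
  b <- b[, match(suffix[group == g1], suffix[group == g2]), drop = FALSE]
  mean(a == b)
}

# All unordered pairs of groups: app-ora, app-pin, ora-pin.
pairs <- combn(unique(group), 2)
pairwise <- setNames(
  apply(pairs, 2, function(p) concordance(p[1], p[2])),
  apply(pairs, 2, paste, collapse = "_vs_")
)

# Overall concordance: proportion of row/suffix cells where all groups agree.
overall <- mean(vapply(unique(suffix), function(s) {
  block <- mydf[, suffix == s, drop = FALSE]
  apply(block, 1, function(r) length(unique(r)) == 1)
}, logical(nrow(mydf))))

print(pairwise)
print(overall)

# Simple bar chart of the pairwise concordance rates.
barplot(pairwise, ylim = c(0, 1), ylab = "Proportion concordant")
```

If allele-level agreement is what matters, the `isEqual` column of `dt` from the answer could be averaged per pair of groups instead; the sketch above keeps each genotype as a single unit to stay independent of the packages used there.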