I want to find out which values are common to N columns, N-1 columns, N-2 columns, and so on.
Input
structure(c("a", "b", "c", "d", "e", "f", "a", "z", "d", "b",
"e", "s", "a", "b", "c", "d", "e", "s", "a", "b", "c", "d", "e",
"f"), .Dim = c(6L, 4L), .Dimnames = list(NULL, c("x", "y", "z",
"a")))
Output:
common in all 4 columns :- a, b, e, d
common in maximum 3 columns :- c
common in maximum 2 columns :- f, s
Answer 0 (score: 0)
Reshaping the given matrix from wide to long format (melt() has a matrix method) and then counting by value is one possible approach:
library(data.table)
options(datatable.print.class = TRUE)
setDT(melt(dat))[, .N, by = "value"][order(-N)]
    value     N
   <fctr> <int>
1:      a     4
2:      b     4
3:      d     4
4:      e     4
5:      c     3
6:      f     2
7:      s     2
8:      z     1
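For comparison, the same per-value count can be sketched in base R without any reshaping, because table() accepts the character matrix directly. This is only an illustrative alternative and not part of the original answer. The name counts is made up here, and like the first data.table call it does not yet account for duplicates within a column.

```r
# Same input matrix as in the question
dat <- structure(
  c("a", "b", "c", "d", "e", "f", "a", "z", "d", "b", "e", "s",
    "a", "b", "c", "d", "e", "s", "a", "b", "c", "d", "e", "f"),
  .Dim = c(6L, 4L),
  .Dimnames = list(NULL, c("x", "y", "z", "a")))

# table() tallies every cell of the matrix;
# sort() puts the most frequent values first
counts <- sort(table(dat), decreasing = TRUE)
print(counts)
```

This may be handy if the matrix method of melt() is not available in a given setup.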
However, the code needs to be extended to handle duplicates within each column (dat2 has row 1 duplicated):
setDT(melt(dat2))[, unique(value), by = Var2][, .N, by = "V1"][order(-N)]
       V1     N
   <fctr> <int>
1:      a     4
2:      b     4
3:      d     4
4:      e     4
5:      c     3
6:      f     2
7:      s     2
8:      z     1
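To see why the unique() step matters, here is a small base R sketch that counts dat2 both ways. It is an illustration only, and the names naive and per_col are hypothetical. Counting raw cells inflates the count for a because of the duplicated row, while deduplicating within each column first gives the number of columns again.

```r
dat <- structure(
  c("a", "b", "c", "d", "e", "f", "a", "z", "d", "b", "e", "s",
    "a", "b", "c", "d", "e", "s", "a", "b", "c", "d", "e", "f"),
  .Dim = c(6L, 4L),
  .Dimnames = list(NULL, c("x", "y", "z", "a")))

# second data set with duplicated row 1
dat2 <- dat[c(1, seq_len(nrow(dat))), ]

# Naive count: every cell is tallied, so "a" is counted twice per column
naive <- table(dat2)

# Deduplicate within each column first, then count across columns
per_col <- table(unlist(
  lapply(seq_len(ncol(dat2)), function(j) unique(dat2[, j]))
))

naive[["a"]]    # 8 cells contain "a"
per_col[["a"]]  # but "a" appears in only 4 columns
```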
Or, more precisely:
setDT(melt(dat2))[, unique(value), by = Var2][, .N, by = "V1"][
, toString(sort(V1)), by = N][order(-N)]
       N         V1
   <int>     <char>
1:     4 a, b, d, e
2:     3          c
3:     2       f, s
4:     1          z
N denotes the number of columns in which a value appears.
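If the goal is the grouped listing from the question, the counts can be collapsed into one line per N. The following base R sketch is one possible way to do that and is not the original author's code. The names per_col and by_n, and the exact wording of the printed lines, are illustrative choices.

```r
dat <- structure(
  c("a", "b", "c", "d", "e", "f", "a", "z", "d", "b", "e", "s",
    "a", "b", "c", "d", "e", "s", "a", "b", "c", "d", "e", "f"),
  .Dim = c(6L, 4L),
  .Dimnames = list(NULL, c("x", "y", "z", "a")))
dat2 <- dat[c(1, seq_len(nrow(dat))), ]

# Number of columns each value appears in
# (duplicates within a column count only once)
per_col <- table(unlist(
  lapply(seq_len(ncol(dat2)), function(j) unique(dat2[, j]))
))

# Group the values by their column count and collapse each group to one string
by_n <- vapply(split(names(per_col), as.vector(per_col)),
               toString, character(1))

# Highest column count first
by_n <- by_n[order(as.integer(names(by_n)), decreasing = TRUE)]

cat(sprintf("common in %s columns: %s\n", names(by_n), by_n), sep = "")
```

Values that occur in a single column only can be dropped with by_n[names(by_n) != "1"] if just the shared values are of interest.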
Data:
dat <- structure(
c("a", "b", "c", "d", "e", "f", "a", "z", "d", "b", "e", "s",
"a", "b", "c", "d", "e", "s", "a", "b", "c", "d", "e", "f"),
.Dim = c(6L, 4L),
.Dimnames = list(NULL, c("x", "y", "z", "a")))
# second data set with duplicated row 1
dat2 <- dat[c(1, seq_len(nrow(dat))), ]
dat2
     x   y   z   a
[1,] "a" "a" "a" "a"
[2,] "a" "a" "a" "a"
[3,] "b" "z" "b" "b"
[4,] "c" "d" "c" "c"
[5,] "d" "b" "d" "d"
[6,] "e" "e" "e" "e"
[7,] "f" "s" "s" "f"