How to compare and tabulate the frequencies of common elements between two vectors in R

Time: 2016-02-07 04:33:41

Tags: r compare frequency elements

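I have two vectors that share some elements, and both contain repeated elements. I want a table that compares the frequencies of the elements the two vectors have in common. Here is a subset: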

plyr::count(V1)
        x freq
1 A*02:01  106
2 A*02:02   88
3 A*03:01   95
4 A*03:02   60

plyr::count(V2)
        x freq
1 A*02:01   11
2 A*02:02   11
3 A*02:04    1
4 A*03:01   20

My desired output is:

        x freq.V1 freq.V2
1 A*02:01     106      11
2 A*02:02      88      11
3 A*03:01      95      20
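To make the question reproducible, here is a minimal sketch that rebuilds hypothetical `V1` and `V2` from the counts listed above using `rep()`. It covers only the subset shown, so the real vectors are larger. Base R `table()` gives the same counts as `plyr::count()`, and `intersect()` lists the alleles the two vectors share.

```r
# Hypothetical vectors rebuilt from the subset counts shown above;
# they contain only these alleles (the real V1/V2 are larger).
V1 <- rep(c("A*02:01", "A*02:02", "A*03:01", "A*03:02"),
          times = c(106, 88, 95, 60))
V2 <- rep(c("A*02:01", "A*02:02", "A*02:04", "A*03:01"),
          times = c(11, 11, 1, 20))

# Base R counterpart of plyr::count(): counts per distinct value
table(V1)
table(V2)

# Alleles present in both vectors, in order of first appearance in V1
intersect(V1, V2)
```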

2 answers:

Answer 0 (score: 1)

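I think merge is a good fit here. By default it keeps only the observations the two data sets have in common, which makes it an inner join on the by column. So the following should work: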

merge(plyr::count(V1), plyr::count(V2), by="x")

Working example:

plyr::count(mtcars$gear)
#   x freq
# 1 3   15
# 2 4   12
# 3 5    5
plyr::count(mtcars$gear[1:10])
#   x freq
# 1 3    4
# 2 4    6

merge(
  plyr::count(mtcars$gear),
  plyr::count(mtcars$gear[1:10]),
  by = "x")
#   x freq.x freq.y
# 1 3     15      4
# 2 4     12      6
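By default, merge() names the two clashing count columns freq.x and freq.y. The following is a minimal base R sketch that uses the freq.V1/freq.V2 names from the question and drops the plyr dependency. `common_freq` is a hypothetical helper name, and `table()` plus `as.data.frame()` stand in for `plyr::count()`.

```r
# Hypothetical helper (not part of plyr or base R):
# count each vector, then inner-join the counts on x
common_freq <- function(v1, v2) {
  # as.data.frame() on a 1-D table gives one row per distinct value,
  # with the value in column "x" and its count in column "freq"
  c1 <- as.data.frame(table(x = v1), responseName = "freq",
                      stringsAsFactors = FALSE)
  c2 <- as.data.frame(table(x = v2), responseName = "freq",
                      stringsAsFactors = FALSE)
  # merge() defaults to all = FALSE (inner join), keeping shared x only;
  # suffixes renames the two clashing "freq" columns
  merge(c1, c2, by = "x", suffixes = c(".V1", ".V2"))
}

common_freq(mtcars$gear, mtcars$gear[1:10])
#   x freq.V1 freq.V2
# 1 3      15       4
# 2 4      12       6
```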

Answer 1 (score: 0)

Just use table:

tbl1 <- table(V1[V1 %in% (int <- intersect(unique(V1), unique(V2)))])
tbl2 <- table(V2[V2 %in% int])

data.frame(x = names(tbl1), freq.V1 = as.vector(tbl1), freq.V2 = as.vector(tbl2))
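The table() version above depends on both tables listing their names in the same sorted order. That holds there because both subsets contain exactly the shared values. A slightly more defensive variation, sketched on hypothetical example vectors, indexes each table by the shared values directly, so the two count columns always line up with `x`.

```r
# Hypothetical example vectors (the question's full V1/V2 are not shown)
V1 <- c("a", "b", "b", "c", "c", "c")
V2 <- c("c", "b", "d", "c")

# Values present in both vectors, in order of first appearance in V1
shared <- intersect(V1, V2)

# Look up each count by name, so both columns follow the order of `shared`
res <- data.frame(
  x       = shared,
  freq.V1 = as.vector(table(V1)[shared]),
  freq.V2 = as.vector(table(V2)[shared])
)
res
```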

Or, my favorite, data.table:

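library(data.table)
# One table per vector: V1 and V2 usually differ in length,
# so they cannot be two columns of the same data.table
DT1 <- data.table(x = V1)
DT2 <- data.table(x = V2)

# Count each vector by value, then inner-join the two count tables on x
DT1[, .(freq.V1 = .N), by = x
   ][DT2[, .(freq.V2 = .N), by = x], on = "x", nomatch = 0L]

Of course, if you know in advance that V1 and V2 are made up of the same set of elements, both options become simpler:

data.frame(x = names(tbl1 <- table(V1)), freq.V1 = as.vector(tbl1),
           freq.V2 = as.vector(table(V2)))

DT1[, .(freq.V1 = .N), by = x
   ][DT2[, .(freq.V2 = .N), by = x], on = "x"]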
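Another base R route, sketched here as an alternative rather than taken from either answer, is to stack both vectors with a label that records which one each value came from. A single two-way table() call then cross-tabulates them. The rows where both counts are non-zero are the shared values. This works whether or not V1 and V2 have the same length or the same set of elements. The example vectors below are hypothetical.

```r
# Hypothetical example vectors of different lengths
V1 <- c("a", "b", "b", "c", "c", "c")
V2 <- c("c", "b", "d", "c")

# One column of values plus a label naming the source vector
tab <- table(x = c(V1, V2),
             source = rep(c("freq.V1", "freq.V2"),
                          c(length(V1), length(V2))))

# Keep only values counted in both vectors
both <- tab[tab[, "freq.V1"] > 0 & tab[, "freq.V2"] > 0, , drop = FALSE]

res <- data.frame(x = rownames(both),
                  freq.V1 = as.vector(both[, "freq.V1"]),
                  freq.V2 = as.vector(both[, "freq.V2"]))
res
```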