Question

I have 3 vectors:

x <- c(1,3,5,7,3,8)
y <- c(3,5,7)
z <- c(3,3,8)

I want to find the elements of x that are not in y, and the elements of x that are not in z. Is there a function f that would give me the following output?

> f(x,y)
1 3 8
> f(x,z)
1 5 7

In other words, I want to find the "set difference" between two vectors, either of which may contain repeated values. For obvious reasons, %in% and match do not work in this case.
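To make the problem concrete, here is a small check using the vectors from the question. It only illustrates why a plain membership test falls short; the variable name by_membership is made up for this sketch.

```r
x <- c(1, 3, 5, 7, 3, 8)
y <- c(3, 5, 7)

# %in% only asks "does this value occur in y at all?", so both 3s in x
# are dropped even though y contains a single 3.
by_membership <- x[!(x %in% y)]
print(by_membership)
#[1] 1 8

# setdiff() treats its arguments as sets and gives the same result here.
print(setdiff(x, y))
#[1] 1 8
```

The desired result is 1 3 8: one 3 should survive because y can only cancel one of the two.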

Answer 1

There should be some better way to do this, but here is one option:

get_diff_vectors <- function(x, y) {
  count_x <- table(x)
  count_y <- table(y)
  same_counts <- match(names(count_y), names(count_x))
  count_x[same_counts] <- count_x[same_counts] - count_y
  as.numeric(rep(names(count_x), count_x))
}

get_diff_vectors(x, y)
#[1] 1 3 8
get_diff_vectors(x, z)
#[1] 1 5 7
get_diff_vectors(x, c(5, 7))
#[1] 1 3 3 8

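We count the frequencies of x and y using table, match the numbers that occur in both, and subtract the counts of y from those of x. Finally, we recreate the remaining vector using rep.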
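The same steps can be traced one at a time on the vectors from the question. This is only a walk-through of the intermediate values; the names shared and remaining are chosen for this sketch.

```r
x <- c(1, 3, 5, 7, 3, 8)
y <- c(3, 5, 7)

count_x <- table(x)  # counts per value: 1 -> 1, 3 -> 2, 5 -> 1, 7 -> 1, 8 -> 1
count_y <- table(y)  # counts per value: 3 -> 1, 5 -> 1, 7 -> 1

# Positions in count_x of the values that also occur in y
shared <- match(names(count_y), names(count_x))

# Subtract y's counts from x's counts for the shared values
count_x[shared] <- count_x[shared] - count_y

# Repeat each value as many times as its leftover count
remaining <- as.numeric(rep(names(count_x), count_x))
print(remaining)
#[1] 1 3 8
```

Because table orders its entries by value, the result comes back sorted rather than in the original order of x. The sketch also assumes every value of y occurs in x at least as often as in y; otherwise match returns NA or a count goes negative and the later steps fail.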

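I still could not find a better way, but here is a dplyr version that uses somewhat similar logic.

library(dplyr)

get_diff_vectors_dplyr <- function(x, y) {
  df1 <- data.frame(x) %>% count(x)
  df2 <- data.frame(y) %>% count(y)
  # After the join, n.x holds the counts from x and n.y the counts from y
  final <- left_join(df1, df2, by = c("x" = "y")) %>%
    # Values missing from y get a count of 0 (replaces the deprecated funs() call)
    mutate(n.y = coalesce(n.y, 0L), n = n.x - n.y)
  rep(final$x, final$n)
}

get_diff_vectors_dplyr(x, y)
#[1] 1 3 8
get_diff_vectors_dplyr(x, z)
#[1] 1 5 7
get_diff_vectors_dplyr(x, c(5, 7))
#[1] 1 3 3 8

The vecsets package mentioned by the OP has a function, vsetdiff, that does this very easily.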

Answer 2

Here is an attempt that uses make.unique to account for the duplicates:

dupdiff <- function(x,y) x[-match(
  make.unique(as.character(y)),
  make.unique(as.character(x)),
  nomatch=0
)]

Testing:

dupdiff(x,y)
#[1] 1 3 8
dupdiff(x,z)
#[1] 1 5 7
dupdiff(x, c(5, 7))
#[1] 1 3 3 8
dupdiff(x, c(5, 7, 9))
#[1] 1 3 3 8
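To show what make.unique contributes, the intermediate values for x and z can be printed step by step. The second half is a hypothetical guarded variant, dupdiff_safe, which is not part of the answer: it covers the case where nothing in y matches, since every index is then 0 and x[-0] returns an empty vector.

```r
x <- c(1, 3, 5, 7, 3, 8)
z <- c(3, 3, 8)

# make.unique() appends .1, .2, ... to repeated strings
ux <- make.unique(as.character(x))  # "1" "3" "5" "7" "3.1" "8"
uz <- make.unique(as.character(z))  # "3" "3.1" "8"

# Each occurrence in z is paired with one distinct occurrence in x
idx <- match(uz, ux, nomatch = 0)   # 2 5 6
print(x[-idx])
#[1] 1 5 7

# Hypothetical guarded variant (an assumption, not from the answer):
# drop the zeros and skip negative indexing when nothing matched.
dupdiff_safe <- function(x, y) {
  idx <- match(make.unique(as.character(y)),
               make.unique(as.character(x)),
               nomatch = 0)
  idx <- idx[idx > 0]
  if (length(idx) == 0) x else x[-idx]
}

print(dupdiff_safe(x, 9))
#[1] 1 3 5 7 3 8
```

Unlike the table-based approach, this keeps the remaining elements in their original order.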

Answer 3

match combined with a small for loop does work:

> f(x, y)
[1] 1 3 8
> f(x, z)
[1] 1 5 7

Code

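f <- function(s, r) {
    # Walk through s; each element found in r cancels one occurrence in r
    for (i in seq_along(s)) {
        j <- match(s[i], r)
        if (!is.na(j)) {
            s[i] <- NA  # mark this element of s as removed
            r[j] <- NA  # consume the matched element of r
        }
    }
    # Print whatever was not cancelled
    print(s[complete.cases(s)])
}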

Answer 4

There is a new Hadley-verse package, waldo.

It gives a really nice and concise overview of the differences between objects, not just vectors.

library(waldo)

compare(x, y)
#> `old`: 1 3 5 7 3 8
#> `new`:   3 5 7
compare(x, z)
#> `old`: 1 3 5 7 3 8
#> `new`:   3     3 8

"Set difference" between two vectors with duplicate values
