Question

I am looking for the unique combinations of two variables:

library(purrr)
cross_df(list(id1 = seq_len(3), id2 = seq_len(3)), .filter = `==`)
# A tibble: 6 x 2
    id1   id2
  <int> <int>
1     2     1
2     3     1
3     1     2
4     3     2
5     1     3
6     2     3

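How can I remove the mirrored combinations? That is, I want only one of rows 1 and 3 of the data frame above, only one of rows 2 and 5, and only one of rows 4 and 6. My desired output would look like this: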

# A tibble: 3 x 2
    id1   id2
  <int> <int>
1     2     1
2     3     1
3     3     2

I don't care whether a particular id value ends up in id1 or id2, so the following output would be just as acceptable:

# A tibble: 3 x 2
    id1   id2
  <int> <int>
1     1     2
2     1     3
3     2     3

Answer 1

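A base R approach:

# df is the data frame from the question (columns id1 and id2)
# build a key from the sorted elements of each row; the separator keeps
# multi-digit ids unambiguous (e.g. 1,12 vs 11,2)
df$temp <- apply(df, 1, function(x) paste(sort(x), collapse = "_"))

# then keep only the first row for each unique key
df[!duplicated(df$temp), 1:2]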
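
For reference, here is a minimal self-contained sketch of the same idea that can be run without purrr. It assumes the input is rebuilt with expand.grid() rather than cross_df(), and it swaps the row-wise apply()/sort() for the vectorised pmin()/pmax(), which only works for exactly two id columns. The names key and result are illustrative and not part of the original answer.

```r
# Rebuild the question's data frame in base R (all pairs with id1 != id2).
df <- expand.grid(id1 = seq_len(3), id2 = seq_len(3))
df <- df[df$id1 != df$id2, ]

# Order-independent key: smaller id first, larger id second.
key <- paste(pmin(df$id1, df$id2), pmax(df$id1, df$id2), sep = "_")

# Keep the first row seen for each key; mirrored rows share a key.
result <- df[!duplicated(key), c("id1", "id2")]
print(result)
```

With the three ids from the question this keeps the pairs (2, 1), (3, 1) and (3, 2), i.e. the first of the two acceptable outputs.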

Answer 2

A tidyverse version of Dan's answer:

library(dplyr) # mutate, distinct, select, %>%
library(tidyr) # unite
library(purrr) # cross_df, pmap_int

cross_df(list(id1 = seq_len(3), id2 = seq_len(3)), .filter = `==`) %>%
  mutate(min = pmap_int(., min), max = pmap_int(., max)) %>% # Find the min and max in each row
  unite(check, c(min, max), remove = FALSE) %>% # Combine them in a "check" variable
  distinct(check, .keep_all = TRUE) %>% # Remove duplicates of the "check" variable
  select(id1, id2)

# A tibble: 3 x 2
    id1   id2
  <int> <int>
1     2     1
2     3     1
3     3     2
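
As an aside, if the goal is simply every unordered pair of ids, the mirrored rows can be avoided at generation time instead of being removed afterwards. The sketch below is a different technique from both answers: it uses base R's combn(), and the names pairs and result are illustrative.

```r
# combn() returns each 2-element combination as a column, so transpose it
# to get one pair per row.
pairs <- t(combn(seq_len(3), 2))

# Name the columns to match the question's data frame.
result <- data.frame(id1 = pairs[, 1], id2 = pairs[, 2])
print(result)
```

This yields the pairs (1, 2), (1, 3) and (2, 3), which is the second of the two acceptable outputs in the question.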

Removing mirrored combinations of variables in a data frame
