Question

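I have 410 DNA sequences that I compared against each other to get their pairwise similarity.

Now, to trim the database, I need to get rid of the rows that hold the same pair of values across the 2 columns, since every comparison of course appears twice.

To make myself clear, I have something like this: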

tribble(
  ~seq01, ~seq02, ~similarity,
  "a",    "b",    100.000,
  "b",    "a",    100.000,
  "c",    "d",    99.000,
  "d",    "c",    99.000
)

Comparing a-b and b-a is the same thing, so I want to get rid of the duplicated rows.

What I want to end up with is:

tribble(
  ~seq01, ~seq02, ~similarity,
  "a",   "b", 100.000,
  "c",   "d", 99.000
)

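I'm not sure how to proceed; every approach I've come up with is somewhat hacky. I checked other answers, but none of them really satisfied me.

Any input would be much appreciated (but tidy input would be appreciated even more!)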

Answer 1

We can use pmin and pmax to sort the values within each row, and then use distinct to select the unique rows.
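A minimal sketch of the pmin/pmax plus distinct idea, assuming the data is a tibble named df with the columns from the question and that the dplyr and tibble packages are installed. The helper column names lo and hi are made up here for illustration:

```r
library(dplyr)
library(tibble)

# Example data from the question (assumed to be stored as `df`)
df <- tribble(
  ~seq01, ~seq02, ~similarity,
  "a",    "b",    100,
  "b",    "a",    100,
  "c",    "d",     99,
  "d",    "c",     99
)

result <- df %>%
  # Order each pair so that a-b and b-a produce the same (lo, hi) key
  mutate(lo = pmin(seq01, seq02), hi = pmax(seq01, seq02)) %>%
  # Keep only the first row seen for each unordered pair
  distinct(lo, hi, .keep_all = TRUE) %>%
  # Drop the hypothetical helper columns again
  select(-lo, -hi)

result
```

Because distinct keeps the first occurrence of each key, the surviving row for every pair is whichever orientation appears first in the data.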

Answer 2

Another approach, in base R:

df$add1 <- apply(df[,1:2], 1, min)  # find rowwise minimum values 
df$add2 <- apply(df[,1:2], 1, max)  # find rowwise maximum values 
df <- df[!duplicated(df[,4:5]),]    # remove rows with identical values in new col's
df[,4:5] <- NULL                    # remove auxiliary col's

Result:

df
# A tibble: 2 x 3
  seq01 seq02 similarity
  <chr> <chr>      <dbl>
1 a     b            100
2 c     d             99

Remove rows if values have the same combination in different columns
