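For each sample (D1, D2 and D3), I want to find the most highly expressed species ("s__") within each genus ("g__"). For example, the genus Acinetobacter has three species here, and I want to find out which one has the highest value in each of D1, D2 and D3.
Any idea how to solve this?
Input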
D1 D2 D3
g__Acinetobacter|s__Acinetobacter_pittii 12 21 424
g__Acinetobacter|s__Acinetobacter_oleivorans 4 4 23
g__Acinetobacter|s__Acinetobacter_larvae 1 53 232
g__Pseudomonas|s__Pseudomonas_aeruginosa 13 13 323
g__Pseudomonas|s__Pseudomonas_citronellolis 23 23 11
Output
Genus D1 D2 D3
g__Acinetobacter s__Acinetobacter_pittii s__Acinetobacter_larvae s__Acinetobacter_pittii
g__Pseudomonas s__Pseudomonas_citronellolis s__Pseudomonas_citronellolis s__Pseudomonas_aeruginosa
Answer 0 (score: 1)
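One approach using dplyr and tidyr, assuming your first column is called V1. We use separate to split that column in two on "|", then use summarise_at to summarise the columns whose names start with "D", picking for each column the Species that corresponds to its maximum value. Note that which.max returns the position of the first maximum, so if two species tie within a genus, the one listed first is reported.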
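library(dplyr)
library(tidyr)

df %>%
  # Split "g__...|s__..." into separate Genus and Species columns
  separate(V1, into = c("Genus", "Species"), sep = "\\|") %>%
  group_by(Genus) %>%
  # For each sample column, keep the Species with the largest value
  summarise_at(vars(starts_with("D")), ~Species[which.max(.)])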
# A tibble: 2 x 4
# Genus D1 D2 D3
# <chr> <chr> <chr> <chr>
#1 g__Acinetobacter s__Acinetobacter_pittii s__Acinetobacter_larvae s__Acinetobacter_pittii
#2 g__Pseudomonas s__Pseudomonas_citronellolis s__Pseudomonas_citronellolis s__Pseudomonas_aeruginosa
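If you would rather not depend on dplyr and tidyr, the same split / group / which.max logic can be sketched in base R. This is only an illustrative alternative, not part of the original answer: it rebuilds the example data inline so it runs on its own, still assumes the first column is called V1 and the sample columns start with "D", and the names parts, sample_cols, top and result are made up for this example.

```r
# Example data rebuilt inline (same values as in the question)
df <- data.frame(
  V1 = c("g__Acinetobacter|s__Acinetobacter_pittii",
         "g__Acinetobacter|s__Acinetobacter_oleivorans",
         "g__Acinetobacter|s__Acinetobacter_larvae",
         "g__Pseudomonas|s__Pseudomonas_aeruginosa",
         "g__Pseudomonas|s__Pseudomonas_citronellolis"),
  D1 = c(12L, 4L, 1L, 13L, 23L),
  D2 = c(21L, 4L, 53L, 13L, 23L),
  D3 = c(424L, 23L, 232L, 323L, 11L),
  stringsAsFactors = FALSE
)

# Split "g__...|s__..." on the literal "|" into genus and species
parts <- strsplit(as.character(df$V1), "|", fixed = TRUE)
df$Genus   <- vapply(parts, `[`, character(1), 1)
df$Species <- vapply(parts, `[`, character(1), 2)

# Sample columns are assumed to be the ones whose names start with "D"
sample_cols <- grep("^D", names(df), value = TRUE)

# Within each genus, pick the species with the largest value per sample
top <- lapply(split(df, df$Genus), function(g) {
  vapply(g[sample_cols], function(x) g$Species[which.max(x)], character(1))
})

# Bind the per-genus results into one data frame, one row per genus
result <- data.frame(Genus = names(top), do.call(rbind, top),
                     row.names = NULL, stringsAsFactors = FALSE)
print(result)
```

As with the dplyr version, which.max keeps the first maximum, so ties within a genus go to the species that appears first in the input.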
Data
df <- structure(list(V1 = structure(c(3L, 2L, 1L, 4L, 5L), .Label =
c("g__Acinetobacter|s__Acinetobacter_larvae",
"g__Acinetobacter|s__Acinetobacter_oleivorans",
"g__Acinetobacter|s__Acinetobacter_pittii",
"g__Pseudomonas|s__Pseudomonas_aeruginosa",
"g__Pseudomonas|s__Pseudomonas_citronellolis"
), class = "factor"), D1 = c(12L, 4L, 1L, 13L, 23L), D2 = c(21L,
4L, 53L, 13L, 23L), D3 = c(424L, 23L, 232L, 323L, 11L)), class =
"data.frame", row.names = c(NA, -5L))