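I have a data frame named mydf. The Sample column contains duplicated samples. For each unique sample, I want to extract the row with the largest total_reads, giving the result shown below.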
mydf<-structure(list(Sample = c("AOGC-02-0188", "AOGC-02-0191", "AOGC-02-0191",
"AOGC-02-0191", "AOGC-02-0194", "AOGC-02-0194", "AOGC-02-0194"
), total_reads = c(27392583, 19206920, 34462563, 53669483, 24731988,
43419826, 68151814), Lane = c("4", "5", "4", "4;5", "5", "4",
"4;5")), .Names = c("Sample", "total_reads", "Lane"), row.names = c("166",
"169", "170", "171", "173", "174", "175"), class = "data.frame")
result
Sample total_reads Lane
AOGC-02-0188 27392583 4
AOGC-02-0191 53669483 4;5
AOGC-02-0194 68151814 4;5
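Answer 0 (score: 4)
You can aggregate to get the maximum total_reads per Sample, then merge that back onto mydf to recover the remaining columns: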
merge(aggregate(total_reads ~ Sample, mydf, max), mydf)
# Sample total_reads Lane
#1 AOGC-02-0188 27392583 4
#2 AOGC-02-0191 53669483 4;5
#3 AOGC-02-0194 68151814 4;5
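One caveat with the merge approach: merge joins on the columns the two data frames share (here Sample and total_reads), so if two rows of the same sample tie for the maximum, both come back. Below is a minimal base R sketch that keeps exactly one row per sample by sorting and dropping duplicates. The names sorted and best are chosen for illustration only, and the data frame is rebuilt from the question so the snippet runs on its own.

```r
# Rebuild the question's data (row names are left at their defaults here)
mydf <- data.frame(
  Sample = c("AOGC-02-0188", "AOGC-02-0191", "AOGC-02-0191", "AOGC-02-0191",
             "AOGC-02-0194", "AOGC-02-0194", "AOGC-02-0194"),
  total_reads = c(27392583, 19206920, 34462563, 53669483,
                  24731988, 43419826, 68151814),
  Lane = c("4", "5", "4", "4;5", "5", "4", "4;5"),
  stringsAsFactors = FALSE
)

# Sort so the largest total_reads comes first
sorted <- mydf[order(-mydf$total_reads), ]

# Keep only the first (i.e. largest) row seen for each Sample
best <- sorted[!duplicated(sorted$Sample), ]

# Restore alphabetical Sample order for display
best <- best[order(best$Sample), ]
print(best)
```

With ties, this keeps whichever tied row sorts first, which may or may not be what you want.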
Answer 1 (score: 2)
We can use data.table. Convert the 'data.frame' to a 'data.table' with setDT(mydf), order by 'total_reads' in descending order, group by 'Sample', and take the first observation of each group with head.
library(data.table)
setDT(mydf)[order(-total_reads), head(.SD, 1) , Sample]
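Note that setDT() converts mydf by reference, so mydf itself becomes a data.table afterwards. As a hedged alternative, the sketch below builds a separate data.table and uses the common .I / which.max idiom to pick the row index of the maximum within each group, without sorting the whole table. The names dt, idx and best are illustrative, not part of the original answer.

```r
library(data.table)

# Build a standalone data.table from the question's data
dt <- data.table(
  Sample = c("AOGC-02-0188", "AOGC-02-0191", "AOGC-02-0191", "AOGC-02-0191",
             "AOGC-02-0194", "AOGC-02-0194", "AOGC-02-0194"),
  total_reads = c(27392583, 19206920, 34462563, 53669483,
                  24731988, 43419826, 68151814),
  Lane = c("4", "5", "4", "4;5", "5", "4", "4;5")
)

# .I holds global row numbers; which.max picks the first maximum per group
idx <- dt[, .I[which.max(total_reads)], by = Sample]$V1

# Subset the table to those rows
best <- dt[idx]
print(best)
```

Because which.max returns only the first maximum, this yields a single row per sample even when there are ties.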
Answer 2 (score: 1)
Using the dplyr package, you can do this:
library(dplyr)
mydf %>%
  group_by(Sample) %>%       # for each unique sample
  arrange(-total_reads) %>%  # order by total_reads DESC
  slice(1)                   # select the first row, i.e. with highest total_reads
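In more recent dplyr versions (1.0.0 and later), the arrange-then-slice pattern can be written with slice_max(), which selects the rows with the largest value directly. This is a sketch assuming such a version is installed; best is an illustrative name, and the data frame is rebuilt from the question so the snippet is self-contained.

```r
library(dplyr)

# Rebuild the question's data
mydf <- data.frame(
  Sample = c("AOGC-02-0188", "AOGC-02-0191", "AOGC-02-0191", "AOGC-02-0191",
             "AOGC-02-0194", "AOGC-02-0194", "AOGC-02-0194"),
  total_reads = c(27392583, 19206920, 34462563, 53669483,
                  24731988, 43419826, 68151814),
  Lane = c("4", "5", "4", "4;5", "5", "4", "4;5"),
  stringsAsFactors = FALSE
)

best <- mydf %>%
  group_by(Sample) %>%
  # Keep the single row with the largest total_reads in each group;
  # with_ties = FALSE guarantees one row even if the maximum is tied
  slice_max(total_reads, n = 1, with_ties = FALSE) %>%
  ungroup()
print(best)
```

Setting with_ties = TRUE (the default) would instead return every row tied for the maximum, matching the behaviour of the merge-based answer.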