From the data frame below, I want to select the first two rows for each unique identifier in V5. I don't know where to start.
> head(Up,1000)
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
1 ENSG00000124357.8 NAGK ENST00000418807.3 9606 hsa-miR-106a-5p 3 114 121 -0.726 99 -0.726 99
2 ENSG00000131351.10 HAUS8 ENST00000253669.5 9606 hsa-miR-106a-5p 3 27 34 -0.714 99 -0.714 99
3 ENSG00000108702.3 CCL1 ENST00000225842.3 9606 hsa-miR-106a-5p 3 139 146 -0.670 99 -0.670 99
4 ENSG00000189159.11 HN1 ENST00000476258.1 9606 hsa-miR-123a-5p 3 107 114 -0.667 99 -0.666 99
5 ENSG00000154640.10 BTG3 ENST00000339775.6 9606 hsa-miR-123a-5p 3 167 174 -0.665 99 -0.665 99
6 ENSG00000087494.11 PTHLH ENST00000395872.1 9606 hsa-miR-123a-5p 3 291 298 -0.629 99 -0.629 99
7 ENSG00000197885.6 NKIRAS1 ENST00000388759.3 9606 hsa-miR-155a-5p 3 141 148 -0.628 99 -0.628 99
8 ENSG00000146826.10 C7orf43 ENST00000394035.2 9606 hsa-miR-155a-5p 3 491 498 -0.614 99 -0.613 99
9 ENSG00000117616.13 C1orf63 ENST00000243189.7 9606 hsa-miR-155a-5p 3 37 44 -0.585 99 -0.585 99
10 ENSG00000144583.4 MARCH4 ENST00000273067.4 9606 hsa-miR-155a-5p -2 1353 1359 -0.575 99 -0.575 99
11 ENSG00000213928.4 IRF9 ENST00000396864.3 9606 hsa-miR-1323-5p 3 305 312 -0.567 99 -0.567 99
12 ENSG00000072849.6 DERL2 ENST00000572834.1 9606 hsa-miR-1323-5p 3 253 260 -0.566 99 -0.566 99
13 ENSG00000155366.12 RHOC ENST00000339083.7 9606 hsa-miR-1323-5p 3 268 275 -0.554 99 -0.552 99
14 ENSG00000179431.5 FJX1 ENST00000317811.4 9606 hsa-miR-1323-5p 3 771 778 -0.550 99 -0.550 99
15 ENSG00000067057.12 PFKP ENST00000381125.4 9606 hsa-miR-1323-5p 3 73 80 -0.547 99 -0.547 99
16 ENSG00000204923.3 FBXO48 ENST00000377957.3 9606 hsa-miR-1323-5p 3 159 166 -0.531 99 -0.531 99
17 ENSG00000120539.10 MASTL ENST00000342386.6 9606 hsa-miR-1323-5p 3 246 253 -0.529 99 -0.529 99
Data
Up <- read.table(header = TRUE, stringsAsFactors = FALSE, text="V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
1 ENSG00000124357.8 NAGK ENST00000418807.3 9606 hsa-miR-106a-5p 3 114 121 -0.726 99 -0.726 99
2 ENSG00000131351.10 HAUS8 ENST00000253669.5 9606 hsa-miR-106a-5p 3 27 34 -0.714 99 -0.714 99
3 ENSG00000108702.3 CCL1 ENST00000225842.3 9606 hsa-miR-106a-5p 3 139 146 -0.670 99 -0.670 99
4 ENSG00000189159.11 HN1 ENST00000476258.1 9606 hsa-miR-123a-5p 3 107 114 -0.667 99 -0.666 99
5 ENSG00000154640.10 BTG3 ENST00000339775.6 9606 hsa-miR-123a-5p 3 167 174 -0.665 99 -0.665 99
6 ENSG00000087494.11 PTHLH ENST00000395872.1 9606 hsa-miR-123a-5p 3 291 298 -0.629 99 -0.629 99
7 ENSG00000197885.6 NKIRAS1 ENST00000388759.3 9606 hsa-miR-155a-5p 3 141 148 -0.628 99 -0.628 99
8 ENSG00000146826.10 C7orf43 ENST00000394035.2 9606 hsa-miR-155a-5p 3 491 498 -0.614 99 -0.613 99
9 ENSG00000117616.13 C1orf63 ENST00000243189.7 9606 hsa-miR-155a-5p 3 37 44 -0.585 99 -0.585 99
10 ENSG00000144583.4 MARCH4 ENST00000273067.4 9606 hsa-miR-155a-5p -2 1353 1359 -0.575 99 -0.575 99
11 ENSG00000213928.4 IRF9 ENST00000396864.3 9606 hsa-miR-1323-5p 3 305 312 -0.567 99 -0.567 99
12 ENSG00000072849.6 DERL2 ENST00000572834.1 9606 hsa-miR-1323-5p 3 253 260 -0.566 99 -0.566 99
13 ENSG00000155366.12 RHOC ENST00000339083.7 9606 hsa-miR-1323-5p 3 268 275 -0.554 99 -0.552 99
14 ENSG00000179431.5 FJX1 ENST00000317811.4 9606 hsa-miR-1323-5p 3 771 778 -0.550 99 -0.550 99
15 ENSG00000067057.12 PFKP ENST00000381125.4 9606 hsa-miR-1323-5p 3 73 80 -0.547 99 -0.547 99
16 ENSG00000204923.3 FBXO48 ENST00000377957.3 9606 hsa-miR-1323-5p 3 159 166 -0.531 99 -0.531 99
17 ENSG00000120539.10 MASTL ENST00000342386.6 9606 hsa-miR-1323-5p 3 246 253 -0.529 99 -0.529 99")
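Answer 0 (score: 4)
If we need the first two rows for each value of the grouping variable "V5", one option is data.table. Convert the "data.frame" to a "data.table" (setDT(Up)), group by "V5", and use head to get the first 2 rows of each group.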
library(data.table)
setDT(Up)[, head(.SD, 2) , by = V5]
Alternatively, after grouping by "V5", use slice from dplyr.
library(dplyr)
Up %>%
group_by(V5) %>%
slice(1:2)
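As @Frank mentioned in the comments (regarding a bug), when the initial dataset is a data.table and a particular "V5" has fewer than 2 elements, the output shows additional NA rows. If we use a data.frame instead, it works as expected.
An option that works for both data.table and data.frame would be (from @Frank's comment)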
Up %>%
group_by(V5) %>%
slice(head(seq_len(n()),2))
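To see why head(seq_len(n()), 2) is the safer index, here is a minimal base R sketch of the index arithmetic only. It does not reproduce the dplyr/data.table behaviour described above, and the one-row data frame and its values are made up for illustration.

```r
# A one-row data frame standing in for a V5 group with a single member
# (hypothetical values, not taken from the question's data).
one_row <- data.frame(V5 = "hsa-miR-x", V6 = 3, stringsAsFactors = FALSE)

idx_fixed <- 1:2                              # always asks for rows 1 and 2
idx_safe  <- head(seq_len(nrow(one_row)), 2)  # never exceeds the group size

padded  <- one_row[idx_fixed, ]  # row 2 does not exist, so it comes back as NA
trimmed <- one_row[idx_safe, ]   # only the row that actually exists

nrow(padded)   # 2
nrow(trimmed)  # 1
```

Inside slice(), n() plays the role that nrow(one_row) plays here, so the index is capped at the size of each group.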
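Answer 1 (score: 0)
Using base R, the following gets you pretty close with your data.frame, Up. Note that it relies on the rows for each V5 value being adjacent, as they are in the sample data:
# get the first unique row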
UpFirstTwoRows <- which(!duplicated(Up[, "V5"]))
# get the adjacent row, dropping cases where only one unique row exists
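UpFirstTwoRows <- sort(unique(c(UpFirstTwoRows, UpFirstTwoRows + 1)))
# drop an index past the last row (occurs when the final group has a single row)
UpFirstTwoRows <- UpFirstTwoRows[UpFirstTwoRows <= nrow(Up)]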
UpNew <- Up[UpFirstTwoRows,]
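If the rows of a group are not guaranteed to be adjacent, one base R alternative (a different technique from the duplicated() approach above, not the answerer's method) is to number the rows within each group using ave() and keep those numbered 1 or 2. The small data frame df and its column val are made up for illustration.

```r
# Hypothetical toy data: groups "a", "b", "c" are interleaved on purpose.
df <- data.frame(
  V5  = c("a", "b", "a", "a", "c", "b"),
  val = 1:6,
  stringsAsFactors = FALSE
)

# Position of each row within its V5 group, in order of appearance.
pos <- ave(seq_len(nrow(df)), df$V5, FUN = seq_along)

# Keep the first two rows of every group; original row order is preserved.
first_two <- df[pos <= 2, ]
first_two$val  # 1 2 3 5 6
```

Groups with a single row (here "c") simply contribute that one row, with no NA padding.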
Answer 2 (score: 0)
Logic:
ARGUMENTS
list: logical; if TRUE, return a list, otherwise return a data.frame
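fun <- function(data, col, num.rows, list = TRUE){
  # unique group identifiers in the grouping column
  d <- unique(data[[col]])
  # for each group keep at most num.rows rows; head() avoids NA rows for small groups
  dev.data <- lapply(d, FUN = function(i) { head(data[data[[col]] == i, ], num.rows) })
  if(!isTRUE(list)) {
    # stack the per-group pieces into a single data.frame
    return(do.call(rbind, dev.data))
  } else {return(dev.data)}
}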
Example
fun(iris, "Species", 2, FALSE)
fun(iris, "Species", 3, TRUE)
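The same idea can also be written with split(), which does the per-group subsetting in one call. This is only a sketch of an alternative, and the function name first_n_by_group is made up for illustration.

```r
# Hypothetical helper: first n rows of each group, returned as one data.frame.
first_n_by_group <- function(data, col, n) {
  pieces <- split(data, data[[col]])       # one data.frame per group
  do.call(rbind, lapply(pieces, head, n))  # head() keeps at most n rows each
}

res <- first_n_by_group(iris, "Species", 2)
nrow(res)           # 6
table(res$Species)  # 2 rows for each of the three species
```

One difference to keep in mind: split() orders the groups by factor level (or alphabetically for character columns), so the groups may come out in a different order than they first appear in the data.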