Question

Consider any data frame:

            col1   col2    col3   col4
row.name11    A     23      x       y
row.name12    A     29      x       y
row.name13    B     17      x       y
row.name14    A     77      x       y

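I have a list of row names that I want to return from this data frame. Say the list contains row.name12 and row.name13. I can easily return those rows from the data frame. But I also want to return the 4 rows above and the 4 rows below them, which means I want row.name8 through row.name17. I think it is similar to grep -A -B in the shell.

Possible solution: is there a way to get a row number from a row name? If I had the row numbers, I could simply subtract 4 and add 4 and return those rows.

Note: the row names here are only examples. Row names could be RED, BLUE, BLACK, etc.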

Answer 1

Try:

extract.with.context <- function(x, rows, after = 0, before = 0) {

  match.idx  <- which(rownames(x) %in% rows)
  span       <- seq(from = -before, to = after)
  extend.idx <- c(outer(match.idx, span, `+`))
  extend.idx <- Filter(function(i) i > 0 & i <= nrow(x), extend.idx)
  extend.idx <- sort(unique(extend.idx))

  return(x[extend.idx, , drop = FALSE])
}

dat <- data.frame(x = 1:26, row.names = letters)
extract.with.context(dat, c("a", "b", "j", "y"), after = 3, before = 1)
#    x
# a  1
# b  2
# c  3
# d  4
# e  5
# i  9
# j 10
# k 11
# l 12
# m 13
# x 24
# y 25
# z 26
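
To see why this works, it can help to look at the intermediate index matrix that outer() builds. The following is only a sketch that replays the steps of the function by hand on the same `dat`; the names `idx.matrix` and `idx` are illustrative and not part of the answer's function.

```r
dat <- data.frame(x = 1:26, row.names = letters)

# Positions of the requested row names: a = 1, b = 2, j = 10, y = 25
match.idx <- which(rownames(dat) %in% c("a", "b", "j", "y"))

# Offsets relative to each match: one row before through three rows after
span <- seq(from = -1, to = 3)

# One row per match, one column per offset
idx.matrix <- outer(match.idx, span, `+`)
idx.matrix

# Flatten, drop positions outside the data, then de-duplicate and sort
idx <- c(idx.matrix)
idx <- sort(unique(idx[idx > 0 & idx <= nrow(dat)]))
idx
```

The matrix contains positions such as 0, 27 and 28 that do not exist in `dat`, which is why the range filter is needed before subsetting.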

Answer 2

Perhaps a combination of which() and %in% can help you:

dat[which(rownames(dat) %in% c("row.name13")) + c(-1, 1), ]
#            col1 col2 col3 col4
# row.name12    A   29    x    y
# row.name14    A   77    x    y

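In the above, we identify which row names in "dat" are "row.name13" (using which()), and + c(-1, 1) tells R to return the row before and the row after. If you want to include the matched row itself, you could do something like + c(-1:1).

To get a range of rows, switch the comma to a colon: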

dat[which(rownames(dat) %in% c("row.name13")) + c(-1:1), ]
#            col1 col2 col3 col4
# row.name12    A   29    x    y
# row.name13    B   17    x    y
# row.name14    A   77    x    y

Update

Matching a list is a bit trickier, but without giving it much thought, here is one possibility:

myRows <- c("row.name12", "row.name13")
rowRanges <- lapply(which(rownames(dat) %in% myRows), function(x) x + c(-1:1))
# [[1]]
# [1] 1 2 3
# 
# [[2]]
# [1] 2 3 4
#
lapply(rowRanges, function(x) dat[x, ])
# [[1]]
#            col1 col2 col3 col4
# row.name11    A   23    x    y
# row.name12    A   29    x    y
# row.name13    B   17    x    y
# 
# [[2]]
#            col1 col2 col3 col4
# row.name12    A   29    x    y
# row.name13    B   17    x    y
# row.name14    A   77    x    y

This outputs a list of data.frames, which might be handy since you may have duplicated rows (as in this example).
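
If a single data frame without the duplicated rows is preferred over a list, one possible approach is to pool the row positions before subsetting. This is only a sketch: the `dat` below is an assumed reconstruction of the four rows shown in the question, and `keep` and `combined` are illustrative names.

```r
# Example data mirroring the four rows shown in the question
dat <- data.frame(col1 = c("A", "A", "B", "A"),
                  col2 = c(23, 29, 17, 77),
                  col3 = "x", col4 = "y",
                  row.names = paste0("row.name", 11:14))

myRows <- c("row.name12", "row.name13")
rowRanges <- lapply(which(rownames(dat) %in% myRows), function(x) x + c(-1:1))

# Pool all positions, keep each one once, and drop any outside the data
keep <- sort(unique(unlist(rowRanges)))
keep <- keep[keep > 0 & keep <= nrow(dat)]

combined <- dat[keep, , drop = FALSE]
combined
```

The trade-off is that the grouping by match is lost: the result no longer shows which context rows belong to which matched row.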

Update 2: Use `grep` if it is more appropriate

Here is a variation on your question, one that is less convenient to solve with the which() ... %in% approach.

set.seed(1)
dat1 <- data.frame(ID = 1:25, V1 = sample(100, 25, replace = TRUE))
rownames(dat1) <- paste("rowname", sample(apply(combn(LETTERS[1:4], 2), 
                                               2, paste, collapse = ""), 
                                         25, replace = TRUE), 
                       sprintf("%02d", 1:25), sep = ".")
head(dat1)
#               ID V1
# rowname.AD.01  1 27
# rowname.AB.02  2 38
# rowname.AD.03  3 58
# rowname.CD.04  4 91
# rowname.AD.05  5 21
# rowname.AD.06  6 90

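Now, imagine you want to identify the rows containing AB and AC, but you do not have a list of the numeric suffixes.

Here is a small function that can be used in that scenario. It borrows a little from @Spacedman to make sure the rows returned are within the range of the data (as per @flodel's suggestion).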

getMyRows <- function(data, matches, range) {
  rowMatches = lapply(unlist(lapply(matches, function(x)
    grep(x, rownames(data)))), function(y) y + range)
  rowMatches = lapply(rowMatches, function(x) x[x > 0 & x <= nrow(data)])
  lapply(rowMatches, function(x) data[x, ])
}

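You can use it as follows (I will not print the results here). First specify the dataset, then the pattern(s) to match, then the range (in this example, three rows before and four rows after).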

getMyRows(dat1, c("AB", "AC"), -3:4)

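Applied to the earlier example of matching row.name12 and row.name13, you could use it as follows: getMyRows(dat, c(12, 13), -1:1). Here grep() coerces the numbers to the patterns "12" and "13".

You could also modify the function to make it more general (for example, matching against a column instead of against row names).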
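
As a rough illustration of that generalisation, the sketch below matches exact values in a named column instead of grepping the row names. The function `getRowsByColumn` and the `example` data are hypothetical and not part of the original answer.

```r
# Hypothetical variant of getMyRows(): match exact values in a named column
getRowsByColumn <- function(data, column, values, range) {
  hits <- which(data[[column]] %in% values)
  lapply(hits, function(i) {
    idx <- i + range
    idx <- idx[idx > 0 & idx <= nrow(data)]  # keep only positions inside the data
    data[idx, , drop = FALSE]
  })
}

# Made-up data: the values of interest sit in the "grp" column
example <- data.frame(ID = 1:6,
                      grp = c("AD", "AB", "AD", "CD", "AC", "AD"))

# One row before and one row after every row whose grp is "AB" or "AC"
res <- getRowsByColumn(example, "grp", c("AB", "AC"), -1:1)
res
```

Like getMyRows(), this returns one data frame per match, so overlapping context rows can appear more than once.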

Answer 3

Create some sample data:

> dat=data.frame(col1=letters,col2=sample(26),col3=sample(letters))
> dat
   col1 col2 col3
1     a   26    x
2     b   12    i
3     c   15    v
...

Set up our target vector (note that I chose edge cases and overlapping cases), and find the matching rows:

> target=c("a","e","g","s")
> match = which(dat$col1 %in% target)

Create sequences from -2 to +2 around each match (adjust to your needs) and merge them:

> getThese = unique(as.vector(mapply(seq,match-2,match+2)))
> getThese
 [1] -1  0  1  2  3  4  5  6  7  8  9 17 18 19 20 21

Fix the edge cases:

> getThese = getThese[getThese > 0 & getThese <= nrow(dat)]
> dat[getThese,]
   col1 col2 col3
1     a   26    x
2     b   12    i
3     c   15    v
4     d   22    d
5     e    2    j
6     f    9    l
7     g    1    w
8     h   21    n
9     i   17    p
17    q   18    a
18    r   10    m
19    s   24    o
20    t   13    e
21    u    3    k
>

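Remember that our targets were a, e, g, and s. You now have those rows plus the two rows above and the two rows below each of them, with no duplicates.

If you are working with row names, just create "match" from those instead. I was using a column.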
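
For the row-name case, only the matching step changes. A minimal sketch, assuming a made-up data frame whose row names are the upper-case letters; it uses `hits` rather than `match` so that base R's match() function is not masked.

```r
# Made-up data with meaningful row names
dat <- data.frame(col1 = letters, col2 = 1:26, row.names = LETTERS)

target <- c("A", "E", "G", "S")

# Match against the row names instead of a column
hits <- which(rownames(dat) %in% target)

# Same windowing and edge-case handling as before
getThese <- unique(as.vector(mapply(seq, hits - 2, hits + 2)))
getThese <- getThese[getThese > 0 & getThese <= nrow(dat)]

result <- dat[getThese, ]
result
```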

If this were my problem, I would write a lot more tests using the testthat package.

Answer 4

I would simply proceed as follows:

dat[(grep("row.name12",row.names(dat))-4):(grep("row.name13",row.names(dat))+4),]

grep("row.name12",row.names(dat)) gives you the number of the row named "row.name12", so

(grep("row.name12",row.names(dat))-4):(grep("row.name13",row.names(dat))+4)

gives you a sequence of row numbers running from the 4th row before the row named "row.name12" to the 4th row after the row named "row.name13".
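
Two caveats may be worth keeping in mind with this one-liner: grep() treats the name as a regular expression and matches substrings (so "row.name12" would also match a row named "row.name120"), and the window can run past the first or last row. The sketch below swaps grep() for an exact comparison with == and clamps the window. It assumes each name occurs exactly once, and the data frame and the names `first`, `last`, `from`, `to` are made up for illustration.

```r
# Made-up data with 26 named rows
dat <- data.frame(x = 1:26, row.names = paste0("row.name", 1:26))

# Exact matches on the row names, no regular expressions involved
first <- which(rownames(dat) == "row.name2")
last  <- which(rownames(dat) == "row.name13")

# Clamp the window so it never runs past the first or last row
from <- max(first - 4, 1)
to   <- min(last + 4, nrow(dat))

window <- dat[from:to, , drop = FALSE]
rownames(window)
```

Here the unclamped start would be row -2, which cannot be combined with positive positions in a single subscript, so max() pulls it back to row 1.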
