How to extract the row and column of text in a data frame using regular expressions

Time: 2017-09-29 06:31:00

Tags: r

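I want to extract the row and column indices of every cell in a data frame whose text contains both 'IDP' and 'SB', using R.

The text pattern looks like this: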

    Col1              Col2
    IDP ENGINE(SB)    IDP ENGINE(PS)
    IDP ENGINE SB01   MAIN ENGINE(SB)
    IDP ENGINE SDV    AUX. ENGINE(SB)

My expected output is:

row column
 1    1
 2    1 

3 Answers:

Answer 0: (score: 1)

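One option is to loop through the columns with sapply, apply grepl to each column, and then wrap the result in which with arr.ind = TRUE to get the indices:

which(sapply(df1, function(x) grepl("(\\bIDP\\b.*\\bSB)|(\\bSB.*\\bIDP\\b)", x)), arr.ind = TRUE)
#     row col
#[1,]   1   1
#[2,]   2   1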

Answer 1: (score: 1)

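# Rebuild the example data frame from the question
d <- data.frame(
    col1 = c("IDP ENGINE(SB)", "IDP ENGINE SB01", "IDP ENGINE SDV"),
    col2 = c("IDP ENGINE(PS)", "MAIN ENGINE(SB)", "AUX. ENGINE(SB)")
)

d

# Test every cell for "IDP" and for "SB" separately, then keep the cells that match both
which(
    apply(d, c(1, 2), grepl, pattern = "IDP") & apply(d, c(1, 2), grepl, pattern = "SB"),
    arr.ind = TRUE
)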
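
The same idea (one logical matrix per search term, combined with an element-wise AND) can be extended to any number of terms. The following is only a sketch under a few assumptions: the `terms` vector and the variable names are illustrative and not part of the answer above, `fixed = TRUE` is used so each term is matched as a literal string rather than a regular expression, and the data frame has more than one row (with a single row, `sapply` would return a vector instead of a matrix).

```r
# Example data from the question
d <- data.frame(
  col1 = c("IDP ENGINE(SB)", "IDP ENGINE SB01", "IDP ENGINE SDV"),
  col2 = c("IDP ENGINE(PS)", "MAIN ENGINE(SB)", "AUX. ENGINE(SB)"),
  stringsAsFactors = FALSE
)

# Illustrative list of terms that must all appear in a cell
terms <- c("IDP", "SB")

# Build one logical matrix per term; fixed = TRUE means literal matching
masks <- lapply(terms, function(term) {
  sapply(d, function(column) grepl(term, column, fixed = TRUE))
})

# Element-wise AND across all matrices, then convert to row/column indices
both <- Reduce(`&`, masks)
idx <- which(both, arr.ind = TRUE)
print(idx)
```

Adding a third term only requires extending `terms`; the `Reduce` call stays the same.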

Answer 2: (score: 1)

I would suggest something like this:

which(matrix(grepl(pattern = '(?=.*IDP)(?=.*SB)', as.matrix(df1), perl = TRUE), ncol = NCOL(df1)), arr.ind = TRUE)
     row col
[1,]   1   1
[2,]   2   1
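
The pattern above relies on two lookaheads, `(?=.*IDP)` and `(?=.*SB)`, each of which checks from the start of the string that its term occurs somewhere, so the order of the two terms does not matter. As a rough sketch, the approach could be wrapped in a small helper; `find_cells` and its arguments are hypothetical names introduced here for illustration, and each term is assumed to be a valid regular expression fragment with no special characters that need escaping.

```r
# Hypothetical helper (not from the answers): find cells that match every term
find_cells <- function(df, terms) {
  m <- as.matrix(df)
  # One lookahead per term, e.g. "(?=.*IDP)(?=.*SB)"
  pattern <- paste0("(?=.*", terms, ")", collapse = "")
  # grepl drops the matrix dimensions, so restore them before calling which()
  hits <- matrix(grepl(pattern, m, perl = TRUE), nrow = nrow(m), ncol = ncol(m))
  which(hits, arr.ind = TRUE)
}

df1 <- data.frame(
  Col1 = c("IDP ENGINE(SB)", "IDP ENGINE SB01", "IDP ENGINE SDV"),
  Col2 = c("IDP ENGINE(PS)", "MAIN ENGINE(SB)", "AUX. ENGINE(SB)"),
  stringsAsFactors = FALSE
)

res <- find_cells(df1, c("IDP", "SB"))
print(res)

# Term order in the text does not matter: "SB" appears before "IDP" here
reversed <- find_cells(data.frame(x = "ENGINE(SB) IDP", stringsAsFactors = FALSE),
                       c("IDP", "SB"))
```

Note that `perl = TRUE` is required, because lookaheads are not supported by R's default regular expression engine.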