I have a string that is basically a SQL statement, and I want to extract part of it. Here is the code:
SELECT
DTE as "Date",
CURRENT_DATE AS "Day",
concat( BCCO, BCBCH ) AS "client/batch",
BCSTAT as "Batch Status",
CASE
WHEN EXC = 'MCR' THEN CNT
ELSE 0
END AS "MCR-NPR",
CASE
WHEN EXC = 'NRC' THEN CNT
ELSE 0
END AS "NRC-NPR",
CASE
WHEN EXC = 'OFD' THEN CNT
ELSE 0
END AS "OFD-NPR",
CASE
WHEN EXC = 'TDB' THEN CNT
ELSE 0
END AS "TDB-NPR",
CASE
WHEN EXC = 'TDC' THEN CNT
ELSE 0
END AS "TDC-NPR",
CASE
WHEN EXC = 'UDC' THEN CNT
ELSE 0
END AS "UDC-NPR",
CASE
WHEN EXC = 'BIN' THEN CNT
ELSE 0
END AS "BIN-WRN",
CASE
WHEN EXC = 'DSP' THEN CNT
ELSE 0
END AS "DSP-WRN",
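I want to extract every element that follows END AS and sits between the double quotes. A vector like ("MCR-NPR", ..., "DSP-WRN") would be the desired output.
I know I probably need a regular expression, but I have not managed to extract each of them.
Any ideas would be appreciated.
Best,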
Answer 0 (score: 2)
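1) grep / read.table. Use grep to pick out the lines that contain END AS, then read them with read.table using the double quote as sep. The second column holds the desired data. This uses no regular expressions (grep is called with fixed = TRUE) and no packages.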
read.table(text = grep("END AS", s, value = TRUE, fixed = TRUE),
sep = '"', as.is = TRUE)[[2]]
## [1] "MCR-NPR" "NRC-NPR" "OFD-NPR" "TDB-NPR" "TDC-NPR" "UDC-NPR" "BIN-WRN"
## [8] "DSP-WRN"
1a) This is similar to (1), but it uses sub with a regular expression instead of read.table:
sub('.*END AS "(.+)".*', "\\1", grep("END AS", s, value = TRUE))
## [1] "MCR-NPR" "NRC-NPR" "OFD-NPR" "TDB-NPR" "TDC-NPR" "UDC-NPR" "BIN-WRN"
## [8] "DSP-WRN"
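One caveat about the pattern in (1a), offered as a sketch rather than as part of the original answer: `.+` is greedy, so if a line ever held more than one quoted alias the capture would run to the last double quote. The input here has only one alias per END AS line, so the original pattern is fine; the hypothetical `line` below is invented purely to show the difference when `[^"]+` is used instead.

```r
# Hypothetical line with two quoted aliases (not present in the original input)
line <- 'END AS "MCR-NPR", CNT AS "Total"'

# Greedy capture: runs up to the last double quote on the line
greedy <- sub('.*END AS "(.+)".*', "\\1", line)

# Negated character class: stops at the first closing double quote
strict <- sub('.*END AS "([^"]+)".*', "\\1", line)

print(greedy)
print(strict)
```

On the data in the question both patterns should give the same result, so this only matters if the SQL is later reformatted onto fewer lines.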
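2) strapplyc. Another approach uses strapplyc from the gsubfn package. It relies on the fact that each desired string follows END AS and is enclosed in double quotes, and it is the shortest code shown here.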
library(gsubfn)
unlist(strapplyc(s, 'END AS "(.+)"'))
## [1] "MCR-NPR" "NRC-NPR" "OFD-NPR" "TDB-NPR" "TDC-NPR" "UDC-NPR" "BIN-WRN"
## [8] "DSP-WRN"
3) strcapture. Another base R approach, using the same pattern as in (2), is:
na.omit(strcapture('END AS "(.+)"', s, list(value = character(0))))
giving:
value
9 MCR-NPR
13 NRC-NPR
17 OFD-NPR
21 TDB-NPR
25 TDC-NPR
29 UDC-NPR
33 BIN-WRN
37 DSP-WRN
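Since (3) returns a data frame while the question asks for a vector, the `value` column can be pulled out and the non-matching rows dropped. The following is a minimal sketch on a shortened, made-up version of the input (the vector `s_small` and the name `aliases` are illustrative, not from the original answer).

```r
# Shortened stand-in for the input vector s (two CASE blocks only)
s_small <- c("SELECT ",
             " CASE ", " WHEN EXC = 'MCR' THEN CNT ", " ELSE 0 ", " END AS \"MCR-NPR\",",
             " CASE ", " WHEN EXC = 'NRC' THEN CNT ", " ELSE 0 ", " END AS \"NRC-NPR\"")

# strcapture returns one row per input element, with NA where the pattern does not match
res <- strcapture('END AS "(.+)"', s_small, list(value = character(0)))

# Keep only the matched values as a plain character vector
aliases <- res$value[!is.na(res$value)]
print(aliases)
```

The same subsetting applied to the full `s` should yield the eight aliases listed in the other solutions.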
The input s in reproducible form:
s <-
c("SELECT ", " DTE as \"Date\",", " CURRENT_DATE AS \"Day\",",
" concat( BCCO, BCBCH ) AS \"client/batch\",", " BCSTAT as \"Batch Status\",",
" CASE ", " WHEN EXC = 'MCR' THEN CNT ", " ELSE 0 ", " END AS \"MCR-NPR\",",
" CASE ", " WHEN EXC = 'NRC' THEN CNT ", " ELSE 0 ", " END AS \"NRC-NPR\",",
" CASE ", " WHEN EXC = 'OFD' THEN CNT ", " ELSE 0 ", " END AS \"OFD-NPR\",",
" CASE ", " WHEN EXC = 'TDB' THEN CNT ", " ELSE 0 ", " END AS \"TDB-NPR\",",
" CASE ", " WHEN EXC = 'TDC' THEN CNT ", " ELSE 0 ", " END AS \"TDC-NPR\",",
" CASE ", " WHEN EXC = 'UDC' THEN CNT ", " ELSE 0 ", " END AS \"UDC-NPR\",",
" CASE ", " WHEN EXC = 'BIN' THEN CNT ", " ELSE 0 ", " END AS \"BIN-WRN\",",
" CASE ", " WHEN EXC = 'DSP' THEN CNT ", " ELSE 0 ", " END AS \"DSP-WRN\"")
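The solutions above assume `s` is a character vector with one element per line. If the statement is instead held in a single string, as the wording of the question suggests it might be, one possible base R sketch is to find every match with gregexpr and regmatches, or to split the string into lines first. The variable `sql` below is a shortened, made-up example and the newline separator is an assumption.

```r
# Assumption: the whole statement is one string with newline-separated lines
sql <- paste("SELECT",
             "CASE", "WHEN EXC = 'MCR' THEN CNT", "ELSE 0", "END AS \"MCR-NPR\",",
             "CASE", "WHEN EXC = 'NRC' THEN CNT", "ELSE 0", "END AS \"NRC-NPR\"",
             sep = "\n")

# Option A: locate every END AS "..." occurrence, then strip the prefix and quotes
hits <- regmatches(sql, gregexpr('END AS "[^"]+"', sql))[[1]]
aliases_a <- sub('END AS "([^"]+)"', "\\1", hits)

# Option B: split into lines so the line-based solutions above apply unchanged
s_lines <- strsplit(sql, "\n", fixed = TRUE)[[1]]
aliases_b <- sub('.*END AS "(.+)".*', "\\1", grep("END AS", s_lines, value = TRUE))

print(aliases_a)
print(aliases_b)
```

Option B keeps the original answers usable as written; Option A avoids the split when only the aliases are needed.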