Find and replace text between two strings in R

Time: 2019-09-03 18:58:47

Tags: r regex string-substitution

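I have written some R tutorials as R scripts. I need a handout set (HO) without the answers and a coding set (CS) in which the students can write their code. I need some regex help to find the answer sections so that I can remove them.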

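In the scripts that contain the answers, each answer is wrapped in a start flag (#'YOUR_ANSWER) and an end flag (#'END_ANSWER). To create the HO set, I need to replace

    #'YOUR_ANSWER
    As_samp2 = 36
    As_samp3 = 38
    #'END_ANSWER

with "space for answer".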

So, if my text is:

    a = "#'YOUR_ANSWER
           As_samp2 = 36
           As_samp3 = 38

           #'END_ANSWER"

I have tried a regex, but nothing gets replaced:

    b <- gsub(pattern = "YOUR_ANSWER(.*\n*)*#'END_ANSWER", a, replace = "space for answer")

If I don't use the regex, i.e. I just search for "YOUR_ANSWER", the substitution works:

    c <- gsub(pattern = "YOUR_ANSWER", a, replace = "space for answer")

If I use only the regex part, all the text is replaced, as expected:

    d <- gsub(pattern = "(.*\n*)*", a, replace = "space for answer")

But the combination does not work, even though the regex itself should be fine:

https://regex101.com/r/USvzLF/1

So there must be some R magic that I don't understand.

    b <- gsub(pattern = "YOUR_ANSWER(.*\n*)*END_ANSWER", a, replace="space for answer" )
    c <- gsub(pattern = "YOUR_ANSWER", a, replace="space for answer" )
    d <- gsub(pattern = "(.*\n*)*", a, replace="space for answer" )

I expected everything between YOUR_ANSWER and END_ANSWER to be replaced with "space for answer", but nothing happens. Any ideas?

Update: @r2evans has now shown me the regex. The R script I want to change is https://pastebin.com/mnjpkUFk (i.e. myfile). The code I use to try to change it (in a separate R script) is:

    FileM <- readLines(myfile)
    FileMedit <- gsub(pattern = "YOUR_ANSWER", FileM, replace = "space for answer")
    FileMedit <- gsub(pattern = "YOUR_ANSWER.*END_ANSWER", FileM, replace = "space for answer")
    writeLines(FileMedit, con = "outputfileM.R")

2 answers:

Answer 0 (score: 0)

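To get a more specific match, you could match the first line, then match all following lines that do not consist solely of optional leading horizontal whitespace followed by #'END_ANSWER.

Then match the last line, and replace the whole match with space for answer.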

    #'YOUR_ANSWER.*(?:\R(?!\h*#'END_ANSWER$).*)*\R\h*#'END_ANSWER$


For example:

    b <- gsub(pattern = "^#'YOUR_ANSWER.*(?:\\R(?!\\h*#'END_ANSWER$).*)*\\R\\h*#'END_ANSWER$", replacement = "space for answer", x = a, perl = TRUE)
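Note that with perl = TRUE and no inline flags, ^ and $ anchor to the start and end of the whole string, so this pattern only matches when the answer block is the entire input. A minimal sketch, assuming the whole script has already been collapsed into one string (the sample text txt is made up for illustration): prefixing the pattern with PCRE's inline (?m) flag lets ^ and $ match at every line boundary, so each answer block is replaced.

```r
# Made-up sample: a collapsed script containing two answer blocks
txt <- paste(c("x = 1",
               "#'YOUR_ANSWER",
               "As_samp2 = 36",
               "#'END_ANSWER",
               "y = 2",
               "#'YOUR_ANSWER",
               "As_mean <- 37",
               "  #'END_ANSWER",
               "z = 3"),
             collapse = "\n")

# Same pattern as above, plus (?m) so that ^ and $ match at line boundaries
pat <- "(?m)^#'YOUR_ANSWER.*(?:\\R(?!\\h*#'END_ANSWER$).*)*\\R\\h*#'END_ANSWER$"
out <- gsub(pat, "space for answer", txt, perl = TRUE)
writeLines(out)

# Without (?m), ^ only matches at the very start of txt ("x = 1"),
# so nothing is replaced
pat_no_m <- "^#'YOUR_ANSWER.*(?:\\R(?!\\h*#'END_ANSWER$).*)*\\R\\h*#'END_ANSWER$"
unchanged <- gsub(pat_no_m, "space for answer", txt, perl = TRUE)
```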

If you only want to replace the content between YOUR_ANSWER and END_ANSWER, you could use 2 capturing groups and refer to them in the replacement.

    ^(#'YOUR_ANSWER.*)(?:\R(?!\h*#'END_ANSWER$).*)*(\R\h*#'END_ANSWER)$

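A minimal sketch of how the two groups could be used, assuming you want the placeholder on its own line between the kept start and end flags (the replacement string "\\1\nspace for answer\\2" is one possible choice, not something given in the answer). The string a below reproduces the question's example with explicit \n characters:

```r
# The question's example text, written with explicit line breaks
a <- "#'YOUR_ANSWER\n       As_samp2 = 36\n       As_samp3 = 38\n\n       #'END_ANSWER"

# Group 1 keeps the start-flag line; group 2 keeps the line break plus the end flag
pat <- "^(#'YOUR_ANSWER.*)(?:\\R(?!\\h*#'END_ANSWER$).*)*(\\R\\h*#'END_ANSWER)$"

# Put the placeholder on its own line between the two kept groups
b <- gsub(pat, "\\1\nspace for answer\\2", a, perl = TRUE)
writeLines(b)
```

The flags stay in the output, so the same script can be processed again later if the answers are filled back in.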

Answer 1 (score: 0)

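The problem is that readLines() returns the file as a character vector with one element per line, and you then applied a regex that expects a single multi-line text as input.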

> FileM
 [1] "#'Rstudio environment"                                                             "#'==="                                                                            
 [3] " "                                                                                 "#'Top Left - scripts"                                                             
 [5] "#+"                                                                                "myfirstvariable = \"Hello R\"  #press control enter with cursor on line  "        
 [7] "myfirstvariable"                                                                   "As_samp1 = 34"                                                                    
 [9] " "                                                                                 "#'practical: create variables for arsenic concentration in 2 more samples"        
[11] "#+"                                                                                "#'YOUR_ANSWER"                                                                    
[13] "As_samp2 = 36"                                                                     "As_samp3 = 38"                                                                    
[15] " "                                                                                 "#'END_ANSWER"                                                                     
[17] "#+"                                                                                "#'Bottom Left - console"                                                          
[19] "#+"                                                                                "2+2"                                                                              
[21] " "                                                                                 "#'practical: calculate average As concentration, store result in variable As_mean"
[23] "#+"                                                                                "#'YOUR_ANSWER"                                                                    
[25] "As_mean<- (As_samp1 + As_samp2 + As_samp3)/3"                                      "#'END_ANSWER"                                                                     
[27] "#+"                                                                                "#'A word on comments"                                                             
[29] "#This is a comment"                                                                "#ignore #' and #+ <br/><br/>"     
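The same effect can be reproduced with a small made-up vector standing in for the readLines() output: gsub() is applied to each element separately, so a pattern that spans lines cannot match until the elements are collapsed into one string. This sketch relies on R's default regex engine letting . match newline characters.

```r
# Made-up stand-in for readLines() output: one element per line
lines <- c("x = 1", "#'YOUR_ANSWER", "As_samp2 = 36", "#'END_ANSWER", "y = 2")

# gsub() works element by element; no single element contains both flags
per_line <- gsub("YOUR_ANSWER.*?END_ANSWER", "space for answer", lines)

# Collapsing into one string lets the pattern span several lines
whole  <- paste(lines, collapse = "\n")
joined <- gsub("YOUR_ANSWER.*?END_ANSWER", "space for answer", whole)
writeLines(joined)
```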

So you should join the lines into a single string before running the regex:

    FileM <- paste(FileM, collapse = "\n")

Then use:

    FileMedit <- gsub("YOUR_ANSWER.*?END_ANSWER", "space for answer", FileM)
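The ? after .* makes the repetition lazy, which matters because the file contains several answer blocks. A minimal sketch with a made-up two-block string (like the line above, it uses R's default regex engine, in which . also matches newlines): a greedy .* runs from the first YOUR_ANSWER to the last END_ANSWER and swallows the code in between.

```r
# Made-up text with two answer blocks, already collapsed into one string
txt <- paste(c("#'YOUR_ANSWER", "a = 1", "#'END_ANSWER",
               "keep = TRUE",
               "#'YOUR_ANSWER", "b = 2", "#'END_ANSWER"),
             collapse = "\n")

# Greedy: one match from the first YOUR_ANSWER to the last END_ANSWER
greedy <- gsub("YOUR_ANSWER.*END_ANSWER", "space for answer", txt)

# Lazy: each match stops at the nearest END_ANSWER, so "keep = TRUE" survives
lazy <- gsub("YOUR_ANSWER.*?END_ANSWER", "space for answer", txt)
writeLines(lazy)
```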

Now, cat(FileMedit) prints:

#'Rstudio environment
#'===
 
#'Top Left - scripts
#+
myfirstvariable = "Hello R"  #press control enter with cursor on line  
myfirstvariable
As_samp1 = 34
 
#'practical: create variables for arsenic concentration in 2 more samples
#+
#'space for answer
#+
#'Bottom Left - console
#+
2+2
 
#'practical: calculate average As concentration, store result in variable As_mean
#+
#'space for answer
#+
#'A word on comments
#This is a comment
#ignore #' and #+ <br/><br/>

Now, save it:

    cat(FileMedit, file = "outputfileM.R")
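Putting the steps together, a minimal sketch of a helper that turns an answer script into a handout. The function name strip_answers and its placeholder argument are hypothetical. writeLines() is used instead of cat() here only so that the output file ends with a newline; a file written by cat() without one makes readLines() warn about an incomplete final line.

```r
# Hypothetical helper: read a script, blank out every answer block, write the result
strip_answers <- function(infile, outfile, placeholder = "space for answer") {
  lines <- readLines(infile)
  # Join into one string so the pattern can span several lines
  txt <- paste(lines, collapse = "\n")
  # Lazy match: each YOUR_ANSWER ... END_ANSWER block is replaced separately
  txt <- gsub("YOUR_ANSWER.*?END_ANSWER", placeholder, txt)
  # writeLines() appends a newline after the single string
  writeLines(txt, con = outfile)
  invisible(txt)
}

# Usage with temporary files instead of the real tutorial script
src <- tempfile(fileext = ".R")
dst <- tempfile(fileext = ".R")
writeLines(c("As_samp1 = 34", "#'YOUR_ANSWER", "As_samp2 = 36", "#'END_ANSWER"), src)
strip_answers(src, dst)
readLines(dst)
```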