Question

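Is it possible to use a dotall expression with gsub in R? Basically, I am trying to extract portions of text that span multiple lines. Consider the following example: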

eg.df <- c("----------", " ", "keep", " ", "keep this too", " ", "----------", " ", 
   "Delete this line and everything after", "Delete this one too", 
   " ", "And delete this one")

I want to use lines 7-9 as the pattern to match, and delete those lines and everything after them, through the end of the file.

[1] "----------"                           
[2] " "                                    
[3] "keep"                                 
[4] " "                                    
[5] "keep this too"                        
[6] " "                                    
[7] "----------"                           
[8] " "                                    
[9] "Delete this line and everything after"
[10] "Delete this one too"                  
[11] " "                                    
[12] "And delete this one"

So the resulting output would be:

[1] "----------"                           
[2] " "                                    
[3] "keep"                                 
[4] " "                                    
[5] "keep this too"                        
[6] " "

Answer 1

You could try

  strsplit(sub('-+, +,[A-Za-z]+[^-]+$', '', 
         paste(eg.df, collapse= ',')), ',')[[1]]
  #[1] "----------"    " "             "keep"          " "            
  #[5] "keep this too" " "
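
To see why this works, the call can be unpacked into its three steps. This is only a sketch: the intermediate names `joined`, `trimmed`, and `result` are introduced here for illustration and are not part of the answer, and the approach assumes that no element of `eg.df` contains a comma.

```r
eg.df <- c("----------", " ", "keep", " ", "keep this too", " ", "----------", " ",
           "Delete this line and everything after", "Delete this one too",
           " ", "And delete this one")

# 1. Join the elements into a single string, using ',' as the separator.
joined <- paste(eg.df, collapse = ",")

# 2. Remove the dashed line that has no further '-' after it, plus everything
#    that follows. [^-]+$ cannot cross a '-', so the first dashed line is
#    skipped and only the second one matches.
trimmed <- sub("-+, +,[A-Za-z]+[^-]+$", "", joined)

# 3. Split back into a character vector on the separator.
result <- strsplit(trimmed, ",")[[1]]
```

Because the pattern relies on `[^-]`, it would stop working if any of the lines to be deleted contained a hyphen.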

Or, as @hwnd commented,

  strsplit(sub('-+[^-]+\\z', '', paste(eg.df, collapse = '_'), 
               perl = TRUE), '_')[[1]]
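
On the dotall part of the question: with `perl = TRUE`, the inline flag `(?s)` makes `.` match newlines as well, so the elements can be joined with `\n` and the multi-line marker matched directly. The following is a sketch along those lines, not something taken from the answers above. It assumes the marker is a dashed line, then a single-space line, then the line starting with "Delete this line", and that no element itself contains a newline; the names `joined`, `trimmed`, and `result` are illustrative.

```r
eg.df <- c("----------", " ", "keep", " ", "keep this too", " ", "----------", " ",
           "Delete this line and everything after", "Delete this one too",
           " ", "And delete this one")

# Join the elements with newlines so the text looks like the original file.
joined <- paste(eg.df, collapse = "\n")

# (?s) turns on dotall, so the trailing .* also consumes the newlines and
# runs through to the end of the string.
trimmed <- sub("(?s)-+\n \nDelete this line.*", "", joined, perl = TRUE)

# Split back into one element per line.
result <- strsplit(trimmed, "\n", fixed = TRUE)[[1]]
```

Anchoring on the literal text of the first line to delete means hyphens later in the deleted section do not matter, at the cost of hard-coding that text into the pattern.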
