str_extract a specific pattern

Date: 2018-06-10 16:40:35

Tags: r stringr

I am trying to extract strings that follow the same pattern from a text:

The Tragedy of Romeo and Juliet by William Shakespeare

library(readr)

txt <- read_file('http://www.gutenberg.org/cache/epub/1112/pg1112.txt')

Sample of the text:

  

Scene I.\r\nVerona. A public place.\r\n\r\nEnter Sampson and Gregory (with swords and bucklers) of the house\r\nof Capulet ...
Scene II.\r\nA Street.\r\n\r\nEnter Capulet, County Paris, and [Servant] -the Clown.\r\n\r\n  Cap.

I want to extract

  

Verona. A public place.
A Street.

I tried

library(stringr)

str_extract(txt, "Scene\\s[IV]+\\.\\s\\s\\b[A-Z]+\\b")

It does not work.

Thanks in advance for any suggestions.

1 answer:

Answer 0 (score: 1)

str_extract_all(gsub("(Scene.*?)\r\n","\\1 ",txt),"Scene.*")
[[1]]
 [1] "Scene I. Verona. A public place."                                    
 [2] "Scene II. A Street."                                                 
 [3] "Scene III. Capulet's house."                                         
 [4] "Scene IV. A street."                                                 
 [5] "Scene V. Capulet's house."                                           
 [6] "Scene I. A lane by the wall of Capulet's orchard."                   
 [7] "Scene II. Capulet's orchard."                                        
 [8] "Scene III. Friar Laurence's cell."                                   
 [9] "Scene IV. A street."                                                 
[10] "Scene V. Capulet's orchard."                                         
[11] "Scene VI. Friar Laurence's cell."                                    
[12] "Scene I. A public place."                                            
[13] "Scene II. Capulet's orchard."                                        
[14] "Scene III. Friar Laurence's cell."                                   
[15] "Scene IV. Capulet's house"                                           
[16] "Scene V. Capulet's orchard."                                         
[17] "Scene I. Friar Laurence's cell."                                     
[18] "Scene II. Capulet's house."                                          
[19] "Scene III. Juliet's chamber."                                        
[20] "Scene IV. Capulet's house."                                          
[21] "Scene V. Juliet's chamber."                                          
[22] "Scene I. Mantua. A street."                                          
[23] "Scene II. Verona. Friar Laurence's cell."                            
[24] "Scene III. Verona. A churchyard; in it the monument of the Capulets."
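
The call above first uses gsub to replace the line break that follows each "Scene ..." heading with a space, so that the heading and the location end up on one line, and then str_extract_all pulls out every such line. The attempt in the question likely fails for two reasons: [A-Z]+ followed by \\b can only match a word made entirely of capital letters (so not "Verona"), and str_extract returns only the first match. If only the location is wanted, without the "Scene I." prefix, one possible variation is a capture group with str_match_all. The following is a minimal sketch, not the answerer's code; sample_txt is a hypothetical stand-in for the downloaded text, and it assumes \r\n line endings and Roman numerals built from I, V and X.

```r
library(stringr)

# Hypothetical stand-in for txt, mirroring the sample in the question
# (assumes \r\n line endings, as shown there).
sample_txt <- paste0(
  "Scene I.\r\nVerona. A public place.\r\n\r\n",
  "Enter Sampson and Gregory (with swords and bucklers) of the house\r\n",
  "of Capulet.\r\n\r\n",
  "Scene II.\r\nA Street.\r\n\r\n",
  "Enter Capulet, County Paris, and [Servant] -the Clown.\r\n"
)

# Match "Scene <roman numeral>." plus the line break, then capture
# everything up to the next line break: that is the location line.
m <- str_match_all(sample_txt, "Scene [IVX]+\\.\\r\\n([^\\r\\n]+)")[[1]]

# Column 1 is the full match, column 2 is the first capture group.
locations <- m[, 2]
locations
```

On the full play the same pattern can be applied to txt in place of sample_txt; any heading laid out differently from the sample would need the pattern adjusted.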