Extract text to the left of a punctuation mark in R

Date: 2015-11-25 12:45:33

Tags: regex r

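I have a string and want to extract the text on either side of the `:`. Regular expressions and gsub should help here, but unfortunately my regex skills are close to zero. Can anyone help?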

line <- "Mrs. Villiers: I assure my right hon. Friend that I shall come to that in due course. I have great sympathy with his observation."

The result should be two objects (it doesn't matter if this takes more than one step):

mrs 
"Mrs. Villiers"

text
"I assure my right hon. Friend that I shall come to that in due course. I have great sympathy with his observation."
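Since the question mentions regular expressions and gsub, here is a minimal sketch of the regex route using base R's `sub()`. It assumes the speaker name itself never contains a colon; the patterns anchor on the first colon in the string.

```r
line <- "Mrs. Villiers: I assure my right hon. Friend that I shall come to that in due course. I have great sympathy with his observation."

# Speaker: capture every non-colon character before the first colon,
# then replace the whole string with that captured group.
mrs <- sub("^([^:]*):.*$", "\\1", line)

# Speech: delete everything up to and including the first colon,
# plus any whitespace that follows it.
text <- sub("^[^:]*:\\s*", "", line)

mrs
#> [1] "Mrs. Villiers"
```

`sub()` is enough here because each pattern only needs to match once; `gsub()` would give the same result for these anchored patterns.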

2 answers:

Answer 0 (score: 3)

We can use strsplit:

strsplit(line, ':\\s*')[[1]]
#[1] "Mrs. Villiers"                                                                                                     
#[2] "I assure my right hon. Friend that I shall come to that in due course. I have great sympathy with his observation."

Another option is scan:

trimws(scan(text=line, sep=':', what=''))

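As @AnandaMahto mentioned in the comments, scan also has a strip.white argument, which removes the surrounding whitespace without a separate trimws call: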

scan(text=line, sep=':', what='', strip.white=TRUE)
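Both strsplit and scan split at every colon, which is fine for the example line but not if the speech itself contains one (a time such as "10:30", for instance). A minimal sketch that splits only at the first colon, using base R's `regexpr()` and `substr()`; the helper name `split_at_first_colon` and the sample string are hypothetical.

```r
# Hypothetical helper: split a string at its FIRST colon only.
split_at_first_colon <- function(s) {
  pos <- regexpr(":", s, fixed = TRUE)   # position of the first colon, -1 if none
  if (pos == -1) return(c(NA_character_, trimws(s)))
  c(trimws(substr(s, 1, pos - 1)),       # text before the colon
    trimws(substr(s, pos + 1, nchar(s)))) # text after the colon
}

x <- "Mr. Smith: The meeting starts at 10:30."
strsplit(x, ":\\s*")[[1]]  # 3 pieces: "Mr. Smith", "The meeting starts at 10", "30."
split_at_first_colon(x)    # 2 pieces: "Mr. Smith", "The meeting starts at 10:30."
```

If a line has no colon at all, the helper returns `NA` for the speaker and the whole trimmed string as the text.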

Answer 1 (score: 2)

Just use strsplit (this assumes the colon is followed by exactly one space):

strsplit(line,": ")[[1]]
# [1] "Mrs. Villiers"                                                                                                      
# [2] "I assure my right hon. Friend that I shall come to that in due course. I have great sympathy with his observation."