Split a string on newlines and periods

Time: 2019-01-23 17:20:28

Tags: r

I want to split a text into sentences using the stringi package.

I have the following text:

OESOPHAGUS: Everything looks great , no problems. Residual Things's (27-34cm). Everything was washed. Treated with a load of great stuff. 8 various pads were applied.Total of  48 pads given . No immediate complications\nSTOMACH: Normal

I want to split it into sentences as follows:

[1] OESOPHAGUS: Everything looks great , no problems
[2] Residual Things's (27-34cm)
[3] Everything was washed
[4] Treated with a load of great stuff
[5] 8 various pads were applied
[6] Total of  48 pads given
[7] No immediate complications
[8] STOMACH: Normal

However, when I use stringi::stri_split_lines, I get:

[1] OESOPHAGUS: Everything looks great , no problems. Residual Things's (27-34cm). Everything was washed. Treated with a load of great stuff. 8 various pads were applied.Total of  48 pads given . No immediate complications

[2] STOMACH: Normal

I understand that it splits on \n, but how can I force it to split on either \n or a full stop? Surely this is a common problem?

1 answer:

Answer 0: (score: 4)

You can split on the regular expression '\\.|\n', which matches either a literal period or a newline:

dat <- "OESOPHAGUS: Everything looks great , no problems. Residual Things's (27-34cm). Everything was washed. Treated with a load of great stuff. 8 various pads were applied.Total of  48 pads given . No immediate complications\nSTOMACH: Normal"

> strsplit(dat, '\\.|\n') ## base R
> stringi::stri_split_regex(dat, '\\.|\n') ## from stringi package
> stringr::str_split(dat, '\\.|\n') ## from stringr package

[[1]]
[1] "OESOPHAGUS: Everything looks great , no problems"
[2] " Residual Things's (27-34cm)"                    
[3] " Everything was washed"                          
[4] " Treated with a load of great stuff"             
[5] " 8 various pads were applied"                    
[6] "Total of  48 pads given "                        
[7] " No immediate complications"                     
[8] "STOMACH: Normal"
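
The pieces in the output above still carry leading or trailing spaces (for example " Residual Things's (27-34cm)" and "Total of  48 pads given "). Here is a minimal base R sketch of one way to tidy them, assuming that stripping the surrounding whitespace and dropping empty pieces is what you want. The variable names `parts` and `sentences` are illustrative only.

```r
dat <- "OESOPHAGUS: Everything looks great , no problems. Residual Things's (27-34cm). Everything was washed. Treated with a load of great stuff. 8 various pads were applied.Total of  48 pads given . No immediate complications\nSTOMACH: Normal"

# Split on a literal period or a newline; strsplit() returns a list,
# so take the first element to get a character vector.
parts <- strsplit(dat, "\\.|\n")[[1]]

# Remove leading and trailing whitespace from every piece.
sentences <- trimws(parts)

# Drop pieces that are empty after trimming (e.g. from "..." or ".\n").
sentences <- sentences[nzchar(sentences)]

print(sentences)
```

Splitting on every period also breaks up decimal numbers and abbreviations (such as 2.5 or e.g.), so this approach only suits text shaped like the example.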