I want to split text into sentences using the stringi package.
I have the following text:
OESOPHAGUS: Everything looks great , no problems. Residual Things's (27-34cm). Everything was washed. Treated with a load of great stuff. 8 various pads were applied.Total of 48 pads given . No immediate complications\nSTOMACH: Normal
I would like to split it into sentences as follows:
[1] OESOPHAGUS: Everything looks great , no problems.
[2] Residual Things's (27-34cm)
[3] Everything was washed
[4] Treated with a load of great stuff
[5] 8 various pads were applied
[6] Total of 48 pads given
[7] No immediate complications
[8] STOMACH: Normal
However, when I use stringi::stri_split_lines, I get:
[1] OESOPHAGUS: Everything looks great , no problems. Residual Things's (27-34cm). Everything was washed. Treated with a load of great stuff. 8 various pads were applied.Total of 48 pads given . No immediate complications
[2] STOMACH: Normal
I understand that it splits on \n, but how can I force it to split on either \n or a full stop? Surely this is a common problem?
Answer 0 (score: 4)
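You can split on the regular expression '\\.|\n', which matches either a literal full stop or a newline. Each of the three calls below returns the same result: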
dat <- "OESOPHAGUS: Everything looks great , no problems. Residual Things's (27-34cm). Everything was washed. Treated with a load of great stuff. 8 various pads were applied.Total of 48 pads given . No immediate complications\nSTOMACH: Normal"
> strsplit(dat, '\\.|\n') ## base R
> stringi::stri_split_regex(dat, '\\.|\n') ## from stringi package
> stringr::str_split(dat, '\\.|\n') ## from stringr package
[[1]]
[1] "OESOPHAGUS: Everything looks great , no problems"
[2] " Residual Things's (27-34cm)"
[3] " Everything was washed"
[4] " Treated with a load of great stuff"
[5] " 8 various pads were applied"
[6] "Total of 48 pads given "
[7] " No immediate complications"
[8] "STOMACH: Normal"
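
The pieces above still carry leading and trailing spaces (items [2] to [7]). A minimal sketch of one way to get the trimmed output the question asks for is to let the pattern consume the whitespace around each delimiter. The pattern and the variable name `sentences` are illustrative choices, not part of the original answer. Note that, like the pattern above, this splits on every full stop, so it would also break up decimals or abbreviations such as "2.5cm".

```r
library(stringi)

dat <- "OESOPHAGUS: Everything looks great , no problems. Residual Things's (27-34cm). Everything was washed. Treated with a load of great stuff. 8 various pads were applied.Total of 48 pads given . No immediate complications\nSTOMACH: Normal"

# Split on a full stop or a newline and swallow any whitespace on either
# side of it, so the resulting pieces need no separate trimming step.
# omit_empty = TRUE drops empty strings, e.g. from a trailing full stop.
sentences <- stri_split_regex(dat, "\\s*[.\\n]\\s*", omit_empty = TRUE)[[1]]

print(sentences)
```

With the sample text this should yield eight elements with no surrounding whitespace. If a base R version is preferred, `trimws(strsplit(dat, "[.\n]")[[1]])` follows the same idea by trimming after the split.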