I am loading a txt file:
subtitle <- readLines('subtitle.txt')
Now I want to go through the text sentence by sentence, for example:
first_sentence <- subtitle[1]
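How can I do this in R?
Sample text:
I think that’s not true. I think it’s unfortunate. I think myself and everybody who works on these movies loves cinema, loves movies, loves going to the movies, loves to watch a communal experience in a movie theater full of people. And we’ve been very lucky that our movie theaters are often full of people when our movies play, and that’s a very special thing.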
Answer 0 (score: 1)
Just as a nudge in the right direction, strsplit may be helpful here:
x <- "I think that’s not true. I think it’s unfortunate. I think myself and everybody who works on these movies loves cinema, loves movies, loves going to the movies, loves to watch a communal experience in a movie theater full of people. And we’ve been very lucky that our movie theaters are often full of people when our movies play, and that’s a very special thing."
strsplit(x, "\\.\\s*")[[1]]
This outputs:
[1] "I think that’s not true"
[2] "I think it’s unfortunate"
[3] "I think myself and everybody who works on these movies loves cinema, loves movies, loves going to the movies, loves to watch a communal experience in a movie theater full of people"
[4] "And we’ve been very lucky that our movie theaters are often full of people when our movies play, and that’s a very special thing"
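To tie this back to the question: readLines returns one element per line of the file, not one per sentence, so subtitle[1] is the first line. A minimal sketch of one way to get from lines to sentences follows. It assumes the lines can simply be joined with a space, and it uses a short in-memory vector as a stand-in for the file so that it runs on its own.

```r
# Stand-in for readLines('subtitle.txt'): one element per line of the file.
# (Assumed sample lines; a sentence may wrap across two lines.)
subtitle <- c("I think that's not true. I think it's",
              "unfortunate. And that's a very special thing.")

# Join the lines into one string, then split on a period plus any whitespace.
text <- paste(subtitle, collapse = " ")
sentences <- strsplit(text, "\\.\\s*")[[1]]

first_sentence <- sentences[1]

# Visit each sentence in turn.
for (s in sentences) {
  print(s)
}
```

Note that the split removes the trailing periods, just as in the output above.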
This answer assumes that a full stop (.) always marks the end of a sentence. That is of course not true if a sentence contains an acronym or initials, e.g. J.J. Abrams makes good movies.
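The limitation is easy to reproduce, and a slightly stricter pattern can soften it. The sketch below is only a heuristic and not part of the original answer: it splits on whitespace only when that whitespace follows a lowercase letter plus a sentence-ending mark, which leaves initials such as "J.J." intact. It would still split wrongly after abbreviations like "Dr." and would miss sentences that end in a capital letter or digit. The variable names naive and heuristic are illustrative.

```r
x <- "J.J. Abrams makes good movies. I agree with that."

# Naive split: every period ends a "sentence", so the initials are broken up.
naive <- strsplit(x, "\\.\\s*")[[1]]
print(naive)

# Heuristic: split on whitespace only when it follows a lowercase letter
# plus ., ! or ? (the lookbehind requires perl = TRUE).
heuristic <- strsplit(x, "(?<=[a-z][.!?])\\s+", perl = TRUE)[[1]]
print(heuristic)
# [1] "J.J. Abrams makes good movies." "I agree with that."
```

Because the lookbehind consumes no characters, this variant also keeps the closing punctuation on each sentence.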