How can I iterate through a txt file sentence by sentence?

Time: 2019-11-11 12:52:52

Tags: r

I am loading a txt file:

subtitle <- readLines('subtitle.txt')

Now I want to go through the text sentence by sentence, for example:

first_sentence <- subtitle[1]

How can I do this in R?

Sample text:

  

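I think that’s not true. I think it’s unfortunate. I think myself and everybody who works on these movies loves cinema, loves movies, loves going to the movies, loves to watch a communal experience in a movie theater full of people. And we’ve been very lucky that our movie theaters are often full of people when our movies play, and that’s a very special thing.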

1 answer:

Answer 0 (score: 1)

Just as a hint in the right direction, strsplit might be helpful here:

x <- "I think that’s not true. I think it’s unfortunate. I think myself and everybody who works on these movies loves cinema, loves movies, loves going to the movies, loves to watch a communal experience in a movie theater full of people. And we’ve been very lucky that our movie theaters are often full of people when our movies play, and that’s a very special thing."
strsplit(x, "\\.\\s*")[[1]]

This outputs:

[1] "I think that’s not true"                                                                                                                                                             
[2] "I think it’s unfortunate"                                                                                                                                                            
[3] "I think myself and everybody who works on these movies loves cinema, loves movies, loves going to the movies, loves to watch a communal experience in a movie theater full of people"
[4] "And we’ve been very lucky that our movie theaters are often full of people when our movies play, and that’s a very special thing"          
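
To apply the same split to a file read with readLines, one possible sketch is shown here. readLines returns one element per line rather than per sentence, so the lines are joined into a single string before splitting. The temporary file and its contents are made up for illustration and only stand in for subtitle.txt from the question.

```r
# Hypothetical demo file standing in for 'subtitle.txt' from the question
path <- tempfile(fileext = ".txt")
writeLines(c("I think that is not true. I think it is",
             "unfortunate. That is a very special thing."), path)

subtitle <- readLines(path)

# readLines() returns one element per line, so join the lines first
text <- paste(subtitle, collapse = " ")

# Split on a period followed by optional whitespace
sentences <- strsplit(text, "\\.\\s*")[[1]]

# Visit the sentences one at a time
for (i in seq_along(sentences)) {
  cat(i, ": ", sentences[i], "\n", sep = "")
}
```

After this, sentences[1] is the first sentence, which is what subtitle[1] in the question was meant to return. Note that the split removes the periods themselves.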

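This answer assumes that a period (.) always marks the end of a sentence. That is of course not true if a sentence contains initials or abbreviations, for example: J.J. Abrams makes good movies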
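
One way to soften this limitation, offered only as a rough heuristic and not as part of the original answer, is to split on whitespace that follows sentence-ending punctuation and precedes a capital letter, while skipping periods that belong to a single capital letter such as an initial. The pattern below is an assumption chosen for this example; it relies on perl = TRUE for the lookaround syntax.

```r
x <- "J.J. Abrams makes good movies. I think that is true. We have been lucky!"

# Split on whitespace that follows ., ! or ? and precedes a capital letter,
# unless the period belongs to a single capital letter (an initial like "J.")
pattern <- "(?<=[.!?])(?<!\\b[A-Z]\\.)\\s+(?=[A-Z])"

sentences <- strsplit(x, pattern, perl = TRUE)[[1]]
print(sentences)
```

Unlike the simpler split, this keeps the closing punctuation on each sentence. It still breaks on abbreviations such as "Dr." when they are followed by a capitalized word, and it will not split after a sentence that ends in a single capital letter, so a dedicated sentence tokenizer is likely a better fit for messy text.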