Question

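I have a data frame containing sentence fragments: a whole sentence is spread across, in some cases, several rows of the data frame.

For example, head(mydataframe) returns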

#  1 Do you have any idea what
#  2  they were arguing about?
#  3          Do--Do you speak
#  4                  English?
#  5                     yeah.
#  6            No, I'm sorry.

Assume a sentence can be terminated by either "." or "?" or "!" or "...".

Is there any R library function capable of outputting the following:

#  1 Do you have any idea what they were arguing about?
#  2          Do--Do you speak English?
#  3                     yeah.
#  4            No, I'm sorry.

Answer 1

This works for all sentences ending with ., ..., ? or !

x <- paste0(foo$txt, collapse = " ")
trimws(unlist(strsplit(x, "(?<=[?.!|])(?=\\s)", perl=TRUE)))

Credit to @AvinashRaj for the pointer on the lookbehind.

Which gives:

#[1] "Do you have any idea what they were arguing about?"
#[2] "Do--Do you speak English?"                         
#[3] "yeah..."                                           
#[4] "No, I'm sorry."
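To see why the lookbehind matters, here is a minimal sketch comparing a split that consumes the terminator with the zero-width split used above. The sample string and the variable names are mine, not from the answer, and I left out the | from the character class because inside [...] it is just a literal pipe.

```r
# Illustrative input (an assumption, not the OP's data)
x <- "Do--Do you speak English? yeah... No, I'm sorry."

# Consuming split: the terminator and the following space are part of the
# match, so strsplit() throws them away
consumed <- strsplit(x, "[?.!]+\\s+", perl = TRUE)[[1]]

# Zero-width split: the lookbehind requires a terminator just before the
# split point and the lookahead requires whitespace just after it, so no
# characters are consumed; trimws() removes the leftover leading space
kept <- trimws(strsplit(x, "(?<=[?.!])(?=\\s)", perl = TRUE)[[1]])

print(consumed)
print(kept)
```

The first version loses the punctuation on every sentence except the last; the second keeps it, which is what the question asks for.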

Data

I modified the toy dataset to include a case where a string ends with ... (as per the OP's request)

foo <- data.frame(num = 1:6, txt = c("Do you have any idea what", "they were arguing about?", "Do--Do you speak", "English?", "yeah...", "No, I'm sorry."), stringsAsFactors = FALSE)
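The code above returns a character vector, while the question shows a numbered data frame. A rough sketch of wrapping the same idea in a function that returns a data frame could look like this. combine_fragments is a hypothetical name of my own, and the num/txt column names are assumed from the toy data.

```r
# Hypothetical helper (not part of the original answer)
combine_fragments <- function(txt) {
  # Join all fragments into one string, separated by single spaces
  joined <- paste0(txt, collapse = " ")
  # Split right after ., ? or ! when whitespace follows (zero-width match)
  parts <- strsplit(joined, "(?<=[?.!])(?=\\s)", perl = TRUE)[[1]]
  # Every piece after the first still starts with a space; strip it
  sentences <- trimws(parts)
  data.frame(num = seq_along(sentences), txt = sentences,
             stringsAsFactors = FALSE)
}

# Same toy data as above, repeated so the snippet runs on its own
foo <- data.frame(num = 1:6,
                  txt = c("Do you have any idea what", "they were arguing about?",
                          "Do--Do you speak", "English?", "yeah...", "No, I'm sorry."),
                  stringsAsFactors = FALSE)

result <- combine_fragments(foo$txt)
print(result)
```

Note that a terminator followed directly by another sentence with no space in between would not be split by this pattern.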

Answer 2

Here is what I got. I am sure there are better ways of doing this. Here I used base functions. I created a sample data frame called foo. First, I created one string containing all the text in txt with toString(). toString() adds commas, so I removed them in the first gsub(). Then I handled white space (more than 2 spaces) in a second gsub(). Then I split the string by the delimiters you specified. With credit to Tyler Rinker, I managed to keep the delimiters in strsplit(). The final job was to remove the white space in sentence-initial position. Then, unlist the list.

foo <- data.frame(num = 1:6, txt = c("Do you have any idea what", "they were arguing about?", "Do--Do you speak", "English?", "yeah.", "No, I'm sorry."), stringsAsFactors = FALSE)

library(magrittr)

toString(foo$txt) %>%
  gsub(pattern = ",", replacement = "", x = .) %>%
  strsplit(x = ., split = "(?<=[?.!])", perl = TRUE) %>%
  lapply(., function(x) {
    gsub(pattern = "^ ", replacement = "", x = x)
  }) %>%
  unlist

#[1] "Do you have any idea what they were arguing about?"
#[2] "Do--Do you speak English?"
#[3] "yeah."
#[4] "No I'm sorry."
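One thing to watch in the output of this answer: the last sentence comes back as "No I'm sorry." because the gsub() on "," removes every comma, not only the ones toString() inserted. The following is a small sketch of my own, not the answerer's code, that joins with paste(collapse = " ") instead so no commas need stripping. It uses base R only; the variable names are illustrative.

```r
# Same toy data as in this answer, repeated so the snippet runs on its own
foo <- data.frame(num = 1:6,
                  txt = c("Do you have any idea what", "they were arguing about?",
                          "Do--Do you speak", "English?", "yeah.", "No, I'm sorry."),
                  stringsAsFactors = FALSE)

# paste() with collapse = " " joins the rows without adding commas
joined <- paste(foo$txt, collapse = " ")

# Same lookbehind as in the answer: split right after ., ? or !
pieces <- strsplit(joined, split = "(?<=[?.!])", perl = TRUE)[[1]]

# Drop the single space left at the start of each later sentence
sentences <- sub("^ ", "", pieces)
print(sentences)
```

With this joining step the comma in "No, I'm sorry." survives.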

Edit: StevenBeaupré revised my code. This is the way to go!


Combining sentence fragments in an R data frame

2 answers: