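I have a data frame containing fragments of sentences; in some cases a single sentence is spread across multiple rows of the data frame.
For example, head(mydataframe)
returns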
# 1 Do you have any idea what
# 2 they were arguing about?
# 3 Do--Do you speak
# 4 English?
# 5 yeah.
# 6 No, I'm sorry.
Assuming a sentence can be terminated by ".", "?", "!" or "...",
is there any R library function capable of outputting the following:
# 1 Do you have any idea what they were arguing about?
# 2 Do--Do you speak English?
# 3 yeah.
# 4 No, I'm sorry.
Answer 0: (score: 4)
This works for sentences ending in ., ..., ? or !
x <- paste0(foo$txt, collapse = " ")
trimws(unlist(strsplit(x, "(?<=[?.!|])(?=\\s)", perl=TRUE)))
With credit to @AvinashRaj for the pointers on lookbehind, this gives:
#[1] "Do you have any idea what they were arguing about?"
#[2] "Do--Do you speak English?"
#[3] "yeah..."
#[4] "No, I'm sorry."
Data
I modified the toy dataset to include a case where a string ends with ... (as requested by the OP).
foo <- data.frame(num = 1:6,
txt = c("Do you have any idea what", "they were arguing about?",
"Do--Do you speak", "English?", "yeah...", "No, I'm sorry."),
stringsAsFactors = FALSE)
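Since the question asks for one sentence per row, the collapse-and-split approach above could be wrapped in a small helper that returns a data frame. This is only a sketch: the function name `merge_sentences` and the output column names are assumptions made for illustration, and the character class omits the `|` used above because a pipe is not one of the terminators named in the question.

```r
# Hypothetical helper (not part of the original answer): collapse the
# fragments into one string, split after sentence-final punctuation, and
# return one sentence per row.
merge_sentences <- function(fragments) {
  joined <- paste(fragments, collapse = " ")
  # Zero-width split point: preceded by ., ? or ! and followed by whitespace.
  parts <- strsplit(joined, "(?<=[?.!])(?=\\s)", perl = TRUE)[[1]]
  sentences <- trimws(parts)
  data.frame(num = seq_along(sentences), txt = sentences,
             stringsAsFactors = FALSE)
}

fragments <- c("Do you have any idea what", "they were arguing about?",
               "Do--Do you speak", "English?", "yeah...", "No, I'm sorry.")
result <- merge_sentences(fragments)
print(result)
```

The lookahead for whitespace is what keeps "yeah..." in one piece, because no split point is allowed between two consecutive dots.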
Answer 1: (score: 3)
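Here is what I got. I am sure there are better ways to do this. Here I used base functions. I created a sample data frame called foo. First, I created a single string containing all the text in txt with toString(). toString() adds commas, so I removed them in the first gsub(). Then I handled whitespace (runs of more than 2 spaces) in a second gsub(). Then I split the string on the delimiters you specified. Thanks to Tyler Rinker, I managed to leave the delimiters in place when calling strsplit().
foo <- data.frame(num = 1:6,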
txt = c("Do you have any idea what", "they were arguing about?",
"Do--Do you speak", "English?", "yeah.", "No, I'm sorry."),
stringsAsFactors = FALSE)
library(magrittr)
toString(foo$txt) %>%
gsub(pattern = ",", replacement = "", x = .) %>%
strsplit(x = ., split = "(?<=[?.!])", perl = TRUE) %>%
lapply(., function(x)
{gsub(pattern = "^ ", replacement = "", x = x)
}) %>%
unlist
#[1] "Do you have any idea what they were arguing about?"
#[2] "Do--Do you speak English?"
#[3] "yeah."
#[4] "No I'm sorry."
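The lookbehind split leaves each delimiter attached to its sentence. The final job is to remove the whitespace in sentence-initial position, and then unlist the result.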
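Note that the output above shows the fourth sentence as "No I'm sorry.": removing every comma after toString() also removes commas that belong to the text. A minimal sketch of one way around this, assuming it is acceptable to join the rows with paste(collapse = " ") instead of toString() (the variable names here are illustrative, not from the answer):

```r
# Same toy data as in the answer above.
foo <- data.frame(num = 1:6,
                  txt = c("Do you have any idea what", "they were arguing about?",
                          "Do--Do you speak", "English?", "yeah.", "No, I'm sorry."),
                  stringsAsFactors = FALSE)

# Join with a single space instead of toString(), so no commas are added
# and none have to be removed afterwards.
joined <- paste(foo$txt, collapse = " ")

# Split after ., ? or ! (the lookbehind keeps the delimiter with its sentence).
pieces <- strsplit(joined, "(?<=[?.!])", perl = TRUE)[[1]]

# Drop the single leading space left at the start of the later sentences.
sentences <- sub("^ ", "", pieces)
print(sentences)
```

This uses only base R, so magrittr is not needed. It keeps the lookbehind-only pattern from this answer, which is fine for the "yeah." data shown here.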
Edit: Steven Beaupré revised my code, and his version is the way to go!