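I am new to R. I need to split a sentence on phrase delimiters. With strsplit I can split a string on a single delimiter, but I want to split on any of several delimiters, for example [,. :; ]. How can I do this in one step? Is there a regular expression for this?
For example: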
my_string = "This is a sentence. This is a question, right? Yes! It is."
Expected output:
"This is a sentence", "This is a question", "right", "yes", "It is"
Answer 0 (score: 4)
You can use:
strsplit("This is a sentence. This is a question, right? Yes! It is.", "\\.|,|\\?|!")
#[[1]]
#[1] "This is a sentence" " This is a question" " right"
#[4] " Yes" " It is"
To get rid of the extra spaces, you can do:
strsplit("This is a sentence. This is a question, right? Yes! It is.",
"\\. *|, |\\? *|! *")
#[[1]]
#[1] "This is a sentence" "This is a question" "right"
#[4] "Yes" "It is"
As thelatemail pointed out, this is simpler:
strsplit("This is a sentence. This is a question, right? Yes! It is.",
"[,.:;?!]\\s*") # \\s* represents a space character appearing 0 or more times
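For reference, here is a minimal sketch of that last call applied to the string from the question. The variable names `my_string` and `parts` are only illustrative. `strsplit` returns a list with one element per input string, hence the `[[1]]`.

```r
my_string <- "This is a sentence. This is a question, right? Yes! It is."

# Split on any one of , . : ; ? ! followed by optional whitespace.
# strsplit() returns a list, so [[1]] extracts the character vector.
parts <- strsplit(my_string, "[,.:;?!]\\s*")[[1]]

print(parts)
```

Because the delimiter at the very end of the string is simply dropped by `strsplit`, no empty trailing element should appear in `parts`.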
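You need to escape certain characters, otherwise they are interpreted as metacharacters. That is why you see \\ in front of . and ? in the first two patterns. The | means "or". Inside a character class such as [,.:;?!], these characters are taken literally and need no escaping.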
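The effect of escaping can be seen on a small made-up string. This is only a sketch: the input `"a.b"` and the variable names are hypothetical and not part of the question.

```r
x <- "a.b"  # hypothetical input, chosen only to show the effect of "."

# Unescaped: "." matches any character, so every character acts as a
# delimiter and only empty strings are left.
unescaped <- strsplit(x, ".")[[1]]

# Escaped: "\\." matches a literal dot.
escaped <- strsplit(x, "\\.")[[1]]

# fixed = TRUE treats the pattern as a plain string, not a regex.
literal <- strsplit(x, ".", fixed = TRUE)[[1]]

# Inside a character class the dot is already literal.
in_class <- strsplit(x, "[.]")[[1]]

print(escaped)
```

`escaped`, `literal`, and `in_class` should all contain the two pieces `"a"` and `"b"`, while `unescaped` contains no letters at all.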
Answer 1 (score: 1)
You can use this pattern to get the output:
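// Requires: using System; using System.Text.RegularExpressions;
string input = @"This is a sentence. This is a question, right? Yes! It is.";
// One delimiter from the class (no spaces inside it), then any trailing whitespace.
string pattern = @"[,.:;?!]\s*";
foreach (string result in Regex.Split(input, pattern))
{
// Regex.Split yields a trailing empty string because the input ends with a delimiter.
if (result.Length > 0) Console.WriteLine("'{0}'", result);
}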
Check the console output to confirm that the result is correct.