Splitting a string using an array of delimiters in R

Time: 2015-04-22 05:39:57

Tags: regex r

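I'm new to R. I have to split a sentence on phrase delimiters. strsplit can split a string on one delimiter, but I want to split it on several delimiters at once, e.g. [,. :; ]. How can I do this in one step? Is there a regular expression that works for this?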

For example:

my_string = "This is a sentence.  This is a question, right?  Yes!  It is."

Expected output:

"This is a sentence", "This is a question", "right", "Yes", "It is"

2 answers:

Answer 0 (score: 4)

You can use:

strsplit("This is a sentence. This is a question, right? Yes! It is.", "\\.|,|\\?|!")
#[[1]]
#[1] "This is a sentence"  " This is a question" " right"             
#[4] " Yes"                " It is"
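The output above starts with [[1]] because strsplit() returns a list with one element per input string. A minimal sketch, assuming R 3.2.0 or later (where trimws() is available), that extracts the character vector and trims the leading spaces afterwards instead of changing the pattern:

```r
x <- "This is a sentence. This is a question, right? Yes! It is."

# strsplit() returns a list; [[1]] extracts the vector for the single input string
parts <- strsplit(x, "\\.|,|\\?|!")[[1]]

# trimws() strips the leading space left behind after each delimiter
parts <- trimws(parts)
print(parts)
```

The final "." does not leave an empty string at the end, because strsplit() drops a match at the very end of the input.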

To get rid of those extra spaces, you can do:

strsplit("This is a sentence. This is a question, right? Yes! It is.",
         "\\. *|, |\\? *|! *")
#[[1]]
#[1] "This is a sentence" "This is a question" "right"             
#[4] "Yes"                "It is"
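The my_string in the question has two spaces after ".", "?" and "!". A sketch checking that the pattern above still handles it, since " *" matches zero or more spaces:

```r
# The string from the question, with double spaces after ".", "?" and "!"
my_string <- "This is a sentence.  This is a question, right?  Yes!  It is."

# Each " *" consumes all spaces following its delimiter
result <- strsplit(my_string, "\\. *|, |\\? *|! *")[[1]]
print(result)
```

Note that ", " matches exactly one space after a comma; ", *" would be the more tolerant variant if commas can also be followed by several spaces.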

As thelatemail pointed out, this is simpler:

strsplit("This is a sentence. This is a question, right? Yes! It is.",
     "[,.:;?!]\\s*")  # \\s* represents a space character appearing 0 or more times
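To see what this version returns, here is a sketch that applies it to two strings at once (the single-spaced sentence and the double-spaced my_string from the question). strsplit() is vectorised, so it returns one list element per input string:

```r
sentences <- c("This is a sentence. This is a question, right? Yes! It is.",
               "This is a sentence.  This is a question, right?  Yes!  It is.")

# [,.:;?!] matches any one delimiter; \\s* then swallows any whitespace after it
result <- strsplit(sentences, "[,.:;?!]\\s*")

# One character vector per input string
print(result)
```

Because \\s* matches any amount of whitespace, both strings give the same five phrases.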

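You need to escape certain characters, because otherwise they are interpreted as regex metacharacters. That is why you see \\ in front of . and ?. The | means "or".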
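Inside a bracket expression such as [,.:;?!], characters like . and ? are literal, so no escaping is needed there; that is also why the character-class version is shorter. A minimal sketch that builds such a pattern from a character vector of delimiters (the "array" in the title). split_on is a hypothetical helper name, and it assumes no delimiter is "]", "\\", "^" or "-", which have special meaning inside brackets:

```r
# Hypothetical helper: split each string on any single-character delimiter in `delims`,
# dropping whitespace that follows a delimiter.
# Assumes no delimiter is "]", "\\", "^" or "-" (special inside a bracket expression).
split_on <- function(x, delims) {
  # e.g. c(",", ".") becomes the regex [,.]\s*
  pattern <- paste0("[", paste(delims, collapse = ""), "]\\s*")
  strsplit(x, pattern)
}

delims <- c(",", ".", ":", ";", "?", "!")
parts <- split_on("This is a sentence. This is a question, right? Yes! It is.", delims)[[1]]
print(parts)
```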

Answer 1 (score: 1)

You can use this pattern to get the output (C#, with .NET's Regex.Split):

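using System;
using System.Text.RegularExpressions;

string input = @"This is a sentence. This is a question, right? Yes! It is.";
// No space inside the class (a space there splits on every word); \s* eats the space after each delimiter
string pattern = @"[,.:;?!]\s*";

foreach (string result in Regex.Split(input, pattern))
{
    // Regex.Split returns an empty string after the final "."; skip it
    if (result.Length > 0)
        Console.WriteLine("'{0}'", result);
}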

Check the console to see whether you get the correct result.
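The delimiter list [,. :; ] from the question (like a class written as [, . : ; ]) contains a space, and a space inside a character class is itself a delimiter, so the sentence falls apart into single words; it also lacks ? and !. A sketch in R comparing that class with one that drops the space:

```r
x <- "This is a sentence. This is a question, right? Yes! It is."

# The space in the class makes every space a delimiter; ". " even yields an empty piece
words <- strsplit(x, "[, . : ; ]")[[1]]

# Without the space (and with ? and !), \\s* absorbs the whitespace after each delimiter
phrases <- strsplit(x, "[,.:;?!]\\s*")[[1]]

print(words)
print(phrases)
```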