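How do I split a string on a condition in R?

Asked: 2019-04-24 14:01:34

Tags: r regex gsub

I want to split a single string into multiple strings at the words 'split here', but only where they occur between '>' and '<', and without deleting any characters other than the words 'split here'.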

text <- c("Don't split here > yes here split here and blah blah < again don't (anything could be here) split here >")

Expected output:

text[1] = "Don't split here > yes here "
text[2] = "and blah blah < again don't (anything could be here) split here >"

I tried

gsub(">(.*split here.*)<","", text)

but this doesn't seem to work. Can someone help me with a regular expression here?

3 Answers:

Answer 0: (score: 6)

Replace the desired string with the character "\1", then split on "\1":

    strsplit(gsub("(>[^<]+) split here ([^<]+<)", "\\1\1\\2", text), "\1")
    ## [[1]]
    ## [1] "Don't split here > yes here"
    ## [2] "and blah blah < again don't split here >"

If the input is a character vector, the output will be a list; to flatten it, just apply unlist() to the result of the line of code above.
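The replace-then-split idea in this answer can be written out step by step. The following is a minimal sketch, not the answerer's exact code: the variable names `marked`, `parts`, and `flat` are illustrative, the sentinel is written as "\001" (the same character as "\1"), and `fixed = TRUE` is an added assumption so that the sentinel is treated as a literal rather than as a regex.

```r
text <- "Don't split here > yes here split here and blah blah < again don't (anything could be here) split here >"

# Step 1: replace the " split here " that sits between '>' and '<' with a
# sentinel character, keeping the text on both sides via the capture groups.
marked <- gsub("(>[^<]+) split here ([^<]+<)", "\\1\001\\2", text)

# Step 2: split on the sentinel. strsplit() returns a list with one element
# per input string.
parts <- strsplit(marked, "\001", fixed = TRUE)

# Step 3: flatten the list into a plain character vector.
flat <- unlist(parts)
print(flat)
```

Note that this pattern consumes the spaces on both sides of 'split here', so the first piece ends at "yes here" without the trailing space shown in the question's expected output.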

Answer 1: (score: 1)

You can use a plain strsplit with this regex, which relies on the \K operator (requires perl = TRUE), to get the strings you want.

>[^>]*?\Ksplit here\s*(?=[^<]*<)

strsplit("Don't split here > yes here split here and blah blah < again don't (anything could be here) split here >", ">[^>]*?\\Ksplit here\\s*(?=[^<]*<)", perl=TRUE)

Prints

[[1]]
[1] "Don't split here > yes here "                                     
[2] "and blah blah < again don't (anything could be here) split here >"
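To make the pattern easier to follow, here is a hedged sketch that annotates each piece and applies it to a character vector. The names `pattern`, `inputs`, and `result`, as well as the second input string, are made up for illustration; the regex itself is the one given in this answer.

```r
# The pieces of the pattern, as written inside an R string:
#   >[^>]*?          a '>' followed by as few non-'>' characters as possible
#   \\K              drop everything matched so far from the reported match
#   split here\\s*   the text to remove, plus any whitespace after it
#   (?=[^<]*<)       lookahead: a '<' must occur later in the string
pattern <- ">[^>]*?\\Ksplit here\\s*(?=[^<]*<)"

inputs <- c(
  "Don't split here > yes here split here and blah blah < again don't (anything could be here) split here >",
  "no angle brackets, so split here is left alone"  # hypothetical second input
)

# \K and lookaheads are PCRE features, so perl = TRUE is required.
result <- strsplit(inputs, pattern, perl = TRUE)
print(result)
```

Because only the part of the match after \K is treated as the separator, the text between '>' and 'split here' stays in the first piece, and a string without a qualifying occurrence comes back unsplit.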

Answer 2: (score: 0)

You can also do it like this (str_split and str_extract come from the stringr package) -

 > str_split(gsub(str_extract(text,"(?<=>).*?(?=\\<)"),gsub("split here","nsplit here",str_extract(text,"(?<=>).*?(?=\\<)")),text),"nsplit here")

Output -

[[1]]
[1] "Don't split here > yes here "                                      
[2] " and blah blah < again don't (anything could be here) split here >"