Get substring between tags

Time: 2011-07-22 17:57:02

Tags: java string parsing split

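I have read several questions about parsing a string by tags, but I haven't found an exact answer to my specific problem. The problem: I have one big line of text, and I need to parse it into several strings based on tags. For example, I find [tag], then I read the text up to the next [tag] and turn it into a new string. Then I read the text up to the next occurrence of the same [tag], put that into another new string, and so on.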

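Example: [tag] Lorem Ipsum [tag] is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. [tag] It has [tag] survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.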

From this text I want three strings: Lorem Ipsum, It has

3 answers:

Answer 0 (score: 1)

Regular expressions to the rescue!

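import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

List<String> matches = new ArrayList<String>();
// Reluctant .*? stops at the nearest closing [tag]; group 1 excludes the tags
Pattern pattern = Pattern.compile("\\[tag\\](.*?)\\[tag\\]");
Matcher matcher = pattern.matcher(str);

while (matcher.find())
    matches.add(matcher.group(1));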
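The snippet above still needs a surrounding class and a str variable. Below is a self-contained sketch of the same regex idea, assuming the tag is the literal string [tag] and that only the text inside each [tag]...[tag] pair is wanted. The names TagRegexDemo and extractPairs are made up for illustration. Pattern.quote turns the tag into a literal, so it could be swapped for another marker without manual escaping, and Pattern.DOTALL lets . also match line breaks in case the input spans several lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagRegexDemo {

    // Hypothetical helper: returns the text inside each tag...tag pair.
    // Matches do not overlap, so the text between pairs is skipped.
    public static List<String> extractPairs(String text, String tag) {
        String t = Pattern.quote(tag); // treat the tag as a literal, not a regex
        // Reluctant (.*?) stops at the nearest closing tag; DOTALL lets '.' match newlines
        Pattern pattern = Pattern.compile(t + "(.*?)" + t, Pattern.DOTALL);
        Matcher matcher = pattern.matcher(text);
        List<String> result = new ArrayList<String>();
        while (matcher.find()) {
            result.add(matcher.group(1).trim()); // group 1 excludes the tags; trim spaces
        }
        return result;
    }

    public static void main(String[] args) {
        String txt = "[tag] Lorem Ipsum [tag] is simply dummy text. [tag] It has [tag] survived.";
        System.out.println(extractPairs(txt, "[tag]")); // prints [Lorem Ipsum, It has]
    }
}
```

Because each match consumes its closing tag, the text between the second and third tag is not returned; that is the behavior of the regex as written, not a general rule for tag parsing.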

Or you can walk through the string manually.

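int start = str.indexOf("[tag]");
while (start != -1) {
    int end = str.indexOf("[tag]", start + 5); // 5 == "[tag]".length()
    if (end == -1) break; // opening [tag] without a closing one: stop instead of throwing
    System.out.println(str.substring(start + 5, end));
    start = str.indexOf("[tag]", end + 5); // next opening [tag] after the closing one
}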
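The same indexOf walk can be packaged as a reusable method. This is a sketch: the name TagScanDemo.extractPairs is hypothetical, and the tag is passed in as a parameter instead of being hard-coded. An empty tag is rejected up front, because indexOf("") matches at every position and the loop would never advance.

```java
import java.util.ArrayList;
import java.util.List;

public class TagScanDemo {

    // Hypothetical helper: collects the text inside each tag...tag pair by
    // scanning with indexOf; an unmatched final tag is ignored.
    public static List<String> extractPairs(String text, String tag) {
        if (tag.isEmpty()) {
            throw new IllegalArgumentException("tag must not be empty");
        }
        List<String> result = new ArrayList<String>();
        int start = text.indexOf(tag);
        while (start != -1) {
            int end = text.indexOf(tag, start + tag.length());
            if (end == -1) {
                break; // opening tag without a closing tag
            }
            result.add(text.substring(start + tag.length(), end));
            start = text.indexOf(tag, end + tag.length()); // next opening tag
        }
        return result;
    }

    public static void main(String[] args) {
        // One complete pair followed by an unmatched tag
        System.out.println(extractPairs("[tag]a[tag]b[tag]c", "[tag]")); // prints [a]
    }
}
```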

Answer 1 (score: 1)

String txt = "[tag] Lorem Ipsum [tag] is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. [tag] It has [tag] survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.";

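int index = -1; // position of the previous [tag], or -1 before the first one
while (true)
{
    int i = txt.indexOf("[tag]", index + 1);
    if (i == -1) break; // no more tags
    if (index != -1)
    {
        // text between the previous tag and this one; 5 == "[tag]".length()
        System.out.println(txt.substring(index + 5, i));
    }
    index = i;
}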
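Unlike the pair-matching approaches, this loop prints the text between every two consecutive tags, so the four tags in the question's example give three strings. A sketch that collects those segments into a list and trims the surrounding spaces; TagSegmentsDemo.segments is a hypothetical name, and the sample text is shortened.

```java
import java.util.ArrayList;
import java.util.List;

public class TagSegmentsDemo {

    // Hypothetical helper: collects the text between every two consecutive
    // tags (1-2, 2-3, 3-4, ...), trimmed.
    public static List<String> segments(String text, String tag) {
        if (tag.isEmpty()) {
            throw new IllegalArgumentException("tag must not be empty");
        }
        List<String> result = new ArrayList<String>();
        int prev = text.indexOf(tag);
        while (prev != -1) {
            int next = text.indexOf(tag, prev + tag.length());
            if (next == -1) {
                break; // text after the last tag is not enclosed, so it is skipped
            }
            result.add(text.substring(prev + tag.length(), next).trim());
            prev = next; // the closing tag also opens the next segment
        }
        return result;
    }

    public static void main(String[] args) {
        String txt = "[tag] Lorem Ipsum [tag] is simply dummy text. [tag] It has [tag] survived.";
        for (String s : segments(txt, "[tag]")) {
            System.out.println(s);
        }
        // prints:
        // Lorem Ipsum
        // is simply dummy text.
        // It has
    }
}
```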

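Answer 2 (score: 0)

Use the split method of the String class. It takes a regular expression as its argument, so the square brackets have to be escaped: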

String allText = "some[tag]text[tag]separated[tag]by tags";
String[] textBetweenTags = allText.split("\\[tag\\]");
for (int i = 0; i < textBetweenTags.length; i++) {
    System.out.println(textBetweenTags[i]);
}
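On the question's text, split behaves a little differently from the loops above: because the text starts with [tag], the first array element is an empty string, and the text after the last tag is returned as well. A sketch that trims the pieces and drops empty ones; TagSplitDemo.splitOnTag is a hypothetical name, and Pattern.quote is used so the tag needs no manual escaping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class TagSplitDemo {

    // Hypothetical helper: splits on a literal tag, trims each piece and drops
    // empty pieces (such as the leading "" when the text starts with the tag).
    public static List<String> splitOnTag(String text, String tag) {
        List<String> result = new ArrayList<String>();
        for (String piece : text.split(Pattern.quote(tag))) {
            String trimmed = piece.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String txt = "[tag] Lorem Ipsum [tag] is simply dummy text. [tag] It has [tag] survived.";
        System.out.println(splitOnTag(txt, "[tag]"));
        // prints [Lorem Ipsum, is simply dummy text., It has, survived.]
    }
}
```

Dropping empty pieces also hides a blank segment between two adjacent tags; if those matter, keep the raw array from split instead.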