How can I parse this kind of text?

Time: 2013-02-11 01:09:00

Tags: java regex parsing pattern-matching matcher

This is the format my text is in:

15no16no17yes the parents who have older children always tell you the next stage is worse.18yes using only their hands and feet make some of the worst movies in the history of the world.19no

So the basic format is:

number yes|no text(may/may not be there), repeated

The text after yes/no may be empty, and it may also start with a space. (I tried to illustrate this in the sample above.)

My code already works for this format: number yes|no, repeated (with no text after the yes/no).

More sample text to parse:

30no31yesapproximately 278 billion miles from anything.32no33no34no
30no31yesapproximately 278 billion miles from anything32no33yessince the invention of call waiting34yesGravity is a contributing factor in 73 percent of all accidents involving falling objects.
35yesanybody who owns hideous clothing36yes if you take it from another person's plate37yes172 miles per hour upside down38yesonly more intelligent39yes any product including floor wax that has fat in it
35no36yestake it from another person's plate37yes172 miles per hour upside down38no39no
35no36no37yes172 miles per hour38no39no
35no36no37yesupside down38no39no

How should I modify my code?

String regex = "^(\\d+)(yes|no)";
Pattern p = Pattern.compile(regex);

while (input.hasNextLine()) {
    String line = input.nextLine();
    String myStr = line;
    Matcher m = p.matcher(myStr);

    while (m.find()) {
        String all = m.group();
        String digits = m.group(1);
        String bool = m.group(2);
        // do stuff
        myStr = myStr.substring(all.length());
        m.reset(myStr);
    } // end while
} // end while

I tried using String regex = "^(\\d+)(yes|no)(.*)";, but the problem is that it captures everything after the yes/no, up to the end of the line.

What should I do?

PS: If anything is unclear, please let me know and I will explain further.

1 Answer:

Answer 0 (score: 0)

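Try this. I think it will work. When parsing finishes, you will have a list of answers; you only need a few modifications to return this list and use its contents. My algorithm detects every answer and its starting position in the main String, then uses that information to slice the text. So the algorithm has two steps (1: bounds detection, 2: string slicing). I created a few components in the code. I hope it works.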

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 *
 * @author David Buzatto
 */
public class ASimpleParser {

    public static void main( String[] args ) {
        new ASimpleParser().exec();
    }

    public void exec() {

        String[] in = {
            "30no31yesapproximately 278 billion miles from anything.32no33no34no",
            "30no31yesapproximately 278 billion miles from anything32no33yessince the invention of call waiting34yesGravity is a contributing factor in 73 percent of all accidents involving falling objects.",
            "35yesanybody who owns hideous clothing36yes if you take it from another person's plate37yes172 miles per hour upside down38yesonly more intelligent39yes any product including floor wax that has fat in it",
            "35no36yestake it from another person's plate37yes172 miles per hour upside down38no39no",
            "35no36no37yes172 miles per hour38no39no",
            "35no36no37yesupside down38no39no"
        };

        Pattern p = Pattern.compile( "(\\d+)(yes|no)" );
        List<Answer> allAnswers = new ArrayList<Answer>();

        for ( String s : in ) {

            List<Answer> answers = new ArrayList<Answer>();
            Matcher m = p.matcher( s );

            // step 1: detecting answer bounds (start)
            while ( m.find() ) {

                Answer a = new Answer();
                a.answerStart = m.group();
                a.number = m.group( 1 );
                a.yesOrNo = m.group( 2 );
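                // use the match offset; s.indexOf() returns the first occurrence,
                // which is wrong when a token repeats or is a suffix of an earlier one
                a.startAt = m.start();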

                answers.add( a );

            }

            // step 2: slicing
            for ( int i = 0; i < answers.size(); i++ ) {

                Answer a = answers.get( i );

                // compare with the next answer: its start is this answer's right bound
                if ( i != answers.size() - 1 ) {

                    Answer rightAnswer = answers.get( i + 1 );
                    a.text = s.substring( a.startAt + a.answerStart.length(), rightAnswer.startAt );

                } else { // in the last answer, the right bound is the end of the main String; s.length() may be omitted.

                    a.text = s.substring( a.startAt + a.answerStart.length(), s.length() );

                }

            }

            allAnswers.addAll( answers );

        }

        // just iterating over the answers to show them.
        for ( Answer a : allAnswers ) {
            System.out.println( a );
        }

    }

    // a private class to contain the answers data
    private class Answer {

        String answerStart;
        String number;
        String yesOrNo;
        String text;
        int startAt;

        @Override
        public String toString() {
            return "Answer{" + "number=" + number + ", answer=" + yesOrNo + ", text=" + text + ", startAt=" + startAt + '}';
        }

    }

}
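
As an alternative to the two-step approach above, a single pattern may also be enough. This is only a sketch, not part of the original answer: it makes the trailing text group reluctant (`.*?`) and stops it with a lookahead at the next number-plus-yes/no token or at the end of the line. The class name `LookaheadParser` and the `parse` method are hypothetical names chosen for illustration. It shares the limitation of the code above: text that itself contains digits immediately followed by "yes" or "no" would be read as a new entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not from the original post.
public class LookaheadParser {

    // number, yes|no, then as little text as possible up to the next
    // "number yes|no" token or the end of the input
    private static final Pattern ENTRY =
            Pattern.compile("(\\d+)(yes|no)(.*?)(?=\\d+(?:yes|no)|$)");

    // Returns one {number, yes|no, text} triple per entry found in the line.
    public static List<String[]> parse(String line) {
        List<String[]> entries = new ArrayList<>();
        Matcher m = ENTRY.matcher(line);
        while (m.find()) {
            entries.add(new String[] { m.group(1), m.group(2), m.group(3) });
        }
        return entries;
    }

    public static void main(String[] args) {
        String line = "35no36yestake it from another person's plate"
                + "37yes172 miles per hour upside down38no39no";
        for (String[] e : parse(line)) {
            System.out.println(e[0] + " | " + e[1] + " | [" + e[2] + "]");
        }
    }
}
```

For the sample line in `main`, this prints five entries: 35 and 38 and 39 with empty text, 36 with "take it from another person's plate", and 37 with "172 miles per hour upside down". A leading space in the text, when present, is kept as part of group 3; call `trim()` on it if that is not wanted.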