Parsing a text file in Java

Date: 2014-03-31 14:44:02

Tags: java text-parsing

I have a txt file containing about 900 questions that look like this:

Questions:

-----------------------------------------------------------------------------
  #0001 Which disease devastated livestock across the UK during 2001?
-----------------------------------------------------------------------------
 *Hand-and-foot
 *Foot-in-mouth
 *Hand-to-mouth
 *Foot-and-mouth

Answer: Foot-and-mouth

-----------------------------------------------------------------------------
  #0002 Which of these kills its victims by constriction?
-----------------------------------------------------------------------------
 *Andalucia
 *Anaconda
 *Andypandy
 *Annerobinson

Answer: Anaconda

I have one class that stores a question and another that stores an answer.

i.e. Question.java

public class Question {
    private String questionText;
    private Answer a, b, c, d;

    public Question(String questionText, Answer a, Answer b, Answer c, Answer d) {
        this.questionText = questionText;
        this.a = a;
        this.b = b;
        this.c = c;
        this.d = d;
    }

    public String getQuestionText() {
        return questionText;
    }

    public void setQuestionText(String questionText) {
        this.questionText = questionText;
    }

    public Answer getA() {
        return a;
    }

    public void setA(Answer a) {
        this.a = a;
    }

    public Answer getB() {
        return b;
    }

    public void setB(Answer b) {
        this.b = b;
    }

    public Answer getC() {
        return c;
    }

    public void setC(Answer c) {
        this.c = c;
    }

    public Answer getD() {
        return d;
    }

    public void setD(Answer d) {
        this.d = d;
    }

    public String toString() {
        return  questionText +
                "\nA) " + a +
                "\nB) " + b +
                "\nC) " + c +
                "\nD) " + d;
    }
}

Answer.java

public class Answer {
    private String answerText;
    private boolean correct;

    //constructor to set correct answer
    public Answer(String answerText, boolean correct) {
        this.answerText = answerText;
        this.correct = correct;
    }

    public Answer(String answerText) {
        this.answerText = answerText;
        this.correct = false;
    }

    public String getAnswerText() {
        return answerText;
    }

    public void setAnswerText(String answerText) {
        this.answerText = answerText;
    }

    public boolean isCorrect() {
        return correct;
    }

    public void setCorrect(boolean correct) {
        this.correct = correct;
    }

    public String toString() {
        return answerText;
    }
}

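I want to create an ArrayList that stores all the Question objects parsed from the text file. I'm new to Java and have mostly programmed in Python before, and I'm a bit confused about how to parse a text file in Java, because it seems much more complicated. I know how to parse a file line by line, or to read something like a list of words. What I don't know is how to deal with the extra text in the file.

Any help would be greatly appreciated.

Sample of a question that spans two lines: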

-----------------------------------------------------------------------------
  #0016 Which word follows 'North' and 'South' to give the names of two
        continents?
-----------------------------------------------------------------------------
 *Africa
 *America
 *Asia
 *Australia

Answer: America

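3 Answers:

Answer 0 (score: 1)

Implement a simple FSM (finite state machine) and parse line by line. Read until you find a line starting with #dddd, then keep reading until you find a line starting with -. Those lines make up the question. Read until you find a line starting with *, then keep reading until you find a blank line. Those are your options. Next, read until you find a line starting with Answer; that is your answer. Repeat.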
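
The steps above can be sketched roughly as follows. This is only one possible reading of the description, not the answerer's own code: the `FsmQuizParser` class, the `State` enum and the `Entry` holder are hypothetical names introduced for illustration, and the parser works on a list of lines already read from the file (for example with `Files.readAllLines`). It also assumes that no question line itself starts with `-`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FsmQuizParser {
    // One state per "read until ..." step in the description above.
    enum State { SEEK_QUESTION, IN_QUESTION, SEEK_OPTIONS, IN_OPTIONS, SEEK_ANSWER }

    // Hypothetical holder; the Question/Answer classes from the question could be built here instead.
    static class Entry {
        final String question;
        final List<String> options;
        final String answer;

        Entry(String question, List<String> options, String answer) {
            this.question = question;
            this.options = options;
            this.answer = answer;
        }
    }

    static List<Entry> parse(List<String> lines) {
        List<Entry> entries = new ArrayList<>();
        State state = State.SEEK_QUESTION;
        StringBuilder question = new StringBuilder();
        List<String> options = new ArrayList<>();

        for (String raw : lines) {
            String line = raw.trim();
            switch (state) {
                case SEEK_QUESTION:
                    // A question starts on a line like "#0001 text".
                    if (line.matches("#\\d{4}\\s.*")) {
                        question.setLength(0);
                        question.append(line.substring(5).trim());
                        state = State.IN_QUESTION;
                    }
                    break;
                case IN_QUESTION:
                    // The closing separator ends the question; anything else continues it.
                    if (line.startsWith("-")) {
                        state = State.SEEK_OPTIONS;
                    } else {
                        question.append(' ').append(line);
                    }
                    break;
                case SEEK_OPTIONS:
                    if (line.startsWith("*")) {
                        options = new ArrayList<>();
                        options.add(line.substring(1).trim());
                        state = State.IN_OPTIONS;
                    }
                    break;
                case IN_OPTIONS:
                    // A blank line ends the list of options.
                    if (line.isEmpty()) {
                        state = State.SEEK_ANSWER;
                    } else if (line.startsWith("*")) {
                        options.add(line.substring(1).trim());
                    }
                    break;
                case SEEK_ANSWER:
                    if (line.startsWith("Answer:")) {
                        String answer = line.substring("Answer:".length()).trim();
                        entries.add(new Entry(question.toString(), options, answer));
                        state = State.SEEK_QUESTION;
                    }
                    break;
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "----------",
                "  #0002 Which of these kills its victims by constriction?",
                "----------",
                " *Andalucia", " *Anaconda", " *Andypandy", " *Annerobinson",
                "",
                "Answer: Anaconda");
        for (Entry e : parse(sample)) {
            System.out.println(e.question + " " + e.options + " -> " + e.answer);
        }
    }
}
```

Because every state only looks at how the current line starts, questions that wrap over several lines need no special handling.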

Answer 1 (score: 1)

Hi, here is something that should do the trick ;)

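    // Requires java.io.BufferedReader, java.io.FileReader and java.io.IOException.
    String file = "text.txt";
    int nbAnswer = 4;
    // try-with-resources closes the reader even when parsing fails
    try (BufferedReader br = new BufferedReader(new FileReader(file))) {
        String line;
        while ((line = br.readLine()) != null) {
            if (!line.contains("-----------")) {
                continue;   // skip everything up to the opening separator
            }
            // first line of the question: drop the "#dddd " prefix
            line = br.readLine();
            if (line == null) {
                break;      // file ended right after a separator
            }
            String question = line.split("#[0-9]{4} ")[1];
            // further lines up to the closing separator belong to the question
            while ((line = br.readLine()) != null && !line.contains("-----------")) {
                question += " " + line.trim();
            }

            // each option line looks like " *Text": trim it and drop the '*'
            String[] answers = new String[nbAnswer];
            for (int i = 0; i < nbAnswer; i++) {
                answers[i] = br.readLine().trim().substring(1);
            }

            br.readLine();  // skip the blank line before "Answer: ..."
            String sol = br.readLine().split("Answer: ")[1];
            System.out.println(question + "\nanswer: " + String.join(" ", answers) + "\nsol " + sol);
        }
    } catch (IOException ex) {
        System.err.println(ex);
    }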

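line.split("#[0-9]{4} ")[1]; uses a regular expression to split the string at a # followed by four digits and a space, and takes the part that comes after it.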
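
To see what that split does on its own, here is a small sketch. The `SplitDemo` class and the `stripNumber` helper are hypothetical names used only for illustration; unlike the one-liner, the helper checks the array length first, so a line without the #dddd prefix does not throw an ArrayIndexOutOfBoundsException.

```java
class SplitDemo {
    // Returns the text after the "#dddd " prefix, or the trimmed line when there is no such prefix.
    static String stripNumber(String line) {
        // Limit 2: split at the first match only and keep the rest of the line in one piece.
        String[] parts = line.split("#[0-9]{4} ", 2);
        return parts.length == 2 ? parts[1] : line.trim();
    }

    public static void main(String[] args) {
        System.out.println(stripNumber("  #0001 Which disease devastated livestock across the UK during 2001?"));
        System.out.println(stripNumber("        continents?"));
    }
}
```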

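At least it's a good start ;)

PS: There are a few things wrong with keeping the questions in a nicely formatted .txt file:

  1. It is hard to parse
  2. It takes up more space
  3. For example, you could change *Foot-and-mouth to (*)Foot-and-mouth to mark it as the answer, instead of adding two extra lines for it ;)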
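
As a rough illustration of point 3, an option line in that suggested format could be read as below. This is a sketch under assumptions: the (*) marker is only the format proposed in this answer, not something the original file contains, and `MarkedOptionDemo`, `Option` and `parseOption` are hypothetical names. `Option` mirrors the text/correct pair held by the Answer class from the question.

```java
class MarkedOptionDemo {
    // Hypothetical stand-in for the Answer class from the question.
    static class Option {
        final String text;
        final boolean correct;

        Option(String text, boolean correct) {
            this.text = text;
            this.correct = correct;
        }
    }

    // "(*)Text" marks the correct option, "*Text" marks any other option.
    static Option parseOption(String line) {
        String trimmed = line.trim();
        if (trimmed.startsWith("(*)")) {
            return new Option(trimmed.substring(3).trim(), true);
        }
        if (trimmed.startsWith("*")) {
            return new Option(trimmed.substring(1).trim(), false);
        }
        throw new IllegalArgumentException("Not an option line: " + line);
    }

    public static void main(String[] args) {
        Option o = parseOption(" (*)Foot-and-mouth");
        System.out.println(o.text + " correct=" + o.correct);
    }
}
```

With the marker on the option itself, the blank line and the separate Answer line are no longer needed to know which option is correct.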

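Answer 2 (score: 0)

If every question in the file follows the same fixed layout of about 10 lines, it is easy to parse it line by line, taking the meaning of each record from its position rather than from its content: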

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class Parse {
    public static final int OPTION_PREFIX_LENGTH = "*".length();
    public static final int ANSWER_PREFIX_LENGTH = "Answer: ".length();
    public static final String QUESTION_SEPARATOR = "-----------------------------------------------------------------------------";

    public static void main(String[] args) throws IOException {
        // try-with-resources closes the reader even if parsing fails.
        try (BufferedReader br = new BufferedReader(new FileReader("/Users/Marboni/tmp/test.txt"))) {
            String line;
            while ((line = br.readLine()) != null) {
                if (!QUESTION_SEPARATOR.equals(line)) {
                    continue;                                      // Skip blank lines until the opening separator (----).
                }

                StringBuilder questionBuilder = new StringBuilder();
                String questionLine;
                // Read lines and add them to the question until the closing separator.
                while ((questionLine = br.readLine()) != null && !QUESTION_SEPARATOR.equals(questionLine)) {
                    questionBuilder.append(questionLine.trim()).append(' ');
                }
                String questionText = questionBuilder.toString().trim();  // Still includes the "#dddd" number.
                String a = parseOption(br.readLine());             // Option a).
                String b = parseOption(br.readLine());             // Option b).
                String c = parseOption(br.readLine());             // Option c).
                String d = parseOption(br.readLine());             // Option d).
                br.readLine();                                     // Skip blank line.
                String answer = parseAnswer(br.readLine());        // Answer.

                // Answer is the top-level class from the question, not a nested class of Question.
                Question question = new Question(questionText,
                        new Answer(a, answer.equals(a)),
                        new Answer(b, answer.equals(b)),
                        new Answer(c, answer.equals(c)),
                        new Answer(d, answer.equals(d))
                        );

                // Do something with it.
            }
        }
    }

    // Strips the leading "*" from an option line.
    private static String parseOption(String record) {
        return record.trim().substring(OPTION_PREFIX_LENGTH);
    }

    // Strips the leading "Answer: " from the answer line.
    private static String parseAnswer(String record) {
        return record.trim().substring(ANSWER_PREFIX_LENGTH);
    }
}