Splitting the lines of a file with a regular expression

Date: 2014-04-23 17:59:16

Tags: java regex

Lesson no 1
  lesson name: Jack and Jill went to America
  lesson contents: some XXXXX XXXXX contents
  lesson Description:  jack and jill lesson description



Lesson no 2
  lesson name: Lorem ipsum dolor sit amet
  lesson contents: consectetur adipisicing elit, sed do eiusmod tempor
  lesson Description:  Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor

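I have a file with contents like the above, and I want to process it with a regular expression and convert it to a JSON object using Java. Can anyone suggest a regex to process it and separate "lesson name", "lesson contents", "lesson Description" and so on?

I want the output to look like this: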

[{"Lesson no":"1","lesson name":"xxx","lesson contents":"YYY","Lesson Desc":"zzzz"},{....}]

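3 answers:

Answer 0 (score: 1)

To use a regular expression, you have to make sure the file has a constant structure. Here I separated the lessons with 2 empty lines, including after the last lesson. You can append these lines programmatically after reading the file, or use only 1 empty line between lessons, and so on.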

Lesson no 1
lesson name: Jack and Jill went to America
lesson contents: some XXXXX XXXXX contents with new
lines
lesson Description: jack and Jill lesson description with new
lines


Lesson no 2
lesson name: Lorem ipsum dolor sit amet
lesson contents: consectetur adipisicing elit, sed do eiusmod tempor
lesson Description:  Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
<this is an empty line>
<this is an empty line>

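The code repeatedly matches the structure of a single lesson and breaks it into its components. If your input file changes, you need to change the pattern variable accordingly.

Note: on Java 8 you don't need the lb string at all; replace it with "\\R".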

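import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Regex {

    // Platform line separator; the input file must use the same line endings.
    static String lb = System.getProperty("line.separator");
    static String path = "src/test/text.txt";
    // Odd groups capture the field labels, even groups the values.
    // Every lesson, including the last, must be followed by two line breaks.
    static String pattern = "(Lesson no) (.+?)"+lb+"(lesson name): (.+?)"+lb+"(lesson contents): (.+?)"+lb+"(lesson Description): (.+?)"+lb+lb;

    public static void main(String[] args) {

        String text;
        // "\\z" (end of input) as the delimiter makes next() return the whole file.
        try (Scanner scanner = new Scanner(new File(path))) {
            text = scanner.useDelimiter("\\z").next();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
            return; // nothing to parse
        }
        // DOTALL lets '.' match line breaks, so a value may span several lines.
        Pattern pat = Pattern.compile(pattern, Pattern.DOTALL);
        Matcher m = pat.matcher(text);

        // Values are copied verbatim; no JSON escaping is applied.
        StringBuilder sb = new StringBuilder("[");
        while (m.find()) {
            sb.append("{");
            for (int i = 1; i <= m.groupCount(); i++) {
                sb.append("\"").append(m.group(i));
                if (i%2 == 0)
                    sb.append("\",");
                else
                    sb.append("\":");
            }
            // Replace the trailing comma of the last pair with the closing brace.
            sb.deleteCharAt(sb.length()-1).append("},");
        }
        // Drop the trailing comma, but keep the "[" when nothing matched.
        if (sb.length() > 1)
            sb.deleteCharAt(sb.length()-1);
        sb.append("]");
        System.out.println(sb.toString());
    }
}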

Output

[{"Lesson no":"1","lesson name":"Jack and Jill went to America","lesson contents":"some XXXXX XXXXX contents with new
lines","lesson Description":"jack and Jill lesson description with new
lines"},{"Lesson no":"2","lesson name":"Lorem ipsum dolor sit amet","lesson contents":"consectetur adipisicing elit, sed do eiusmod tempor","lesson Description":" Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor"}]
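As a rough sketch of the \R idea from the note above (Java 8 or later), the version below drops the lb string, tolerates indentation in front of the field labels, and escapes the values so that multi-line text still produces valid JSON. The class name LessonJsonSketch and the helpers toJson and quote are made up for this illustration. The lookahead that ends a description at a blank line or at the end of the input is an assumption about the file layout, not something taken from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LessonJsonSketch {

    // \R (Java 8+) matches \n, \r\n and other line breaks, so no lb string is needed.
    // Odd groups capture the labels, even groups the values.
    // Assumption: a description ends at a blank line or at the end of the input.
    private static final Pattern LESSON = Pattern.compile(
            "(Lesson no) (.+?)\\R\\s*"
            + "(lesson name):\\s*(.+?)\\R\\s*"
            + "(lesson contents):\\s*(.+?)\\R\\s*"
            + "(lesson Description):\\s*(.+?)(?=\\R\\s*\\R|\\s*\\z)",
            Pattern.DOTALL);

    // Wraps a value in double quotes and escapes what JSON does not allow raw.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Builds a JSON array with one object per lesson found in the text.
    static String toJson(String text) {
        List<String> objects = new ArrayList<>();
        Matcher m = LESSON.matcher(text);
        while (m.find()) {
            List<String> pairs = new ArrayList<>();
            // Groups come in (label, value) pairs: 1-2, 3-4, 5-6, 7-8.
            for (int i = 1; i < m.groupCount(); i += 2) {
                pairs.add(quote(m.group(i)) + ":" + quote(m.group(i + 1)));
            }
            objects.add("{" + String.join(",", pairs) + "}");
        }
        return "[" + String.join(",", objects) + "]";
    }

    public static void main(String[] args) {
        String text = "Lesson no 1\n"
                + "lesson name: Jack and Jill went to America\n"
                + "lesson contents: some XXXXX XXXXX contents with new\nlines\n"
                + "lesson Description: jack and Jill lesson description\n"
                + "\n\n"
                + "Lesson no 2\n"
                + "lesson name: Lorem ipsum dolor sit amet\n"
                + "lesson contents: consectetur adipisicing elit\n"
                + "lesson Description: sed do eiusmod tempor\n";
        System.out.println(toJson(text));
    }
}
```

Because the end of a lesson is detected with a lookahead, the last lesson does not need extra empty lines after it. For anything beyond a quick script, a JSON library would be a safer choice than building the string by hand.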

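Answer 1 (score: 0)

Let's start by searching for the lesson name. More specifically, we want to find whatever comes right after "lesson name:" but before the line break. Here is a regex for that: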

lesson name:\s*(.*)

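As you can see, it searches for "lesson name:". Then \s matches any whitespace character, and the star means zero or more times. This is handy because the regex still matches if you accidentally have no space, or perhaps two spaces, after "lesson name:".

Finally, the dot matches any character except a newline. The star means zero or more times, so we match the rest of the line. Because that part is in parentheses, it is stored in a capture group that you can read back from Java.

I'm not very familiar with Java, but I think you have to get the match like this (if anyone sees a mistake, please let me know):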

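// needs java.util.regex.Pattern and java.util.regex.Matcher
// Java strings use double quotes, and the backslash must be escaped
Pattern regex = Pattern.compile("lesson name:\\s*(.*)");
Matcher m = regex.matcher(yourfilestring);
if (m.find()) {
  System.out.println(m.group(1)); // 1 is the first set of parentheses in the regex
}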

You can apply the same idea to "Lesson no", "lesson contents:" and "lesson Description:" too.
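One possible way to reuse that single-field regex for every label is sketched below. The class name FieldExtractorSketch and the helper valuesOf are hypothetical names chosen for this example. It also deviates slightly from the regex above: it uses [ \t]* instead of \s*, on the assumption that an empty value should not make the match run on into the next line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FieldExtractorSketch {

    // Returns the rest of the line after every occurrence of the given label.
    static List<String> valuesOf(String label, String text) {
        // Pattern.quote treats the label literally; (.*) stops at the line break.
        Pattern p = Pattern.compile(Pattern.quote(label) + "[ \\t]*(.*)");
        Matcher m = p.matcher(text);
        List<String> values = new ArrayList<>();
        while (m.find()) {
            values.add(m.group(1).trim());
        }
        return values;
    }

    public static void main(String[] args) {
        String text = "Lesson no 1\n"
                + "  lesson name: Jack and Jill went to America\n"
                + "  lesson contents: some XXXXX XXXXX contents\n"
                + "  lesson Description:  jack and jill lesson description\n"
                + "\n"
                + "Lesson no 2\n"
                + "  lesson name: Lorem ipsum dolor sit amet\n"
                + "  lesson contents: consectetur adipisicing elit\n"
                + "  lesson Description:  sed do eiusmod tempor\n";
        String[] labels = {"Lesson no", "lesson name:", "lesson contents:", "lesson Description:"};
        for (String label : labels) {
            // One list per label; entry i belongs to the i-th lesson in the file.
            System.out.println(label + " -> " + valuesOf(label, text));
        }
    }
}
```

This only captures single-line values, because the dot stops at the line break. Fields that wrap onto a second line would need a pattern that matches the whole lesson at once.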

Answer 2 (score: 0)

Try this code

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;


public class test {

    /**
     * Reads test.txt line by line and prints the fields of each lesson.
     *
     * @param args unused
     */
    public static void main(String[] args) {
        // try-with-resources closes the scanner (and the file) when done
        try (Scanner scanner = new Scanner(new File("test.txt"))) {
            while (scanner.hasNextLine()) {
                // nextLine() strips "\n" and "\r\n"; trim() removes the indentation
                String string = scanner.nextLine().trim();
                if (string.isEmpty()) {
                    continue; // skip blank lines
                }
                if (string.contains("Lesson no")) {
                    System.out.print(string);
                } else if (string.contains(":")) {
                    // split on the first ':' only, so a value may contain colons
                    String[] parts = string.split(":", 2);
                    String key = parts[0];
                    String value = parts[1];

                    System.out.print(" " + key + " : " + value);
                } else {
                    // continuation line without a label
                    System.out.println(string);
                }
            }//while
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }

    }
}

I hope this helps.