Improving code that parses a text file

Time: 2012-05-18 19:07:39

Tags: java file parsing file-io

Text file (the first three lines are plain settings, the last three lines start with p)

ThreadSize:2
ExistingRange:1-1000
NewRange:5000-10000
p:55 - AutoRefreshStoreCategories  Data:Previous    UserLogged:true    Attribute:1    Attribute:16      Attribute:2060  
p:25 - CrossPromoEditItemRule      Data:New         UserLogged:false     Attribute:1      Attribute:10107   Attribute:10108
p:20 - CrossPromoManageRules       Data:Previous    UserLogged:true      Attribute:1      Attribute:10107   Attribute:10108

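Below is the code I wrote to parse the file above. After parsing, I set the corresponding values through the setters. I would like to know whether this code can be improved, in the parsing or elsewhere, for example by using RegEx or some other approach. My main goal is to parse the file and set the corresponding values. Any feedback or suggestions would be highly appreciated.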

private List<Command> commands;
private static int noOfThreads = 3;
private static int startRange = 1;
private static int endRange = 1000;
private static int newStartRange = 5000;
private static int newEndRange = 10000;
private BufferedReader br = null;
private String sCurrentLine = null;
private int distributeRange = 100;
private List<String> values = new ArrayList<String>();
private String commandName;
private static String data;
private static boolean userLogged;
private static List<Integer> attributeID =  new ArrayList<Integer>();

    try {
        // Initialize the system
        commands = new LinkedList<Command>();
        br = new BufferedReader(new FileReader("S:\\Testing\\Test1.txt"));

        while ((sCurrentLine = br.readLine()) != null) {
            if(sCurrentLine.contains("ThreadSize")) {
                noOfThreads = Integer.parseInt(sCurrentLine.split(":")[1]);
            } else if(sCurrentLine.contains("ExistingRange")) {
                startRange = Integer.parseInt(sCurrentLine.split(":")[1].split("-")[0]);
                endRange = Integer.parseInt(sCurrentLine.split(":")[1].split("-")[1]);
            } else if(sCurrentLine.contains("NewRange")) {
                newStartRange = Integer.parseInt(sCurrentLine.split(":")[1].split("-")[0]);
                newEndRange = Integer.parseInt(sCurrentLine.split(":")[1].split("-")[1]);
            } else {
                allLines.add(Arrays.asList(sCurrentLine.split("\\s+")));
                String key = sCurrentLine.split("-")[0].split(":")[1].trim();
                String value = sCurrentLine.split("-")[1].trim();
                values = Arrays.asList(sCurrentLine.split("-")[1].trim().split("\\s+"));
                for(String s : values) {
                    if(s.contains("Data:")) {
                        data = s.split(":")[1];
                    } else if(s.contains("UserLogged:")) {
                        userLogged = Boolean.parseBoolean(s.split(":")[1]);
                    } else if(s.contains("Attribute:")) {
                        attributeID.add(Integer.parseInt(s.split(":")[1]));
                    } else {
                        commandName = s;
                    }
                }

                Command command = new Command();
                command.setName(commandName); 
                command.setExecutionPercentage(Double.parseDouble(key));
                command.setAttributeID(attributeID);
                command.setDataCriteria(data);
                command.setUserLogging(userLogged);
                commands.add(command);

            }
        }
    } catch(Exception e) {
        System.out.println(e);
    }

4 Answers:

Answer 0: (score: 1)

I think you should know exactly what you are expecting when you use RegEx. http://java.sun.com/developer/technicalArticles/releases/1.4regex/ should be helpful.
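To make "knowing what you expect" concrete, here is a minimal sketch for the range lines only. It assumes every range line has the exact shape Name:start-end with non-negative integers, as in the sample file; the class name RangeLineRegex and its parse method are hypothetical and not part of the question's code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the question's code.
final class RangeLineRegex {
    // Assumed format: <name>:<start>-<end>, e.g. "ExistingRange:1-1000".
    private static final Pattern RANGE =
            Pattern.compile("(\\w+):(\\d+)-(\\d+)\\s*");

    /** Returns {start, end}, or null when the line is not a range line. */
    static int[] parse(String line) {
        Matcher m = RANGE.matcher(line);
        if (!m.matches()) {          // matches() requires the whole line to fit
            return null;
        }
        return new int[] {
            Integer.parseInt(m.group(2)),
            Integer.parseInt(m.group(3))
        };
    }

    public static void main(String[] args) {
        int[] r = parse("NewRange:5000-10000");
        System.out.println(r[0] + ".." + r[1]);
    }
}
```

Using matches() rather than find() means a line such as ThreadSize:2 is rejected outright instead of being partially matched, which makes unexpected input easier to notice.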

Answer 1: (score: 0)

You can use the Scanner class. It has some helper methods for reading text files.
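As a rough illustration of that suggestion, the sketch below reads one range line with java.util.Scanner by treating ':', '-' and whitespace as delimiters. It assumes the values are never negative (a leading minus sign would be swallowed as a delimiter); the class name ScannerRangeDemo and the method readRange are made up for this example.

```java
import java.util.Scanner;

// Hypothetical demo class, not part of the question's code.
final class ScannerRangeDemo {
    /** Reads a "Name:start-end" line and returns {start, end}. */
    static int[] readRange(String line) {
        // ':', '-' and whitespace all separate tokens.
        try (Scanner sc = new Scanner(line).useDelimiter("[:\\-\\s]+")) {
            sc.next();                // skip the key, e.g. "ExistingRange"
            int start = sc.nextInt();
            int end = sc.nextInt();
            return new int[] {start, end};
        }
    }

    public static void main(String[] args) {
        int[] r = readRange("ExistingRange:1-1000");
        System.out.println(r[0] + ".." + r[1]);
    }
}
```

A Scanner can also wrap a File directly and be read with hasNextLine() and nextLine(), so the same helper could be applied line by line.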

Answer 2: (score: 0)

In reply to the comment:

p:55 - AutoRefreshStoreCategories  Data:Previous    UserLogged:true    Attribute:1    Attribute:16      Attribute:2060  

Parsing the line above with a regular expression (with Attribute: occurring exactly three times):

String parseLine = "p:55 - AutoRefreshStoreCategories  Data:Previous    UserLogged:true    Attribute:1    Attribute:16      Attribute:2060";
    Matcher m = Pattern
            .compile(
                    "p:(\\d+)\\s-\\s(.*?)\\s+Data:(.*?)\\s+UserLogged:(.*?)\\s+Attribute:(\\d+)\\s+Attribute:(\\d+)\\s+Attribute:(\\d+)")
            .matcher(parseLine);
    if(m.find()) {
        int p = Integer.parseInt(m.group(1));
        String method = m.group(2);
        String data = m.group(3);
        boolean userLogged = Boolean.valueOf(m.group(4));
        int at1 = Integer.parseInt(m.group(5));
        int at2 = Integer.parseInt(m.group(6));
        int at3 = Integer.parseInt(m.group(7));
        System.out.println(p + " " + method + " " + data + " " + userLogged + " " + at1 + " " + at2 + " "
                + at3);
    }

Edit: looking at your comment, you can still use regular expressions:

String parseLine = "p:55 - AutoRefreshStoreCategories  Data:Previous    UserLogged:true    "
            + "Attribute:1    Attribute:16      Attribute:2060";
    // The last group uses \S+ because a trailing lazy (.*?) would match the empty string.
    Matcher m = Pattern.compile("p:(\\d+)\\s-\\s(.*?)\\s+Data:(.*?)\\s+UserLogged:(\\S+)").matcher(
            parseLine);
    if(m.find()) {
        for(int i = 0; i < m.groupCount(); ++i) {
            System.out.println(m.group(i + 1));
        }
    }

    Matcher m2 = Pattern.compile("Attribute:(\\d+)").matcher(parseLine);
    while(m2.find()) {
        System.out.println("Attribute matched: " + m2.group(1));
    }

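However, this depends on there being no "Attribute:" text before the "real" attributes (for example inside the method name that comes right after p).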
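One way to reduce that risk, sketched here under the assumption that the real attributes always come after the UserLogged field, is to restrict the attribute search to the part of the line that follows the first match. Matcher.region limits where find() looks. The class name AttributeScan and the method attributes are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the answer's code.
final class AttributeScan {
    private static final Pattern HEAD = Pattern.compile(
            "p:(\\d+)\\s-\\s(.*?)\\s+Data:(\\S+)\\s+UserLogged:(\\S+)");
    private static final Pattern ATTRIBUTE = Pattern.compile("Attribute:(\\d+)");

    /** Returns the attribute ids that appear after the UserLogged field. */
    static List<Integer> attributes(String line) {
        List<Integer> ids = new ArrayList<Integer>();
        Matcher head = HEAD.matcher(line);
        if (!head.find()) {
            return ids;               // not a command line
        }
        Matcher attr = ATTRIBUTE.matcher(line);
        // Ignore anything before the end of the head match,
        // including a command name that happens to contain "Attribute:".
        attr.region(head.end(), line.length());
        while (attr.find()) {
            ids.add(Integer.parseInt(attr.group(1)));
        }
        return ids;
    }
}
```

The list is created per call, so each parsed line gets its own attribute list rather than sharing one across commands.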

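Answer 3: (score: 0)

I would turn this inside out. At present you are:

  1. Scanning the line for a keyword. The whole line gets scanned whenever the keyword is not found, which is the usual case, because you have several keywords to check and they will not all be present on every line.
  2. Scanning the entire line again for ':' and splitting it on every occurrence.
  3. Mostly parsing the part after ':' as an integer, or occasionally as a range.

So that is several complete scans of each line. Unless the file has an enormous number of lines this is not a problem in itself, but it shows that you have the processing back to front.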
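The answer gives no code, so the following is only one possible reading of "inside out": locate the first ':' once, take the text before it as the key, and dispatch on that key. The class name HeaderSettings and its fields are hypothetical; the default values are copied from the question's code, and the switch on strings assumes Java 7 or later.

```java
// Hypothetical settings holder, not part of the question's code.
final class HeaderSettings {
    int threads = 3;
    int start = 1, end = 1000;
    int newStart = 5000, newEnd = 10000;

    /** Applies one "Key:value" line; returns false if the key is not a known setting. */
    boolean apply(String line) {
        int colon = line.indexOf(':');        // one scan for the separator
        if (colon < 0) {
            return false;
        }
        String key = line.substring(0, colon).trim();
        String value = line.substring(colon + 1).trim();
        switch (key) {
            case "ThreadSize":
                threads = Integer.parseInt(value);
                return true;
            case "ExistingRange": {
                int[] r = range(value);
                start = r[0];
                end = r[1];
                return true;
            }
            case "NewRange": {
                int[] r = range(value);
                newStart = r[0];
                newEnd = r[1];
                return true;
            }
            default:
                return false;                 // e.g. a "p:..." line, handled elsewhere
        }
    }

    // Assumes "start-end" with non-negative integers.
    private static int[] range(String value) {
        int dash = value.indexOf('-');
        return new int[] {
            Integer.parseInt(value.substring(0, dash).trim()),
            Integer.parseInt(value.substring(dash + 1).trim())
        };
    }
}
```

With this shape each line is examined once to find its key, and only the matching branch does any further parsing; lines that return false can then be passed to whatever handles the p: commands.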