Parsing CSV files and storing the results as JFreeChart datasets consumes heap space

Date: 2015-12-10 11:19:32

Tags: java csv netbeans jfreechart swingworker

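I have a NetBeans module application that runs fine when executed in my NetBeans IDE. But when I run the distribution executable from the generated, unzipped folder, the application's SwingWorker task stops after a while: it loops through a few files and then stops. My best guess is that it has something to do with the loop in which I process the CSV files? Any ideas or hints would be most welcome.

The files are between 2,000 and 600,000 rows long and contain 5 time series defined as double. I store the datasets in a collection.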
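The title points at heap usage, and the behaviour differs between the IDE and the packaged build. One thing worth checking (an assumption, not something the post establishes) is whether the two launches run with different maximum heap sizes. The hypothetical `HeapReport` class below uses only the standard `java.lang.Runtime` API; its output could be logged once at startup and again after each file, in both environments.

```java
/**
 * Hypothetical diagnostic (not part of the original application): reports the
 * heap figures of the running JVM using only java.lang.Runtime.
 */
final class HeapReport {
    private static final long MB = 1024L * 1024L;

    private HeapReport() { }

    static String describe() {
        Runtime rt = Runtime.getRuntime();
        long committed = rt.totalMemory();       // heap currently reserved by the JVM
        long used = committed - rt.freeMemory(); // part of it occupied by objects
        long max = rt.maxMemory();               // upper limit, e.g. as set with -Xmx
        return String.format("heap used=%d MB, committed=%d MB, max=%d MB",
                used / MB, committed / MB, max / MB);
    }
}
```

A clearly smaller `max` in the packaged build would be consistent with the worker running out of memory there but not in the IDE.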

Here is my method, which uses a while loop:

protected XYDataset generateDataSet(String filePath) {

    TimeSeriesCollection dataset = null;
    try {
        dataset = new TimeSeriesCollection();

        boolean isHeaderSet = false;

        String fileRow;
        StringTokenizer tokenizer;
        BufferedReader br;
        List<String> headers;
        String encoding = "UTF-8";
        br = new BufferedReader(new InputStreamReader(new FileInputStream(filePath), encoding));

        //br = new BufferedReader(new FileReader(filePath));
        if (!br.ready()) {
            throw new FileNotFoundException();
        }
        fileRow = br.readLine();

        // The loop starts here

        while (fileRow != null) {

            if (!isHeaderSet) {
                headers = getHeaders(fileRow);
                for (String string : headers) {
                    dataset.addSeries(new TimeSeries(string));
                }
                isHeaderSet = true;
            }
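            if (fileRow.startsWith("#")) {
                // Skip the "#" header row (and any later comment rows); the loop
                // condition then also handles a file that ends right after the header
                fileRow = br.readLine();
                continue;
            }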
            String timeStamp = null;
            String theTok1 = null;
            String theTok2;
            tokenizer = new StringTokenizer(fileRow);
            if (tokenizer.hasMoreTokens()) {
                theTok1 = tokenizer.nextToken().trim();
            }
            if (tokenizer.hasMoreTokens()) {
                theTok2 = tokenizer.nextToken().trim();
                timeStamp = theTok1 + " " + theTok2;
            }

            Millisecond m = null;

            if (timeStamp != null) {
                m = getMillisecond(timeStamp);
            }

            int serieNumber = 0;
            br.mark(201);
            if (br.readLine() == null) {
                br.reset();
                while (tokenizer.hasMoreTokens()) {
                    if (dataset.getSeriesCount() > serieNumber) {
                        // Notify is set to true only on the last CSV row; otherwise the
                        // dataset would be iterated over every time a value is added,
                        // and doing it once, on the last row, is enough.
                        dataset.getSeries(serieNumber).add(m, parseDouble(tokenizer.nextToken().trim()), true);
                    } else {
                        tokenizer.nextToken();
                    }
                    serieNumber++;
                }
            } else {
                br.reset();
                while (tokenizer.hasMoreTokens()) {
                    if (dataset.getSeriesCount() > serieNumber) {
                        dataset.getSeries(serieNumber).add(m, parseDouble(tokenizer.nextToken().trim()), false);

                    } else {
                        tokenizer.nextToken();
                    }
                    serieNumber++;
                }
            }
            fileRow = br.readLine();
        }
        br.close();
    } catch (FileNotFoundException ex) {
        printStackTrace(ex);
    } catch (IOException | ParseException ex) {
        printStackTrace(ex);
    }
    return dataset;
}
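The `br.mark(201)` / `readLine()` / `reset()` sequence exists only to detect the last row. The `BufferedReader.mark` Javadoc notes that a reset may fail once the read-ahead limit has been reached, so a row longer than 200 characters could break it. A minimal alternative, sketched here under the assumption that rows are handled one at a time, is to read one row ahead: the current row is the last one exactly when the look-ahead returns null. `RowVisitor` and `visit` are hypothetical names, and header handling via `getHeaders()` is left out.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.function.BiConsumer;

/**
 * Hypothetical helper: hands every data row to a callback together with a flag
 * that is true only for the last data row, using one row of look-ahead.
 */
final class RowVisitor {
    private RowVisitor() { }

    static int visit(BufferedReader br, BiConsumer<String, Boolean> onRow) throws IOException {
        int count = 0;
        String next = nextDataRow(br);
        while (next != null) {
            String row = next;
            next = nextDataRow(br);          // look one row ahead instead of mark()/reset()
            onRow.accept(row, next == null); // no further row means this one is the last
            count++;
        }
        return count;
    }

    // Returns the next line that is neither a "#" header/comment nor blank, or null at EOF.
    private static String nextDataRow(BufferedReader br) throws IOException {
        String line;
        while ((line = br.readLine()) != null) {
            if (!line.startsWith("#") && !line.trim().isEmpty()) {
                return line;
            }
        }
        return null;
    }
}
```

With this shape, the two nearly identical inner loops could collapse into one that passes the last-row flag as the `notify` argument of `TimeSeries.add(...)`. It only changes how the last row is detected; it does not move the dataset updates off the background thread, which is what the answer below addresses.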

Below are the methods, called from the code above, that handle the headers and the timestamp. (Sometimes the CSV files are missing the headers.)

/**
 * If the leading "#" char is missing, the headers will all be "NA".
 *
 * @param fileRow a row with any number of headers
 * @return ArrayList with headers
 */
protected List<String> getHeaders(String fileRow) {
    List<String> returnValue = new ArrayList<>();
    StringTokenizer tokenizer;
    if (fileRow.startsWith("#")) {
        tokenizer = new StringTokenizer(fileRow.substring(1));
    } else {
        tokenizer = new StringTokenizer(fileRow);
        tokenizer.nextToken();
        tokenizer.nextToken();//date and time is one header but two tokens
        while (tokenizer.hasMoreTokens()) {
            returnValue.add("NA");
            tokenizer.nextToken();
        }
        return returnValue;
    }
    tokenizer.nextToken();
    while (tokenizer.hasMoreTokens()) {
        returnValue.add(tokenizer.nextToken().trim());
    }
    return returnValue;
}

/**
 * @param timeStamp must match the pattern "yyyy-MM-dd HH:mm:ss.SSS"
 * @return the timestamp as a JFreeChart Millisecond
 * @throws ParseException if timeStamp does not match the pattern
 */
public Millisecond getMillisecond(String timeStamp) throws ParseException {
    Date date = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS").parse(timeStamp);
    return new Millisecond(date);
}
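`getMillisecond()` builds a new `SimpleDateFormat` for every row, i.e. up to 600,000 times per file. A possible variant, sketched under the assumption that Java 8's `java.time` is available, creates one `DateTimeFormatter` (immutable and thread-safe, unlike `SimpleDateFormat`) and converts the result to a `java.util.Date` in the default time zone, which is also what `SimpleDateFormat` uses by default. The class name `TimeStamps` is hypothetical; the returned `Date` would still be wrapped with `new Millisecond(date)`.

```java
import java.text.ParseException;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Date;

/** Hypothetical helper: parses "yyyy-MM-dd HH:mm:ss.SSS" timestamps with one shared formatter. */
final class TimeStamps {
    // Built once and shared; DateTimeFormatter is immutable and thread-safe.
    private static final DateTimeFormatter PATTERN =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    private TimeStamps() { }

    /** Parses in the default time zone and returns a Date suitable for new Millisecond(date). */
    static Date parse(String timeStamp) throws ParseException {
        try {
            LocalDateTime ldt = LocalDateTime.parse(timeStamp, PATTERN);
            return Date.from(ldt.atZone(ZoneId.systemDefault()).toInstant());
        } catch (DateTimeParseException e) {
            // Keep the checked exception type that getMillisecond() declares
            ParseException pe = new ParseException(e.getMessage(), e.getErrorIndex());
            pe.initCause(e);
            throw pe;
        }
    }
}
```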

1 answer:

Answer (score: 1)

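Assuming you call generateDataSet() from your implementation of doInBackground(), changes to dataset will typically fire events on the background thread, violating Swing's single thread rule. Instead, publish() interim results and process() them, as shown here.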
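A minimal sketch of that pattern, using only JDK classes: `doInBackground()` parses rows and `publish()`es them, and `process()` receives them in batches on the event dispatch thread. The class `CsvLoadWorker`, the `onRows` callback and the row layout (date, time, then whitespace-separated numeric values, as in the question) are assumptions; in the real application, `onRows` is where values would be added to the `TimeSeries` objects. `done()` calls `get()` because anything thrown in `doInBackground()`, an `OutOfMemoryError` included, is otherwise captured by the worker and never reported, which can look like the task silently stopping.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.function.Consumer;
import javax.swing.SwingWorker;

/**
 * Hypothetical worker: parses whitespace-separated rows ("date time v1 v2 ...")
 * off the EDT and delivers the numeric values to the EDT in batches.
 */
class CsvLoadWorker extends SwingWorker<Integer, double[]> {
    private final Reader source;
    private final Consumer<List<double[]>> onRows; // always invoked on the EDT

    CsvLoadWorker(Reader source, Consumer<List<double[]>> onRows) {
        this.source = source;
        this.onRows = onRows;
    }

    @Override
    protected Integer doInBackground() throws IOException {
        int count = 0;
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                String[] tok = line.trim().split("\\s+");
                // Skip "#" headers and rows without at least one value after date and time
                if (line.startsWith("#") || tok.length < 3) {
                    continue;
                }
                double[] values = new double[tok.length - 2];
                for (int i = 2; i < tok.length; i++) {
                    values[i - 2] = Double.parseDouble(tok[i]);
                }
                publish(values); // no dataset access on this background thread
                count++;
            }
        }
        return count;
    }

    @Override
    protected void process(List<double[]> rows) {
        onRows.accept(rows); // runs on the EDT, so updating the chart dataset is safe here
    }

    @Override
    protected void done() {
        try {
            get(); // rethrows whatever doInBackground() threw
        } catch (ExecutionException e) {
            e.getCause().printStackTrace();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Because `process()` may receive several rows per call, the dataset listeners fire per batch rather than per row, and always on the EDT.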