Question

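I am trying to bulk-process a CSV file that was written with opencsv's CSVWriter: CSVWriter writer = new CSVWriter(new FileWriter(filePath + createFileName), ',', CSVWriter.DEFAULT_QUOTE_CHARACTER);

The written file is then read back with a BufferedReader. The CSV file gets written and, as far as I can tell, read without any problem, so the basic flow works. But when I write certain specific data to the CSV using the same operations, batch creation fails. An exception is raised with the message "Failed to parse CSV. Found unescaped quote. A value with quote should be within a quote", and the application no longer behaves as expected.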

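Going by this error, it looks as if the data contains a stray "" (two double quotes) or " (single double quote) somewhere. (My data has the form "asdf","1.0","","HD".) I tried applying a regular expression to find such double quotes but could not find any; on inspecting the file, it contains no repeated double quotes. The link I followed is: Regular expression to find and replace unescaped Non-successive double quotes in CSV file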
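A regular expression that looks for doubled quotes will not notice a quoted value that spans more than one physical line. As a rough diagnostic, the sketch below reports every physical line that holds an odd number of quote characters; such a line either has a stray quote or is part of a quoted field that continues onto another line. The class and method names are made up for illustration and are not part of opencsv or the Salesforce library.

```java
import java.util.ArrayList;
import java.util.List;

class QuoteCheck {
    /** Returns the 1-based numbers of physical lines that contain an odd number of '"' characters. */
    static List<Integer> oddQuoteLines(String csv) {
        List<Integer> result = new ArrayList<>();
        String[] lines = csv.split("\\r?\\n", -1);
        for (int i = 0; i < lines.length; i++) {
            // An escaped quote ("") adds two characters, so a well-formed
            // single-line record always has an even count.
            long quotes = lines[i].chars().filter(c -> c == '"').count();
            if (quotes % 2 != 0) {
                result.add(i + 1);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String csv = "\"Name\",\"Score\"\n\"asdf\",\"1.0\"\n\"multi\nline\",\"2.0\"\n";
        System.out.println(oddQuoteLines(csv)); // prints [3, 4]
    }
}
```

Lines 3 and 4 of the sample are the two halves of one record whose first cell contains a line break.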

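Later in the code I was using: File tmpFile = File.createTempFile("bulkAPIInsert", ".csv"); to hold the data in a temporary file, which is deleted afterwards.

After replacing that line with the one below, I somehow got past the exception above, but it led to another one: "Failed to parse CSV. EOF reached before closing an opened quote". File tmpFile = new File("bulkAPIInsert.csv");

I don't think this workaround should be kept, because it could hurt the application's performance.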

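Looking through the CSVReader class, I found a custom exception defined with exactly the message I am getting. As I understand it, though, it is raised when a double quote is found inside a double-quoted value (a cell value in the CSV file). The link I referred to: https://github.com/mulesoft/salesforce-connector/blob/master/src/main/java/com/sforce/async/CSVReader.java

Can anyone suggest where I am going wrong, or a way to solve this problem?

Here is my code. Method1 runs first and then Method2 is called.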

    // Method1
    private List<BatchInfo> createBatchesFromCSVFile(RestConnection connection,
            JobInfo jobInfo, String csvFileName) throws Exception {
        List<BatchInfo> batchInfos = new ArrayList<BatchInfo>();
        BufferedReader rdr = new BufferedReader(new InputStreamReader(
                new FileInputStream(csvFileName)));

        // read the CSV header row
        String hdr = rdr.readLine();
        byte[] headerBytes = (hdr + "\n").getBytes("UTF-8");
        int headerBytesLength = headerBytes.length;
//      I was making use of the following code which I replaced with the next line of code.
//      File tmpFile = File.createTempFile("bulkAPIInsert", ".csv");
        File tmpFile = new File("bulkAPIInsert.csv");
        // Split the CSV file into multiple batches
        try {
            FileOutputStream tmpOut = new FileOutputStream(tmpFile);
            int maxBytesPerBatch = 10000000; // 10 million bytes per batch
            int maxRowsPerBatch = 10000; // 10 thousand rows per batch
            int currentBytes = 0;
            int currentLines = 0;
            String nextLine;

            while ((nextLine = rdr.readLine()) != null) {
                byte[] bytes = (nextLine + "\n").getBytes("UTF-8"); //TODO
                if (currentBytes + bytes.length > maxBytesPerBatch
                        || currentLines > maxRowsPerBatch) {
                    createBatch(tmpOut, tmpFile, batchInfos, connection, jobInfo);
                    currentBytes = 0;
                    currentLines = 0;
                }
                if (currentBytes == 0) {
                    tmpOut = new FileOutputStream(tmpFile);
                    tmpOut.write(headerBytes);
                    currentBytes = headerBytesLength;
                    currentLines = 1;
                }
                tmpOut.write(bytes);
                currentBytes += bytes.length;
                currentLines++;
            }

            if (currentLines > 1) {
                createBatch(tmpOut, tmpFile, batchInfos, connection, jobInfo);
            }
        } finally {
            if(!tmpFile.delete())
                tmpFile.deleteOnExit();
            rdr.close();
        }
        return batchInfos;
    }

    // Method2
    /**
     * Wait for a job to complete by polling the Bulk API.
     */
    private void awaitCompletion(RestConnection connection, JobInfo job,
            List<BatchInfo> batchInfoList) throws AsyncApiException { 
        try{
            /****
            Some code
            **/
                BatchInfo[] statusList = connection.getBatchInfoList(job.getId())
                .getBatchInfo();
                for (BatchInfo b : statusList) {
                    if (b.getState() == BatchStateEnum.Completed) {
                        if (incomplete.remove(b.getId())) 
                            //Do Something
                    }
                    else if(b.getState() == BatchStateEnum.Failed){ 

                        System.out.println("Reason: "+b.getStateMessage()+".\n  " +
                                "Number of Records Processed: "+b.getNumberRecordsProcessed());
                        throw (new Exception(""));
                    }
                }
            }
        }catch(Exception ex){log.debug(" Exception occurred.");}
    }

The getStateMessage() method of BatchInfo returns the error messages discussed above.
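One thing worth checking in Method1: the loop treats every readLine() result as one complete record. If a quoted cell contains a line break, a single record occupies several physical lines, and a batch boundary could fall in the middle of it. The sketch below shows one possible way to read whole records instead, by joining physical lines while a quoted field is still open. The names are hypothetical, and the quote counting assumes embedded quotes are escaped by doubling them.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class RecordReader {
    /** Counts the '"' characters in one physical line. */
    static long countQuotes(String s) {
        return s.chars().filter(c -> c == '"').count();
    }

    /**
     * Reads one logical record, appending further physical lines while the
     * running quote count is odd (that is, while a quoted field is still open).
     * Returns null at end of input.
     */
    static String readRecord(BufferedReader rdr) throws IOException {
        String line = rdr.readLine();
        if (line == null) {
            return null;
        }
        StringBuilder record = new StringBuilder(line);
        long quotes = countQuotes(line);
        while (quotes % 2 != 0) {
            String next = rdr.readLine();
            if (next == null) {
                break; // input ended inside a quoted field; return what was read
            }
            record.append('\n').append(next);
            quotes += countQuotes(next);
        }
        return record.toString();
    }

    /** Convenience wrapper that splits in-memory CSV text into logical records. */
    static List<String> readAll(String csv) {
        List<String> records = new ArrayList<>();
        try (BufferedReader rdr = new BufferedReader(new StringReader(csv))) {
            String record;
            while ((record = readRecord(rdr)) != null) {
                records.add(record);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        String csv = "\"Name\",\"Note\"\n\"asdf\",\"first\nsecond\"\n";
        System.out.println(readAll(csv).size()); // prints 2
    }
}
```

In a loop like the one in Method1, a call such as readRecord(rdr) could stand in for rdr.readLine(), so that the byte and row limits are only checked between complete records. Note that the sketch rejoins lines with "\n", so an original "\r\n" inside a cell is not preserved.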

Answer 1

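Thanks to "Hound" for helping me.

Answer

The problem was solved by removing the line breaks from every cell value. A likely explanation is that the batch-splitting code reads the file one line at a time, so a line break inside a quoted cell leaves a record split across lines, and a batch can then begin or end in the middle of a quoted value.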
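As a minimal sketch of that fix, assuming the cell values are available as strings before they are written, the helper below replaces line breaks in each cell with a space and then builds a fully quoted CSV line. CellSanitizer and its methods are illustrative names only, and replacing the breaks with a space rather than dropping them is a choice made here, not something the original fix specifies.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class CellSanitizer {
    /** Replaces every run of CR/LF characters in a cell with a single space. */
    static String stripLineBreaks(String cell) {
        return cell == null ? "" : cell.replaceAll("[\\r\\n]+", " ");
    }

    /** Builds one CSV line: line breaks removed, embedded quotes doubled, every cell quoted. */
    static String toCsvLine(List<String> cells) {
        return cells.stream()
                .map(CellSanitizer::stripLineBreaks)
                .map(c -> "\"" + c.replace("\"", "\"\"") + "\"")
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // prints "asdf","1.0","","two lines"
        System.out.println(toCsvLine(Arrays.asList("asdf", "1.0", "", "two\nlines")));
    }
}
```

When writing with opencsv, it should be enough to apply stripLineBreaks to each element of the String[] row before passing it to writer.writeNext(row), leaving the quoting to CSVWriter.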

Creating batches from a CSV results in an error state