How do I preserve newlines when converting the PrintStream output of the XSSF XLSX2CSV class to a ByteArrayInputStream?

Asked: 2018-11-06 04:25:18

Tags: java apache-poi sax greenplum xssf

I am using the Java SAX parser (via the XSSF XLSX2CSV class) to read data from Excel and load it into a Greenplum database. I am using the code from the following link:

http://svn.apache.org/repos/asf/poi/trunk/src/examples/src/org/apache/poi/xssf/eventusermodel/XLSX2CSV.java

I capture the PrintStream output from the code above, convert it to a ByteArrayInputStream, and load it into Postgres (Greenplum) with the native bulk-load utility, the COPY command.

I modified the main method of XLSX2CSV as follows to capture the PrintStream and convert it to a byte input stream.

ByteArrayOutputStream baos = new ByteArrayOutputStream();
PrintStream ps = new PrintStream(baos, true, "UTF-8");

// The package open is instantaneous, as it should be.
try (OPCPackage p = OPCPackage.open(xlsxFile.getPath(), PackageAccess.READ)) {
    XLSX2CSV xlsx2csv = new XLSX2CSV(p, ps, minColumns);
    xlsx2csv.process();
    System.out.println(ps);
    String data = new String(baos.toByteArray(), StandardCharsets.UTF_8);
    System.out.println(data);
    byte[] bytes = data.getBytes("UTF8");
    ByteArrayInputStream orinput = new ByteArrayInputStream(bytes);
    String dbURL1 = "jdbc:postgresql://xxxxx:xxxxx/xxxxx";
    String user = "xxxxxx";
    String pass = "xxxxxx";
    Connection GPConnection = DriverManager.getConnection(dbURL1, user, pass);

    Statement GPsqlStatement = GPConnection.createStatement();
    String GPStgTableTrunc = "truncate test_table";
    GPsqlStatement.execute(GPStgTableTrunc);
    System.out.print("Load to Greenplum starts "
            + Calendar.getInstance().getTime() + "\r\n");

    CopyManager copyManager = new CopyManager((BaseConnection) GPConnection);
    copyManager.copyIn("copy test_table from stdin csv", orinput);
    System.out.print("Load to Greenplum ends "
            + Calendar.getInstance().getTime() + "\r\n");
}

However, the newlines seem to get lost during the conversion to ByteArrayInputStream, and the load into Greenplum fails with the following error:

ERROR: COPY metadata not found. This probably means that there is a mixture of newline types in the data. Use the NEWLINE keyword in order to resolve this reliably. (seg40 sdw6.gphd.local:1025 pid=101588

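When I print the string 'data', it appears to contain the newlines and the values print correctly. However, the bulk load into the database still fails.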
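Printing the string does not show which terminators it contains, because "\r\n" and "\n" look the same on a console. One way to check is to count them in the raw bytes. The following is a minimal sketch, not part of the original code: the class name LineEndingCounter and its count method are hypothetical, and the sample input is made up for illustration.

```java
import java.nio.charset.StandardCharsets;

class LineEndingCounter {
    // Hypothetical helper: returns {CRLF pairs, lone LF, lone CR} found in the bytes.
    // Safe on UTF-8 input, since 0x0D and 0x0A never occur inside multi-byte sequences.
    static int[] count(byte[] bytes) {
        int crlf = 0, lf = 0, cr = 0;
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] == '\r') {
                if (i + 1 < bytes.length && bytes[i + 1] == '\n') {
                    crlf++;
                    i++; // skip the '\n' that belongs to this pair
                } else {
                    cr++;
                }
            } else if (bytes[i] == '\n') {
                lf++;
            }
        }
        return new int[] {crlf, lf, cr};
    }

    public static void main(String[] args) {
        // Made-up sample: rows end in CRLF, one quoted cell contains a bare LF.
        byte[] sample = "a,b\r\nc,\"x\ny\"\r\n".getBytes(StandardCharsets.UTF_8);
        int[] c = count(sample);
        System.out.println("CRLF=" + c[0] + " LF=" + c[1] + " CR=" + c[2]);
        // prints: CRLF=2 LF=1 CR=0
    }
}
```

In the code above this would be called as count(baos.toByteArray()). More than one non-zero counter would be consistent with the "mixture of newline types" that the error message mentions.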

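How can I preserve the newlines in this scenario so that the load succeeds? Alternatively, a way to convert the PrintStream to standard input would also work. Thanks!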

1 Answer:

Answer 0 (score: 1)

Try "\r\n" instead of "\n":

ByteArrayOutputStream output = new ByteArrayOutputStream();
output.write("something\r\n".getBytes());
output.write("something\r\n".getBytes());

ByteArrayInputStream input = new ByteArrayInputStream(output.toByteArray());
s3.putStream(input);

This looks similar to:

ByteArrayOutputStream/InputStream losing newline characters on S3 Import

Adding the sample code I tried with PrintStream below:

static void printStream() throws Exception {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    PrintStream ps = new PrintStream(baos, true, "UTF-8");
    ps.println("test 1");
    ps.println("test 2");
    ps.println("test 3");
    System.out.print(new String(baos.toByteArray()));
}

It prints:

test 1
test 2
test 3
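
PrintStream.println ends each line with the platform line separator (the line.separator system property), so the captured bytes contain "\r\n" on Windows and "\n" on Unix-like systems. If cell text carries its own bare "\n" (an assumption about the data, not something shown in the question), the stream ends up with mixed terminators. The following is a minimal sketch that rewrites everything to a single "\n" before the ByteArrayInputStream is built. The class NewlineNormalizer and its methods are hypothetical names, not part of XLSX2CSV or the JDBC driver.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

class NewlineNormalizer {
    // Hypothetical helper: turns "\r\n" and lone "\r" into "\n".
    // Note: this also rewrites line breaks inside quoted CSV fields.
    static String normalize(String csv) {
        return csv.replace("\r\n", "\n").replace('\r', '\n');
    }

    // Builds the input stream for the bulk load from the normalized text.
    static ByteArrayInputStream toInput(String csv) {
        return new ByteArrayInputStream(normalize(csv).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String mixed = "a,b\r\nc,d\ne,f\r";
        System.out.println(normalize(mixed).equals("a,b\nc,d\ne,f\n"));
        // prints: true
    }
}
```

With the question's variable names, the result of toInput(data) would replace orinput in the copyIn call. The error message also points to the NEWLINE keyword of Greenplum's COPY as another route; check its exact syntax against the documentation for your Greenplum version.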