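I am using a Java SAX parser to read data from Excel (via the XSSF XLSX2CSV class) and load it into a Greenplum database. I am using the code from the following link:
I capture the PrintStream output from that code, convert it to a ByteArrayInputStream, and then load it into Postgres (Greenplum) with the native bulk-load utility, the COPY command.
I modified the main method of XLSX2CSV as follows to capture the print stream and convert it to a byte input stream.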
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    PrintStream ps = new PrintStream(baos, true, "UTF-8");
    // The package open is instantaneous, as it should be.
    try (OPCPackage p = OPCPackage.open(xlsxFile.getPath(), PackageAccess.READ)) {
        XLSX2CSV xlsx2csv = new XLSX2CSV(p, ps, minColumns);
        xlsx2csv.process();
        System.out.println(ps);
        String data = new String(baos.toByteArray(), StandardCharsets.UTF_8);
        System.out.println(data);
        byte[] bytes = data.getBytes("UTF8");
        ByteArrayInputStream orinput = new ByteArrayInputStream(bytes);
        String dbURL1 = "jdbc:postgresql://xxxxx:xxxxx/xxxxx";
        String user = "xxxxxx";
        String pass = "xxxxxx";
        Connection GPConnection = DriverManager.getConnection(dbURL1, user, pass);
        Statement GPsqlStatement = GPConnection.createStatement();
        String GPStgTableTrunc = "truncate test_table";
        GPsqlStatement.execute(GPStgTableTrunc);
        System.out.print("Load to Greenplum starts " +
                Calendar.getInstance().getTime() + "\r\n");
        CopyManager copyManager = new CopyManager((BaseConnection) GPConnection);
        copyManager.copyIn("copy test_table from stdin csv", orinput);
        System.out.print("Load to Greenplum ends " +
                Calendar.getInstance().getTime() + "\r\n");
    }
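However, the newline characters seem to get lost during the conversion to a ByteArrayInputStream, and I get the following error when loading into Greenplum.
ERROR: COPY metadata not found. This probably means that there is a mixture of newline types in the data. Use the NEWLINE keyword in order to resolve this reliably. (seg40 sdw6.gphd.local:1025 pid=101588)
When I print the string 'data', it appears to contain the newlines and the values print correctly. However, the bulk load into the database still fails.
How can I preserve the newline characters in this scenario so that the load works? Alternatively, a way to turn the PrintStream into standard input would also do. Thanks!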
Answer 0 (score: 1)
Try "\r\n" instead of "\n":
    ByteArrayOutputStream output = new ByteArrayOutputStream();
    output.write("something\r\n".getBytes());
    output.write("something\r\n".getBytes());
    // Wrap the bytes written so far in an input stream
    ByteArrayInputStream input = new ByteArrayInputStream(output.toByteArray());
    s3.putStream(input);
This looks similar to:
ByteArrayOutputStream/InputStream losing newline characters on S3 Import
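Since the error message complains about a mixture of newline types, one option is to force a single line terminator before handing the data to copyIn. The following is only a minimal sketch: the class NewlineNormalizer and its methods are hypothetical names, not part of POI or the PostgreSQL driver, and it assumes the whole CSV fits in memory as in the question. Note that it also rewrites line breaks inside quoted CSV fields, which may or may not be acceptable for your data.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;

// Hypothetical helper: rewrites every CRLF, lone CR, or lone LF to one terminator.
class NewlineNormalizer {

    // Replaces all line terminators in data with the given newline string.
    static String normalize(String data, String newline) {
        // "\r\n" is listed first so a CRLF pair is replaced once, not twice.
        return data.replaceAll("\\r\\n|\\r|\\n", Matcher.quoteReplacement(newline));
    }

    // Builds the UTF-8 stream to pass to CopyManager.copyIn, using LF only.
    static ByteArrayInputStream toCopyInput(String data) {
        byte[] bytes = normalize(data, "\n").getBytes(StandardCharsets.UTF_8);
        return new ByteArrayInputStream(bytes);
    }

    public static void main(String[] args) {
        String mixed = "a,1\r\nb,2\nc,3\r";
        String unified = normalize(mixed, "\n");
        // Show the terminators visibly instead of relying on the console.
        System.out.println(unified.replace("\r", "\\r").replace("\n", "\\n"));
    }
}
```

In the question's code this would replace the construction of orinput, for example `ByteArrayInputStream orinput = NewlineNormalizer.toCopyInput(data);`. The error text also points at the NEWLINE keyword of Greenplum's COPY; the exact option syntax should be checked against the documentation for your Greenplum version.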
Adding the sample code I tried with PrintStream below:
    static void printStream() throws Exception {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        PrintStream ps = new PrintStream(baos, true, "UTF-8");
        ps.println("test 1");
        ps.println("test 2");
        ps.println("test 3");
        System.out.print(new String(baos.toByteArray()));
    }
It prints:
test 1
test 2
test 3
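The printed output above looks the same whether the buffer holds "\n" or "\r\n", so it cannot show which terminator is actually being sent. PrintStream.println writes the platform line separator (the line.separator system property), which differs between Windows and Unix-like systems. A small sketch for inspecting the real bytes follows; NewlineInspector and its methods are hypothetical names introduced only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for making line terminators in a byte buffer visible.
class NewlineInspector {

    // Renders CR and LF as the literal text \r and \n.
    static String escape(byte[] bytes) {
        String text = new String(bytes, StandardCharsets.UTF_8);
        return text.replace("\r", "\\r").replace("\n", "\\n");
    }

    // Writes each line with println, i.e. with the platform line separator.
    static byte[] captureWithPrintln(String... lines) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (PrintStream ps = new PrintStream(baos, true, "UTF-8")) {
            for (String line : lines) {
                ps.println(line);
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 should always be available", e);
        }
        return baos.toByteArray();
    }

    // Writes each line followed by an explicit LF, independent of the platform.
    static byte[] captureWithExplicitLf(String... lines) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (PrintStream ps = new PrintStream(baos, true, "UTF-8")) {
            for (String line : lines) {
                ps.print(line);
                ps.print('\n');
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 should always be available", e);
        }
        return baos.toByteArray();
    }

    public static void main(String[] args) {
        // The first line depends on the platform; the second always ends lines with \n.
        System.out.println(escape(captureWithPrintln("test 1", "test 2")));
        System.out.println(escape(captureWithExplicitLf("test 1", "test 2")));
    }
}
```

Running escape on the bytes captured from XLSX2CSV would show whether row terminators and line breaks embedded in cell values use different newline types, which is what the COPY error describes.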