I have a data file that looks like this:
data1~|~data2~|~data3~!~data1~|~data2~|~data3~!~data1~|~data2~|~data3~!~data1~|~data2~|~data3~!~
Fields are separated by ~|~ (tilde/pipe/tilde). Rows/records are separated by ~!~.
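The goal is to convert it into an X12-format file. I will have many files with many records, thousands in each. I have only just started the project, so I am still exploring solutions.
I have done some coding in Python, so that might work, but I'm not sure.
I'm looking for suggestions, or for libraries that might give me something to start from.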
Answer 0 (score: 1)
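In Python, you can split on the two delimiters:
data = "data1~|~data2~|~data3~!~data1~|~data2~|~data3~!~data1~|~data2~|~data3~!~data1~|~data2~|~data3~!~"
# Parse rows; the trailing ~!~ leaves an empty string at the end, so drop it
rows = [r for r in data.split("~!~") if r]
# Parse columns
final = [r.split("~|~") for r in rows]
print(final)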
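Since the question mentions many files with thousands of records each, the same split can be wrapped in a small helper that reads a file from disk. This is only a minimal sketch: the names `parse_text` and `parse_file` and the UTF-8 encoding are assumptions, not something given in the question.

```python
from pathlib import Path

FIELD_SEP = "~|~"
RECORD_SEP = "~!~"

def parse_text(text):
    """Split raw text into a list of records, each a list of fields."""
    # Skip empty chunks, such as the one left after the final ~!~
    return [chunk.split(FIELD_SEP)
            for chunk in text.split(RECORD_SEP)
            if chunk.strip()]

def parse_file(path, encoding="utf-8"):
    """Read one data file and return its parsed records.

    The encoding is an assumption; adjust it to match the real files.
    """
    return parse_text(Path(path).read_text(encoding=encoding))
```

Files with a few thousand records fit comfortably in memory, so reading each file in one go should be fine; very large files would need a streaming approach instead.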
The next step is to convert this to XML.
Then, once it is in XML, use pyx12 (https://pypi.python.org/pypi/pyx12/2.1.1) to convert it to X12.
Then, like magic, you should be done!
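The XML step could be sketched with the standard library's `xml.etree.ElementTree`. The element names `records`, `record`, and `field` are hypothetical placeholders; a tool such as pyx12 will most likely expect its own XML layout, so treat this as an illustration of the mechanics rather than the final structure.

```python
import xml.etree.ElementTree as ET

def rows_to_xml(rows):
    """Build an XML string from a list of records (each a list of field strings)."""
    root = ET.Element("records")  # hypothetical element names throughout
    for row in rows:
        record = ET.SubElement(root, "record")
        for value in row:
            field = ET.SubElement(record, "field")
            field.text = value
    # encoding="unicode" returns a str instead of bytes
    return ET.tostring(root, encoding="unicode")

xml_text = rows_to_xml([["data1", "data2", "data3"]])
print(xml_text)
```

Mapping each field position to the right X12 segment and element is the part that depends on which transaction set is being produced, and it is not covered here.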