I am trying to export records from a file (~1 GB) into a MySQL table. The records look like this:
1295517525,1,047bb3357bb557358caace5e9206aa8e,,2016-04-14 10:13:37,2016-04-15 05:17:30,5289170
1295517526,112,2dZV5C_yI4Fo4Hy219Kgd_ix8eYLCn7YBMkm20IFpRjw,,2016-04-14 10:13:34,2016-04-15 05:17:30,NULL
1295517527,112,2QuH7OknQNlwfUHXtxDLW_M_HwhbieHMklPoZiqPIM68,,2016-04-14 10:13:34,2016-04-15 05:17:30,NULL
1295517528,113,570219b22319abb3,,2016-04-14 10:13:32,2016-04-15 05:17:30,NULL
1295517529,112,2eRpm06w2O5FbZJNU06Pz-teAl5dz_XmSrPwKnl9oZx4,,2016-04-14 10:13:37,2016-04-15 05:17:30,NULL
1295517530,113,570f6d4f1a571d77,,2016-04-14 10:13:37,2016-04-15 05:17:30,NULL
1295517531,112,2L7RPFdtGIhXG3uWh6Z8-z7PWXmjvXJMbcA0x2ocKtyg,,2016-04-14 10:13:34,2016-04-15 05:17:30,NULL
1295517532,113,570f6d504e7467f3,,2016-04-14 10:13:37,2016-04-15 05:17:30,NULL
1295517533,112,"""3tLMPU9Ii7ObvATtF1j6d8Hnla-15n5zMcWTDUKgC54",,2016-04-14 10:13:32,2016-04-15 05:17:30,NULL
1295517534,113,570f6d5076785dbf,,2016-04-14 10:13:37,2016-04-15 05:17:30,NULL
1295517535,113,5706b1f49b3848ab,,2016-04-14 10:13:38,2016-04-15 05:17:30,NULL
1295517536,113,55052554de4069e1,,2016-04-14 10:13:37,2016-04-15 05:17:30,NULL
1295517537,113,570f6d504529faff,,2016-04-14 10:13:37,2016-04-15 05:17:30,NULL
1295517538,113,570f590151481cbd,,2016-04-14 10:13:32,2016-04-15 05:17:30,NULL
A few of the records are enclosed in " characters, which causes the export to fail with the following error:
at org.apache.sqoop.lib.RecordParser.parseRecord(RecordParser.java:108)
at org.apache.sqoop.lib.RecordParser.parseRecord(RecordParser.java:125)
at nodes.parse(nodes.java:387)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
2016-06-08 11:23:12,062 ERROR [main] org.apache.sqoop.mapreduce.TextExportMapper: On input: 1295517533,112,"""3tLMPU9Ii7ObvATtF1j6d8Hnla-15n5zMcWTDUKgC54",,2016-04-14 10:13:32,2016-04-15 05:17:30,NULL
2016-06-08 11:23:12,062 ERROR [main] org.apache.sqoop.mapreduce.TextExportMapper: On input file: hdfs://localhost:9000/lagvankarh/input/xes
2016-06-08 11:23:12,062 ERROR [main] org.apache.sqoop.mapreduce.TextExportMapper: At position 554748510
2016-06-08 11:23:12,062 ERROR [main] org.apache.sqoop.mapreduce.TextExportMapper:
2016-06-08 11:23:12,062 ERROR [main] org.apache.sqoop.mapreduce.TextExportMapper: Currently processing split:
2016-06-08 11:23:12,062 ERROR [main] org.apache.sqoop.mapreduce.TextExportMapper: Paths:/lagvankarh/input/xes:536870912+134217728,/lagvankarh/input/xes:671088640+134217728
2016-06-08 11:23:12,062 ERROR [main] org.apache.sqoop.mapreduce.TextExportMapper:
2016-06-08 11:23:12,062 ERROR [main] org.apache.sqoop.mapreduce.TextExportMapper: This issue might not necessarily be caused by current input
2016-06-08 11:23:12,062 ERROR [main] org.apache.sqoop.mapreduce.TextExportMapper: due to the batching nature of export.
2016-06-08 11:23:12,062 ERROR [main] org.apache.sqoop.mapreduce.TextExportMapper:
2016-06-08 11:23:12,062 INFO [Thread-10] org.apache.sqoop.mapreduce.AutoProgressMapper: Auto-progress thread is finished. keepGoing=false
2016-06-08 11:23:12,081 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: Can't export data, please check failed map task logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: com.cloudera.sqoop.lib.RecordParser$ParseError: Expected delimiter at position 17
at org.apache.sqoop.lib.RecordParser.parseRecord(RecordParser.java:319)
at org.apache.sqoop.lib.RecordParser.parseRecord(RecordParser.java:108)
at org.apache.sqoop.lib.RecordParser.parseRecord(RecordParser.java:125)
at nodes.parse(nodes.java:387)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83)
... 10 more
My Sqoop command is:
sqoop export --connect jdbc:mysql://localhost/graph1 --username root --password password --export-dir /lagvankarh/input/xes --table nodes --input-null-string NULL --input-null-non-string NULL --columns "id,type,name,postcode,updated,db_updated,useragent" --update-key id --input-optionally-enclosed-by '"' -m 1
How do I handle the " character?
Answer (score: 0):
This record causes the error:
"""3tLMPU9Ii7ObvATtF1j6d8Hnla-15n5zMcWTDUKgC54"
The problem is that the parser expects a delimiter after the second quote: with --input-optionally-enclosed-by '"', the first " opens an enclosure, the second " immediately closes it as an empty field, and the parser then expects a delimiter at position 17, where it finds the third " instead. If this field never contains your field delimiter inside its data, you can run the command without this argument:
--input-optionally-enclosed-by '"'
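For example, a minimal sketch of the command with that argument removed (everything else copied unchanged from the command above); without --input-optionally-enclosed-by, the parser treats " as ordinary field data rather than as an enclosure:

sqoop export --connect jdbc:mysql://localhost/graph1 --username root --password password --export-dir /lagvankarh/input/xes --table nodes --input-null-string NULL --input-null-non-string NULL --columns "id,type,name,postcode,updated,db_updated,useragent" --update-key id -m 1

Note that the quote characters will then be stored verbatim in the name column, so this only works if no field legitimately needs enclosing (for example, none contains an embedded comma).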