Question

I'm using sqoop import to import an Oracle table into HDFS.

Some of the columns/fields contain the pipe character (|).

I'm using --fields-terminated-by "|" to delimit the columns/fields.

I don't want to use any mechanism to process the data before the import, e.g. the REPLACE function.

If possible, I'd like to use a Sqoop argument that solves this problem.

Oracle table:

PLAB01@PLAB01> select * from teste;

       CP1 CP2        CP3                  CP4
---------- ---------- -------------------- ----------
         7 sete       sete|7               sete

My import attempt:

sqoop import -Dmapred.job.queue.name=root.ingestao.ogg --connect "jdbc:oracle:thin:@10.10.10.10:1521/plab01" \
--username scott --password tiger \
--query "select a.* from scott.test a where \$CONDITIONS" \
--target-dir "/scott/db_ods/TEST_sqoop" \
--fields-terminated-by "|" -m 1 \
--lines-terminated-by "\n" \
--delete-target-dir \
--null-string '\\N' \
--null-non-string '\\N' \
--hive-drop-import-delims \
--mapreduce-job-name TEST

The result in the HDFS file is:

7|sete      |sete|7|sete      |

Note the "sete|7". This is the original data in the Oracle table.

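The Hive table is created with the pipe as its delimiter character, so I can see the CP3 value split in two: "sete" ends up in CP3 and "7" in CP4, and the original CP4 value is shifted out, because the Hive table does not have 5 fields/columns.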
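To make the shift concrete, here is a small shell sketch (not part of the original setup) that splits the HDFS record above on '|'. It assumes the pipe-delimited table maps fields to columns purely by position; the variable name `line` is only illustrative.

```shell
#!/bin/sh
# One record copied from the HDFS output above (CHAR padding kept as-is).
line='7|sete      |sete|7|sete      |'

# Print every '|'-separated field with its position. Assumption: the
# pipe-delimited table assigns fields to columns by position, so with
# 4 columns only fields 1-4 are used.
printf '%s\n' "$line" | awk -F'|' '{ for (i = 1; i <= NF; i++) printf "field %d: [%s]\n", i, $i }'
```

This prints six fields: `7`, `sete      `, `sete`, `7`, `sete      ` and an empty one produced by the trailing pipe. With four columns, CP3 therefore receives `sete` and CP4 receives `7`, which matches what the Hive table shows.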

I need the HDFS file to be: 7|sete|sete7|sete -> with no pipe between 'sete' and '7'.

Sqoop import of an Oracle table using the pipe as field delimiter, where fields contain pipes
