Sqoop: Import with --split-by using a SQL function

Time: 2018-07-06 21:14:28

Tags: mysql hadoop sqoop

Apologies if this has already been answered, but I could not find it on Stack Overflow.

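My source MySQL table has a VARCHAR primary key, and importing it produces duplicates, which is not acceptable. I do not want to use -m 1, because each table is around 50 GB. So I tried passing --split-by on a column that is defined as VARCHAR but that I know holds only integer values, as shown below.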

I am running version 1.4.7 on EMR 5.14.0.

sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
--connect jdbc:mysql://host/jslice \
--username=*** --password *** --table orders --fields-terminated-by '|' \
--lines-terminated-by '\n' --null-non-string "\\\\N" --null-string "\\\\N" \
--escaped-by '\' --optionally-enclosed-by '\"' \
--map-column-java dwh_last_modified=String --hive-drop-import-delims \
--as-parquetfile -m 16 --compress \
--compression-codec org.apache.hadoop.io.compress.SnappyCodec --delete-target-dir \
--target-dir hdfs:///hive/warehouse/jslice/orders/text3/ \
--split-by 'cast(order_number as UNSIGNED)'

Internally, Sqoop builds the boundary query as:

INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`cast(order_number as UNSIGNED)`), MAX(`cast(order_number as UNSIGNED)`) FROM `archive_orders`

and then fails with this error:

ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: java.sql.SQLSyntaxErrorException: (conn=472029) Unknown column 'cast(order_number as UNSIGNED)' in 'field list'
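
The backticks in the logged query appear to explain the error: MySQL reads a backtick-quoted string as a single identifier, so the whole cast expression is looked up as a column name instead of being evaluated. The sketch below only rebuilds and prints the two query strings for comparison. It does not call Sqoop, and the variable names are illustrative, not Sqoop internals.

```shell
#!/bin/sh
# Illustrative only: rebuild the bounding query with and without identifier
# quoting. Variable names are hypothetical and not taken from Sqoop.
split_by='cast(order_number as UNSIGNED)'
table='orders'

# With backticks, MySQL parses the expression as one column identifier,
# which matches the "Unknown column" error in the log.
quoted_query="SELECT MIN(\`${split_by}\`), MAX(\`${split_by}\`) FROM \`${table}\`"

# Without backticks, MySQL evaluates CAST() for each row.
unquoted_query="SELECT MIN(${split_by}), MAX(${split_by}) FROM \`${table}\`"

printf '%s\n' "$quoted_query" "$unquoted_query"
```

Running the second string directly in a MySQL client is a quick way to check that the cast itself is valid for the data before involving Sqoop.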

I have seen some posts saying that a SQL function can be passed to --split-by, but I want to confirm whether this really works.

Note that I also tried wrapping the cast expression in double quotes ("") and escaping it with backslashes.
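
One variation that may be worth testing, offered as an unverified sketch and not a confirmed fix, is a free-form query import with an explicit boundary query. It rests on the assumption that Sqoop 1.4.7 quotes the --split-by value only for --table imports and passes it through unquoted when --query is used. DB_USER is a hypothetical variable, and the formatting and compression options from the original command are omitted for brevity.

```shell
#!/bin/sh
# Unverified sketch: free-form query import with an explicit boundary query.
# DB_USER is a hypothetical placeholder; -P makes Sqoop prompt for the password.
SPLIT_EXPR='cast(order_number as UNSIGNED)'
DB_USER="${DB_USER:-sqoop_user}"

# $CONDITIONS must reach Sqoop literally, hence the escaped dollar sign.
# Sqoop replaces it with each mapper's range predicate.
QUERY="SELECT * FROM orders WHERE \$CONDITIONS"
BOUNDARY_QUERY="SELECT MIN(${SPLIT_EXPR}), MAX(${SPLIT_EXPR}) FROM orders"

# Dry run: prints the arguments (shell quoting is not preserved in the output).
# Remove the leading "echo" to actually run the import.
echo sqoop import \
  --connect jdbc:mysql://host/jslice \
  --username "$DB_USER" -P \
  --query "$QUERY" \
  --boundary-query "$BOUNDARY_QUERY" \
  --split-by "$SPLIT_EXPR" \
  -m 16 \
  --target-dir hdfs:///hive/warehouse/jslice/orders/text3/
```

If the assumption holds, each mapper's range predicate would be built on the cast expression itself, so the splits would be numeric ranges and not text ranges.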

https://community.hortonworks.com/questions/146261/sql-function-in-split-by.html

https://community.cloudera.com/t5/Data-Ingestion-Integration/Sqoop-split-by-date-wants-to-compare-a-timestamp-with/m-p/69668#M3159

0 answers:

No answers yet