Question

Apologies if this has already been answered, but I couldn't find it on Stack Overflow.

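My source MySQL table has a VARCHAR primary key, and splitting the import on it produces duplicate rows, which is unacceptable. I don't want to use -m 1 because each table is around 50 GB. So I tried passing --split-by a column that is defined as VARCHAR but that I know contains only integer values, as shown below.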

I'm running Sqoop 1.4.7 on EMR 5.14.0.

sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
--connect jdbc:mysql://host/jslice \
--username=*** --password *** --table orders --fields-terminated-by '|' \
--lines-terminated-by '\n' --null-non-string "\\\\N" --null-string "\\\\N" --escaped-by '\' \
--optionally-enclosed-by '\"' --map-column-java dwh_last_modified=String \
--hive-drop-import-delims \
--as-parquetfile -m 16 --compress \
--compression-codec org.apache.hadoop.io.compress.SnappyCodec --delete-target-dir \
--target-dir hdfs:///hive/warehouse/jslice/orders/text3/ \
--split-by 'cast(order_number as UNSIGNED)'

Internally, Sqoop builds the bounding query as:

INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`cast(order_number as UNSIGNED)`), MAX(`cast(order_number as UNSIGNED)`) FROM `archive_orders`

and throws this error:

ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: java.sql.SQLSyntaxErrorException: (conn=472029) Unknown column 'cast(order_number as UNSIGNED)' in 'field list'
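
To make the failure easier to see, here is a small shell sketch that mimics the quoting visible in the log above. It is only an illustration built from the logged query, not Sqoop's actual code; the `quote_ident` helper and the variable names are made up for this example.

```shell
#!/usr/bin/env bash
# Illustration only: reproduces the quoting seen in the BoundingValsQuery log line.
# quote_ident is a hypothetical helper, not part of Sqoop.

split_by='cast(order_number as UNSIGNED)'   # value passed to --split-by
table='archive_orders'                      # table name as it appears in the log

# Wrap a string in backticks, the way a MySQL identifier is quoted.
quote_ident() {
  printf '`%s`' "$1"
}

col="$(quote_ident "$split_by")"
bounding_query="SELECT MIN(${col}), MAX(${col}) FROM $(quote_ident "$table")"

echo "$bounding_query"
```

The printed statement has the same shape as the logged bounding query. Because MySQL treats everything between backticks as a single identifier, the whole cast expression is looked up as a column name, which appears to be why the server reports an unknown column instead of evaluating the function.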

I've seen a few posts saying that a SQL function can be passed to --split-by, but I'd like to confirm whether that actually works.

Note that I have also tried wrapping the cast expression in double quotes and escaping it with backslashes.

https://community.hortonworks.com/questions/146261/sql-function-in-split-by.html

https://community.cloudera.com/t5/Data-Ingestion-Integration/Sqoop-split-by-date-wants-to-compare-a-timestamp-with/m-p/69668#M3159

Sqoop: import with --split-by using a SQL function

0 answers: