Sorry if this has already been answered, but I couldn't find it on Stack Overflow.
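My source MySQL table has a VARCHAR primary key, and splitting on it causes duplicates during import, which is bad. I don't want to use -m 1 because each table is about 50 GB. So I tried passing --split-by on a column that I know is defined as VARCHAR but actually holds integer values, as shown below.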
I'm on Sqoop 1.4.7 on EMR 5.14.0.
sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
--connect jdbc:mysql://host/jslice \
--username=*** --password *** --table orders --fields-terminated-by '|' \
--lines-terminated-by '\n' --null-non-string "\\\\N" --null-string "\\\\N" \
--escaped-by '\' \
--optionally-enclosed-by '\"' --map-column-java dwh_last_modified=String \
--hive-drop-import-delims \
--as-parquetfile -m 16 --compress \
--compression-codec org.apache.hadoop.io.compress.SnappyCodec \
--delete-target-dir \
--target-dir hdfs:///hive/warehouse/jslice/orders/text3/ \
--split-by 'cast(order_number as UNSIGNED)'
Internally, Sqoop builds the bounding query as:
INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`cast(order_number as UNSIGNED)`), MAX(`cast(order_number as UNSIGNED)`) FROM `archive_orders`
and then fails with this error:
ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: java.sql.SQLSyntaxErrorException: (conn=472029) Unknown column 'cast(order_number as UNSIGNED)' in 'field list'
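To illustrate why MySQL rejects this, here is a minimal shell sketch that reproduces the string from the BoundingValsQuery log line. It assumes, based only on that log output, that for a --table import Sqoop wraps the --split-by value in MySQL backticks. The functions bounding_query and bounding_query_unquoted are hypothetical illustrations, not part of Sqoop.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration only (not Sqoop functions).

# Mimics the BoundingValsQuery from the log: the --split-by value is
# assumed to be wrapped in MySQL backticks, i.e. treated as a column name.
bounding_query() {
  local split_col="$1" table="$2"
  printf 'SELECT MIN(`%s`), MAX(`%s`) FROM `%s`\n' "$split_col" "$split_col" "$table"
}

# Same query with the expression left unquoted, so MySQL evaluates it
# as an expression instead of looking it up as a column name.
bounding_query_unquoted() {
  local split_expr="$1" table="$2"
  printf 'SELECT MIN(%s), MAX(%s) FROM `%s`\n' "$split_expr" "$split_expr" "$table"
}

bounding_query 'cast(order_number as UNSIGNED)' archive_orders
# SELECT MIN(`cast(order_number as UNSIGNED)`), MAX(`cast(order_number as UNSIGNED)`) FROM `archive_orders`
bounding_query_unquoted 'cast(order_number as UNSIGNED)' archive_orders
# SELECT MIN(cast(order_number as UNSIGNED)), MAX(cast(order_number as UNSIGNED)) FROM `archive_orders`
```

In MySQL, backticks quote an identifier, so the first query looks for a column literally named `cast(order_number as UNSIGNED)`, which matches the "Unknown column" error. The second, unquoted form is valid MySQL, so the expression would apparently have to reach the server unquoted for the bounding query to succeed.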
I've seen some posts saying that SQL functions can be passed to --split-by, but I want to confirm whether that actually works.
Note that I also tried the cast expression wrapped in double quotes and escaped with backslashes.
https://community.hortonworks.com/questions/146261/sql-function-in-split-by.html