Question

I am using Sqoop to import a MySQL table into HDFS. To do this, I use a free-form query import:

--query "SELECT $query_select FROM $table where \$CONDITIONS"

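This query is very slow because of the min(id) and max(id) lookup. To improve performance, I decided to use --boundary-query and specify the lower and upper bounds manually (https://www.safaribooksonline.com/library/view/apache-sqoop-cookbook/9781449364618/ch04.html):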

--boundary-query "select 176862848, 172862848"

However, Sqoop ignores the specified values and again tries to find the minimum and maximum "id" by itself:

16/06/13 14:24:44 INFO tool.ImportTool: Lower bound value: 170581647
16/06/13 14:24:44 INFO tool.ImportTool: Upper bound value: 172909234

The complete sqoop command:

sqoop-import -fs hdfs://xxxxxxxxx/ -D mapreduce.map.java.opts=" -Duser.timezone=Europe/Paris" -m $nodes_number\
    --connect jdbc:mysql://$server:$port/$database --username $username --password $password\
    --target-dir $destination_dir --boundary-query "select 176862848, 172862848"\
    --incremental append --check-column $id_column_name --last-value $last_value\
    --split-by $id_column_name --query "SELECT $query_select FROM $table where \$CONDITIONS"\
    --fields-terminated-by , --escaped-by \\ --enclosed-by '\"'

Has anyone already encountered or solved this problem? Thanks.

Answer 1

Try this:

--boundary-query "select 176862848, 172862848 from tablename limit 1" \
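
To try this variant without hand-editing the string, the boundary query can be assembled from variables. This is only a sketch: make_boundary_query is a hypothetical helper, and the values are placeholders. The smaller value is passed first on the assumption that Sqoop reads the first column as the lower bound and the second as the upper bound (the example in the question lists the larger number first).

```shell
# make_boundary_query is a hypothetical helper, not part of Sqoop.
# $1 = lower bound, $2 = upper bound, $3 = table name
make_boundary_query() {
  printf 'SELECT %s, %s FROM %s LIMIT 1' "$1" "$2" "$3"
}

# Placeholder values; the lower bound comes first, then the upper bound.
boundary_query=$(make_boundary_query 172862848 176862848 tablename)
echo "$boundary_query"

# The result would then be passed as: --boundary-query "$boundary_query"
```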

Answer 2

I managed to solve this problem by removing the following arguments:

--incremental append --check-column $id_column_name --last-value $last_value

There seems to be a conflict between the arguments --boundary-query, --check-column, --split-by and --incremental append.
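
One way to keep the effect of --last-value after dropping the incremental arguments might be to move that filter into the free-form query itself. The following is a minimal sketch under that assumption; the table name, column name, bounds, and connection values are all placeholders, not values from the question.

```shell
# Sketch only: placeholder values, not taken from the question.
id_column_name="id"
table="my_table"
lower=172862848   # plays the role of the former --last-value
upper=176862848

# Filter on the id column inside the query instead of using --incremental.
# Single quotes keep $CONDITIONS literal so Sqoop can substitute it per split.
query="SELECT * FROM ${table} WHERE ${id_column_name} > ${lower} AND "'$CONDITIONS'
boundary_query="SELECT ${lower}, ${upper}"

# Print both strings for review before running the import.
echo "$query"
echo "$boundary_query"

# Uncomment to run against a real cluster (connection values are hypothetical):
# sqoop import --connect jdbc:mysql://localhost:3306/mydb --username user -P \
#   --target-dir /tmp/import_dir -m 4 \
#   --split-by "$id_column_name" \
#   --boundary-query "$boundary_query" \
#   --query "$query"
```

With this approach the caller has to track the last imported id between runs, since Sqoop no longer reports or stores it.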

Answer 3

You are right.

We should not use --split-by together with the --boundary-query control argument.
