I'm using Sqoop for a very basic use case: reading data from a MySQL view and writing it to a file on HDFS. The view naturally has no indexes, but it does have an integer column that splits can be created on. I pass that column in the '--split-by' parameter. Yet when I run the following command, Sqoop does not create any splits.
./sqoop import --num-mappers 16 --verbose --connect jdbc:mysql://hosts/database --username root --table "view_name" --boundary-query="select min(id), max(id) from table" --split-by "split_column" --target-dir=mydir
I tried adding '--boundary-query', but it made no difference.
I also looked at the code of the DataDrivenDBInputFormat class; it generates splits based on the mapred.map.tasks property, so I tried passing that as '-D mapred.map.tasks=16', with no success. '-m 16' and '--num-mappers 16' did not help either. What am I missing?
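For clarity, these are the variants I tried (host and database names are placeholders); none of them changed the number of splits:

```shell
# All of these still produced a single split:
./sqoop import -D mapred.map.tasks=16 --connect jdbc:mysql://host/db \
    --username root --table "view_name" --split-by "split_column" --target-dir=mydir
./sqoop import -m 16 --connect jdbc:mysql://host/db \
    --username root --table "view_name" --split-by "split_column" --target-dir=mydir
./sqoop import --num-mappers 16 --connect jdbc:mysql://host/db \
    --username root --table "view_name" --split-by "split_column" --target-dir=mydir
```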
Software:
Sqoop 1 on the client: sqoop-1.4.3.bin__hadoop-0.20
Hadoop on the client: hadoop-0.20.2-cdh3u6 (HADOOP_MAPRED_HOME is also set to this.)
Hadoop on the cluster: hadoop-2.0.0-cdh4.2.1
I see the following in the log, which indicates no splits are being created (and, of course, the query then takes forever over the roughly 300 million rows):
13/04/19 13:08:24 DEBUG db.DataDrivenDBInputFormat: Creating input split with lower bound '1=1' and upper bound '1=1'
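For what it's worth, the splitting logic itself looks straightforward. Here is a minimal Python sketch of how I understand IntegerSplitter divides [min, max] into num_splits ranges (a simplification for illustration, not the actual Sqoop code):

```python
def integer_splits(low, high, num_splits):
    """Simplified sketch of Sqoop's IntegerSplitter: divide the closed
    range [low, high] into num_splits contiguous sub-ranges."""
    num_splits = max(num_splits, 1)
    # Evenly spaced boundary points, ending exactly at the upper bound.
    points = [low + (high - low) * i // num_splits for i in range(num_splits)]
    points.append(high)
    return [(points[i], points[i + 1]) for i in range(num_splits)]

# With the bounds from my log and one mapper, there is exactly one split:
print(integer_splits(1, 417282940, 1))  # [(1, 417282940)]
# With 4 mappers the same range would become 4 splits:
print(len(integer_splits(1, 417282940, 4)))  # 4
```

So with the bounds Sqoop reports for my column, asking for 16 mappers should clearly yield 16 ranges, yet only one is created.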
Thanks very much.
>cam-1myusername-m:bin myusername$ ./sqoop import --num-mappers 16 --verbose --connect jdbc:mysql://cam-1myusername-m.local/mydb --username root --table "my_view" --boundary-query="select min(id), max(id) from mytable" --split-by "split_column" --target-dir=mydb_events_store
>13/04/20 17:35:14 DEBUG tool.BaseSqoopTool: Enabled debug logging.
>13/04/20 17:35:14 DEBUG sqoop.ConnFactory: Loaded manager factory: com.cloudera.sqoop.manager.DefaultManagerFactory
>13/04/20 17:35:14 DEBUG sqoop.ConnFactory: Trying ManagerFactory: com.cloudera.sqoop.manager.DefaultManagerFactory
>13/04/20 17:35:14 DEBUG manager.DefaultManagerFactory: Trying with scheme: jdbc:mysql:
>13/04/20 17:35:14 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
>13/04/20 17:35:14 DEBUG sqoop.ConnFactory: Instantiated ConnManager org.apache.sqoop.manager.MySQLManager@39385660
>13/04/20 17:35:14 INFO tool.CodeGenTool: Beginning code generation
>13/04/20 17:35:14 DEBUG manager.SqlManager: No connection paramenters specified. Using regular API for making connection.
>13/04/20 17:35:14 DEBUG manager.SqlManager: Using fetchSize for next query: -2147483648
>13/04/20 17:35:14 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `my_view` AS t LIMIT 1
>13/04/20 17:35:14 DEBUG orm.ClassWriter: selected columns:
>13/04/20 17:35:14 DEBUG orm.ClassWriter: split_column
>13/04/20 17:35:14 DEBUG orm.ClassWriter: namespaceid
>13/04/20 17:35:14 DEBUG orm.ClassWriter: profileid
>13/04/20 17:35:14 DEBUG orm.ClassWriter: itemid
>13/04/20 17:35:14 DEBUG orm.ClassWriter: itemtype
>13/04/20 17:35:14 DEBUG orm.ClassWriter: type
>13/04/20 17:35:14 DEBUG orm.ClassWriter: submittime
>13/04/20 17:35:14 DEBUG orm.ClassWriter: eventtime
>13/04/20 17:35:14 DEBUG manager.SqlManager: Using fetchSize for next query: -2147483648
>13/04/20 17:35:14 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `my_view` AS t LIMIT 1
>13/04/20 17:35:14 DEBUG orm.ClassWriter: Writing source file: /tmp/sqoop-myusername/compile/494e66dd1b672a3340f067275bae8912/my_view.java
>13/04/20 17:35:14 DEBUG orm.ClassWriter: Table name: my_view
>13/04/20 17:35:14 DEBUG orm.ClassWriter: Columns: split_column:4, namespaceid:12, profileid:4, itemid:4, itemtype:12, type:12, submittime:-5, eventtime:-5,
>13/04/20 17:35:14 DEBUG orm.ClassWriter: sourceFilename is my_view.java
>13/04/20 17:35:14 DEBUG orm.CompilationManager: Found existing /tmp/sqoop-myusername/compile/494e66dd1b672a3340f067275bae8912/
>13/04/20 17:35:14 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /Users/myusername/Documents/RandomAppStuff/cloudera/hadoop-0.20.2-cdh3u6
>13/04/20 17:35:14 DEBUG orm.CompilationManager: Adding source file: /tmp/sqoop-myusername/compile/494e66dd1b672a3340f067275bae8912/my_view.java
>...
>Note: /tmp/sqoop-myusername/compile/494e66dd1b672a3340f067275bae8912/my_view.java uses or overrides a deprecated API.
>Note: Recompile with -Xlint:deprecation for details.
>13/04/20 17:35:14 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-myusername/compile/494e66dd1b672a3340f067275bae8912/my_view.jar
>13/04/20 17:35:14 DEBUG orm.CompilationManager: Scanning for .class files in directory: /tmp/sqoop-myusername/compile/494e66dd1b672a3340f067275bae8912
>13/04/20 17:35:14 DEBUG orm.CompilationManager: Got classfile: /tmp/sqoop-myusername/compile/494e66dd1b672a3340f067275bae8912/my_view.class -> my_view.class
>13/04/20 17:35:14 DEBUG orm.CompilationManager: Finished writing jar file /tmp/sqoop-myusername/compile/494e66dd1b672a3340f067275bae8912/my_view.jar
>13/04/20 17:35:14 WARN manager.MySQLManager: It looks like you are importing from mysql.
>13/04/20 17:35:14 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
>13/04/20 17:35:14 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
>13/04/20 17:35:14 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
>13/04/20 17:35:15 DEBUG manager.MySQLManager: Rewriting connect string to jdbc:mysql://cam-1myusername-m.local/mydb?zeroDateTimeBehavior=convertToNull
>13/04/20 17:35:15 INFO mapreduce.ImportJobBase: Beginning import of my_view
>13/04/20 17:35:15 DEBUG util.ClassLoaderStack: Checking for existing class: my_view
>13/04/20 17:35:15 DEBUG util.ClassLoaderStack: Attempting to load jar through URL: jar:file:///tmp/sqoop-myusername/compile/494e66dd1b672a3340f067275bae8912/my_view.jar!/
>13/04/20 17:35:15 DEBUG util.ClassLoaderStack: Previous classloader is sun.misc.Launcher$AppClassLoader@1feed786
>13/04/20 17:35:15 DEBUG util.ClassLoaderStack: Testing class in jar: my_view
>13/04/20 17:35:15 DEBUG util.ClassLoaderStack: Loaded jar into current JVM: jar:file:///tmp/sqoop-myusername/compile/494e66dd1b672a3340f067275bae8912/my_view.jar!/
>13/04/20 17:35:15 DEBUG util.ClassLoaderStack: Added classloader for jar /tmp/sqoop-myusername/compile/494e66dd1b672a3340f067275bae8912/my_view.jar: java.net.FactoryURLClassLoader@737c2891
>2013-04-20 17:35:15.022 java[90158:1903] Unable to load realm mapping info from SCDynamicStore
>13/04/20 17:35:15 WARN db.DataDrivenDBInputFormat: Could not find $CONDITIONS token in query: select min(id), max(id) from mytable; splits may not partition data.
>13/04/20 17:35:15 DEBUG mapreduce.DataDrivenImportJob: Using table class: my_view
>13/04/20 17:35:15 DEBUG mapreduce.DataDrivenImportJob: Using InputFormat: class com.cloudera.sqoop.mapreduce.db.DataDrivenDBInputFormat
>13/04/20 17:35:15 DEBUG mapreduce.JobBase: Adding to job classpath: file:/Users/myusername/Documents/RandomAppStuff/sqoop-1.4.3.bin__hadoop-0.20/sqoop-1.4.3.jar
>...
>13/04/20 17:35:15 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
>13/04/20 17:35:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>13/04/20 17:35:15 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: select min(id), max(id) from mytable
>13/04/20 17:35:15 DEBUG db.IntegerSplitter: Splits: [ 1 to 417,282,940] into 1 parts
>13/04/20 17:35:15 DEBUG db.IntegerSplitter: 1
>13/04/20 17:35:15 DEBUG db.IntegerSplitter: 417,282,940
>13/04/20 17:35:15 DEBUG db.DataDrivenDBInputFormat: Creating input split with lower bound '`split_column` >= 1' and upper bound '`split_column` <= 417282940'
>...
>13/04/20 17:35:16 INFO mapred.JobClient: Running job: job_local_0001
>13/04/20 17:35:16 INFO mapred.LocalJobRunner: Waiting for map tasks
>13/04/20 17:35:16 INFO mapred.LocalJobRunner: Starting task: attempt_local_0001_m_000000_0
>13/04/20 17:35:16 INFO mapred.Task: Using ResourceCalculatorPlugin : null
>13/04/20 17:35:16 INFO mapred.MapTask: Processing split: com.cloudera.sqoop.mapreduce.db.DataDrivenDBInputFormat$DataDrivenDBInputSplit@7953113d
>13/04/20 17:35:16 DEBUG db.DataDrivenDBInputFormat: Creating db record reader for db product: MYSQL
>13/04/20 17:35:16 DEBUG db.DataDrivenDBRecordReader: Using query: SELECT `split_column`, `namespaceid`, `profileid`, `itemid`, `itemtype`, `type`, `submittime`, `eventtime` FROM `my_view` AS `my_view` WHERE ( `split_column` >= 1 ) AND ( `split_column` <= 417282940 )
>13/04/20 17:35:16 DEBUG db.DBRecordReader: Using fetchSize for next query: -2147483648
>13/04/20 17:35:16 DEBUG db.DBRecordReader: Executing query: SELECT `split_column`, `namespaceid`, `profileid`, `itemid`, `itemtype`, `type`, `submittime`, `eventtime` FROM `my_view` AS `my_view` WHERE ( `split_column` >= 1 ) AND ( `split_column` <= 417282940 )
>13/04/20 17:35:17 INFO mapred.JobClient: map 0% reduce 0%
>13/04/20 17:35:22 INFO mapred.LocalJobRunner: