Sqoop import does not split data read from a MySQL view/table

Date: 2013-04-19 19:03:02

Tags: mysql import split sqoop

I am using Sqoop for a very basic use case: reading data from a MySQL view and writing it out to a file on HDFS. The view, of course, has no indexes, but it does have an integer column that splits can be created on, and I pass that column via the `--split-by` parameter. Yet when I run the following command, Sqoop does not create any splits:

./sqoop import --num-mappers 16 --verbose --connect jdbc:mysql://hosts/database --username root --table "view_name" --boundary-query="select min(id), max(id) from mytable" --split-by "split_column" --target-dir=mydir

I tried adding `--boundary-query`, but it had no effect.

I also looked at the code of the DataDrivenDBInputFormat class, which generates splits based on the mapred.map.tasks property, so I tried passing that as `-D mapred.map.tasks=16`, without success. `-m 16` and `--num-mappers 16` did not help either. What am I missing?
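One thing worth checking: Hadoop's generic options such as `-D` are consumed by GenericOptionsParser and must appear immediately after the tool name, before any Sqoop-specific arguments; placed later on the command line they are silently ignored. A hedged sketch of that ordering, reusing the table/column names from the log below (host and target dir are placeholders):

```shell
sqoop import \
  -D mapred.map.tasks=16 \
  --connect jdbc:mysql://dbhost/mydb \
  --username root \
  --table my_view \
  --split-by split_column \
  --boundary-query "SELECT MIN(id), MAX(id) FROM mytable" \
  --num-mappers 16 \
  --target-dir mydb_events_store
```

This is only a sketch of argument ordering, not a verified fix for the splitting problem described here.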

Software:

Sqoop 1 on the client: sqoop-1.4.3.bin__hadoop-0.20
Hadoop on the client: hadoop-0.20.2-cdh3u6 (HADOOP_MAPRED_HOME is also set to this.)
Hadoop on the cluster: hadoop-2.0.0-cdh4.2.1

I see the following in the logs, which indicates there is no split (and of course the query takes forever, since the table has 300 million rows):

13/04/19 13:08:24 DEBUG db.DataDrivenDBInputFormat: Creating input split with lower bound '1=1' and upper bound '1=1'
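The `'1=1'` lower and upper bounds above mean Sqoop fell back to a single split covering the entire table. For contrast, when a usable integer split column and boundary query are in place, Sqoop's IntegerSplitter divides [min, max] into roughly equal closed ranges, one per mapper. A simplified sketch of that arithmetic (an assumption based on the bound clauses Sqoop logs, not Sqoop's actual code), using the min/max from the full log below:

```shell
# Divide [min, max] into n contiguous closed ranges, one per mapper,
# mimicking the "`split_column` >= X ... <= Y" clauses Sqoop emits.
min=1; max=417282940; n=4
step=$(( (max - min + 1) / n ))
lo=$min
for i in $(seq 1 $n); do
  if [ "$i" -eq "$n" ]; then
    hi=$max                     # last range absorbs the remainder
  else
    hi=$(( lo + step - 1 ))
  fi
  echo "split $i: \`split_column\` >= $lo AND \`split_column\` <= $hi"
  lo=$(( hi + 1 ))
done
```

With n=1 this collapses to a single range over the whole table, which matches the "into 1 parts" line in the full log below.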

Thanks a lot.

            >cam-1myusername-m:bin myusername$ ./sqoop  import  --num-mappers 16 --verbose --connect jdbc:mysql://cam-1myusername-m.local/mydb --username root  --table "my_view" --boundary-query="select min(id), max(id) from mytable"  --split-by "split_column"  --target-dir=mydb_events_store 
            >13/04/20 17:35:14 DEBUG tool.BaseSqoopTool: Enabled debug logging.
            >13/04/20 17:35:14 DEBUG sqoop.ConnFactory: Loaded manager factory: com.cloudera.sqoop.manager.DefaultManagerFactory
            >13/04/20 17:35:14 DEBUG sqoop.ConnFactory: Trying ManagerFactory: com.cloudera.sqoop.manager.DefaultManagerFactory
            >13/04/20 17:35:14 DEBUG manager.DefaultManagerFactory: Trying with scheme: jdbc:mysql:
            >13/04/20 17:35:14 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
            >13/04/20 17:35:14 DEBUG sqoop.ConnFactory: Instantiated ConnManager org.apache.sqoop.manager.MySQLManager@39385660
            >13/04/20 17:35:14 INFO tool.CodeGenTool: Beginning code generation
            >13/04/20 17:35:14 DEBUG manager.SqlManager: No connection paramenters specified. Using regular API for making connection.
            >13/04/20 17:35:14 DEBUG manager.SqlManager: Using fetchSize for next query: -2147483648
            >13/04/20 17:35:14 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `my_view` AS t LIMIT 1
            >13/04/20 17:35:14 DEBUG orm.ClassWriter: selected columns:
            >13/04/20 17:35:14 DEBUG orm.ClassWriter:   split_column
            >13/04/20 17:35:14 DEBUG orm.ClassWriter:   namespaceid
            >13/04/20 17:35:14 DEBUG orm.ClassWriter:   profileid
            >13/04/20 17:35:14 DEBUG orm.ClassWriter:   itemid
            >13/04/20 17:35:14 DEBUG orm.ClassWriter:   itemtype
            >13/04/20 17:35:14 DEBUG orm.ClassWriter:   type
            >13/04/20 17:35:14 DEBUG orm.ClassWriter:   submittime
            >13/04/20 17:35:14 DEBUG orm.ClassWriter:   eventtime
            >13/04/20 17:35:14 DEBUG manager.SqlManager: Using fetchSize for next query: -2147483648
            >13/04/20 17:35:14 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `my_view` AS t LIMIT 1
            >13/04/20 17:35:14 DEBUG orm.ClassWriter: Writing source file: /tmp/sqoop-myusername/compile/494e66dd1b672a3340f067275bae8912/my_view.java
            >13/04/20 17:35:14 DEBUG orm.ClassWriter: Table name: my_view
            >13/04/20 17:35:14 DEBUG orm.ClassWriter: Columns: split_column:4, namespaceid:12, profileid:4, itemid:4, itemtype:12, type:12, submittime:-5, eventtime:-5, 
            >13/04/20 17:35:14 DEBUG orm.ClassWriter: sourceFilename is my_view.java
            >13/04/20 17:35:14 DEBUG orm.CompilationManager: Found existing /tmp/sqoop-myusername/compile/494e66dd1b672a3340f067275bae8912/
            >13/04/20 17:35:14 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /Users/myusername/Documents/RandomAppStuff/cloudera/hadoop-0.20.2-cdh3u6
            >13/04/20 17:35:14 DEBUG orm.CompilationManager: Adding source file: /tmp/sqoop-myusername/compile/494e66dd1b672a3340f067275bae8912/my_view.java
            >...
            >Note: /tmp/sqoop-myusername/compile/494e66dd1b672a3340f067275bae8912/my_view.java uses or overrides a deprecated API.
            >Note: Recompile with -Xlint:deprecation for details.
            >13/04/20 17:35:14 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-myusername/compile/494e66dd1b672a3340f067275bae8912/my_view.jar
            >13/04/20 17:35:14 DEBUG orm.CompilationManager: Scanning for .class files in directory: /tmp/sqoop-myusername/compile/494e66dd1b672a3340f067275bae8912
            >13/04/20 17:35:14 DEBUG orm.CompilationManager: Got classfile: /tmp/sqoop-myusername/compile/494e66dd1b672a3340f067275bae8912/my_view.class -> my_view.class
            >13/04/20 17:35:14 DEBUG orm.CompilationManager: Finished writing jar file /tmp/sqoop-myusername/compile/494e66dd1b672a3340f067275bae8912/my_view.jar
            >13/04/20 17:35:14 WARN manager.MySQLManager: It looks like you are importing from mysql.
            >13/04/20 17:35:14 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
            >13/04/20 17:35:14 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
            >13/04/20 17:35:14 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
            >13/04/20 17:35:15 DEBUG manager.MySQLManager: Rewriting connect string to jdbc:mysql://cam-1myusername-m.local/mydb?zeroDateTimeBehavior=convertToNull
            >13/04/20 17:35:15 INFO mapreduce.ImportJobBase: Beginning import of my_view
            >13/04/20 17:35:15 DEBUG util.ClassLoaderStack: Checking for existing class: my_view
            >13/04/20 17:35:15 DEBUG util.ClassLoaderStack: Attempting to load jar through URL: jar:file:///tmp/sqoop-myusername/compile/494e66dd1b672a3340f067275bae8912/my_view.jar!/
            >13/04/20 17:35:15 DEBUG util.ClassLoaderStack: Previous classloader is sun.misc.Launcher$AppClassLoader@1feed786
            >13/04/20 17:35:15 DEBUG util.ClassLoaderStack: Testing class in jar: my_view
            >13/04/20 17:35:15 DEBUG util.ClassLoaderStack: Loaded jar into current JVM: jar:file:///tmp/sqoop-myusername/compile/494e66dd1b672a3340f067275bae8912/my_view.jar!/
            >13/04/20 17:35:15 DEBUG util.ClassLoaderStack: Added classloader for jar /tmp/sqoop-myusername/compile/494e66dd1b672a3340f067275bae8912/my_view.jar: java.net.FactoryURLClassLoader@737c2891
            >2013-04-20 17:35:15.022 java[90158:1903] Unable to load realm mapping info from SCDynamicStore
            >13/04/20 17:35:15 WARN db.DataDrivenDBInputFormat: Could not find $CONDITIONS token in query: select min(id), max(id) from mytable; splits may not partition data.
            >13/04/20 17:35:15 DEBUG mapreduce.DataDrivenImportJob: Using table class: my_view
            >13/04/20 17:35:15 DEBUG mapreduce.DataDrivenImportJob: Using InputFormat: class com.cloudera.sqoop.mapreduce.db.DataDrivenDBInputFormat
            >13/04/20 17:35:15 DEBUG mapreduce.JobBase: Adding to job classpath: file:/Users/myusername/Documents/RandomAppStuff/sqoop-1.4.3.bin__hadoop-0.20/sqoop-1.4.3.jar
            >...
            >13/04/20 17:35:15 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
            >13/04/20 17:35:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
            >13/04/20 17:35:15 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: select min(id), max(id) from mytable
            >13/04/20 17:35:15 DEBUG db.IntegerSplitter: Splits: [                           1 to                  417,282,940] into 1 parts
            >13/04/20 17:35:15 DEBUG db.IntegerSplitter:                            1
            >13/04/20 17:35:15 DEBUG db.IntegerSplitter:                  417,282,940
            >13/04/20 17:35:15 DEBUG db.DataDrivenDBInputFormat: Creating input split with lower bound '`split_column` >= 1' and upper bound '`split_column` <= 417282940'
            >...
            >13/04/20 17:35:16 INFO mapred.JobClient: Running job: job_local_0001
            >13/04/20 17:35:16 INFO mapred.LocalJobRunner: Waiting for map tasks
            >13/04/20 17:35:16 INFO mapred.LocalJobRunner: Starting task: attempt_local_0001_m_000000_0
            >13/04/20 17:35:16 INFO mapred.Task:  Using ResourceCalculatorPlugin : null
            >13/04/20 17:35:16 INFO mapred.MapTask: Processing split: com.cloudera.sqoop.mapreduce.db.DataDrivenDBInputFormat$DataDrivenDBInputSplit@7953113d
            >13/04/20 17:35:16 DEBUG db.DataDrivenDBInputFormat: Creating db record reader for db product: MYSQL
            >13/04/20 17:35:16 DEBUG db.DataDrivenDBRecordReader: Using query: SELECT `split_column`, `namespaceid`, `profileid`, `itemid`, `itemtype`, `type`, `submittime`, `eventtime` FROM `my_view` AS `my_view` WHERE ( `split_column` >= 1 ) AND ( `split_column` <= 417282940 )
            >13/04/20 17:35:16 DEBUG db.DBRecordReader: Using fetchSize for next query: -2147483648
            >13/04/20 17:35:16 DEBUG db.DBRecordReader: Executing query: SELECT `split_column`, `namespaceid`, `profileid`, `itemid`, `itemtype`, `type`, `submittime`, `eventtime` FROM `my_view` AS `my_view` WHERE ( `split_column` >= 1 ) AND ( `split_column` <= 417282940 )
            >13/04/20 17:35:17 INFO mapred.JobClient:  map 0% reduce 0%
            >13/04/20 17:35:22 INFO mapred.LocalJobRunner: 

0 Answers:

There are no answers yet.