I have a ~300GB table in MySQL and I want to add a new column to it. When I alter the table with an ALTER command, it never completes and the process dies. So my plan is to write a Sqoop job to pull all the data from the table and dump it into HDFS, then create a new database in MySQL, create the table with the additional column, and re-import the data from HDFS.
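For reference, the statement that never completes is essentially the following (the column definition matches the newly added last_seen column in the table structure below):

ALTER TABLE `nodes` ADD COLUMN `last_seen` date DEFAULT NULL;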
Table structure:
CREATE TABLE `nodes` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`type` int(11) NOT NULL,
`name` varchar(50) NOT NULL,
`postcode` varchar(4) DEFAULT NULL,
`updated` datetime DEFAULT NULL,
`db_updated` datetime DEFAULT CURRENT_TIMESTAMP,
`useragent` bigint(20) DEFAULT NULL,
`last_seen` date DEFAULT NULL, -- newly added column
PRIMARY KEY (`id`),
UNIQUE KEY `akaid_index` (`type`,`name`),
KEY `useragent_idx` (`useragent`),
KEY `type` (`type`),
CONSTRAINT `useragentfk` FOREIGN KEY (`useragent`) REFERENCES `useragents` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=1091725696 DEFAULT CHARSET=latin1;
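The HDFS dump was produced with a Sqoop import along these lines (a sketch; the exact flags are reconstructed, and <dir-path> is a placeholder):

sqoop import --connect jdbc:mysql://localhost:3306/graph1 --table nodes --username root --password password --target-dir <dir-path> --fields-terminated-by ',' --split-by id

Whatever token the import writes for SQL NULL has to be matched by --input-null-string/--input-null-non-string on the export side, which is where things go wrong below.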
Sqoop command:
sqoop export --connect jdbc:mysql://localhost:3306/graph1 --table nodes --username root --password password --export-dir <dir-path> --input-fields-terminated-by ','
I get the following error:
Error: java.io.IOException: Can't export data, please check failed map task logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.RuntimeException: Can't parse input data: 'NULL'
at nodes.__loadFromFields(nodes.java:470)
at nodes.parse(nodes.java:388)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83)
... 10 more
Caused by: java.lang.NumberFormatException: For input string: "NULL"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:441)
at java.lang.Long.valueOf(Long.java:540)
at nodes.__loadFromFields(nodes.java:467)
... 12 more
The file in HDFS contains records like the following:
1289603991,1,fee1cee723bdb0bc499c443765b40e3d,,2016-04-13 10:19:59,2016-04-14 03:44:55,5296252
1289603992,1,edf65c2e7b89388fe9068cc3a898a3fd,,2016-04-13 10:20:00,2016-04-14 03:44:55,5411481
1289603993,1,5760fd1cca92a65ce6f2db43853fc118,,2016-04-13 10:19:59,2016-04-14 03:44:55,4441745
1289603994,1,65dd92c80df5581f55bc60f3e997ec05,,2016-04-13 10:19:59,2016-04-14 03:44:55,5332084
1289603995,1,7654a84428f3064828f5972cfce5f8e6,,2016-04-13 10:20:00,2016-04-14 03:44:55,5202243
1289603996,1,84c270212fe5f3a52cb2bd75403da058,,2016-04-13 10:20:00,2016-04-14 03:44:55,5398729
1289603997,1,a486382c4fc296a5e8d3c0491568c22c,,2016-04-13 10:19:57,2016-04-14 03:44:55,5289170
1289603998,112,2_3Nns7YXPmS_xv3imJBiw04BQf1sNc2tJrtFJ5TCx98,,2016-04-13 10:20:00,2016-04-14 03:44:55,NULL
1289603999,1,a3607df77e025b12c1728f62589857fa,,2016-04-13 10:19:59,2016-04-14 03:44:55,12
1289604000,113,570e1d4e6372cd9c,,2016-04-13 10:19:59,2016-04-14 03:44:55,NULL
1289604001,113,57023dd016258fbf,,2016-04-13 10:20:00,2016-04-14 03:44:55,NULL
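Since the dump represents SQL NULL as the literal string NULL, the export has to be told to map that token back to a database NULL; presumably the amended command looks like this (a sketch built from the flags mentioned next):

sqoop export --connect jdbc:mysql://localhost:3306/graph1 --table nodes --username root --password password --export-dir <dir-path> --input-fields-terminated-by ',' --input-null-string 'NULL' --input-null-non-string 'NULL'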
After adding --input-null-string and --input-null-non-string to the Sqoop job, it now fails with the following:
2016-06-07 18:11:37,750 ERROR [Thread-9] org.apache.sqoop.mapreduce.AsyncSqlOutputFormat: Got exception in update thread: com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: Duplicate entry '1289603991' for key 'PRIMARY'
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:404)
at com.mysql.jdbc.Util.getInstance(Util.java:387)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:934)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3966)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3902)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2526)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2673)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2549)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1861)
at com.mysql.jdbc.PreparedStatement.execute(PreparedStatement.java:1192)
at org.apache.sqoop.mapreduce.AsyncSqlOutputFormat$AsyncSqlExecThread.run(AsyncSqlOutputFormat.java:233)
I'm hitting this error because the table has an AUTO_INCREMENT column. Is there any other way to alter this table quickly? According to this thread (Changing Large MySQL InnoDB Tables), altering a large InnoDB table can take a long time, sometimes days.
Any alternative suggestions would be much appreciated.
Answer 0 (score: 1)
Your data is violating the primary key constraint:
"If your table has constraints (e.g., a primary key column whose values must be unique) and already contains data, you must take care to avoid inserting records that violate these constraints."
You need to either truncate the table before running the export command, or update the existing data using additional export arguments:
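A sketch of that second option, assuming the primary key id as the update key (--update-mode allowinsert tells Sqoop to insert rows that do not exist yet and update the ones that do):

sqoop export --connect jdbc:mysql://localhost:3306/graph1 --table nodes --username root --password password --export-dir <dir-path> --input-fields-terminated-by ',' --input-null-string 'NULL' --input-null-non-string 'NULL' --update-key id --update-mode allowinsert

With the MySQL connector, allowinsert is typically executed as INSERT ... ON DUPLICATE KEY UPDATE, so rows whose id already exists are updated in place instead of triggering the duplicate-entry error shown above.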