How do I insert JSON data from HDFS into MySQL using sqoop?

Asked: 2017-02-18 14:41:05

Tags: hadoop sqoop

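I have loaded JSON data into HDFS, and I have created a table with the required columns in my MySQL database, as shown below.

How do I create a table that accepts JSON with a row formatter?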

My HDFS data:

{
"Employees" : [
{
"userId":"rirani",
"jobTitleName":"Developer",
"firstName":"Romin",
"lastName":"Irani",
"preferredFullName":"Romin Irani",
"employeeCode":"E1",
"region":"CA",
"phoneNumber":"408-1234567",
"emailAddress":"romin.k.irani@gmail.com"
},
{
"userId":"nirani",
"jobTitleName":"Developer",
"firstName":"Neil",
"lastName":"Irani",
"preferredFullName":"Neil Irani",
"employeeCode":"E2",
"region":"CA",
"phoneNumber":"408-1111111",
"emailAddress":"neilrirani@gmail.com"
},
{
"userId":"thanks",
"jobTitleName":"Program Directory",
"firstName":"Tom",
"lastName":"Hanks",
"preferredFullName":"Tom Hanks",
"employeeCode":"E3",
"region":"CA",
"phoneNumber":"408-2222222",
"emailAddress":"tomhanks@gmail.com"
}
]
}

My MySQL table structure:

mysql> create table employee(userid int,jobTitleName varchar(20),firstName varchar(20),lastName varchar(20),preferrredFullName varchar(20),employeeCode varchar(20),region varchar(20),phoneNumber varchar(20), emailAddress varchar(20),modifiedDate timestamp  DEFAULT CURRENT_TIMESTAMP);
mysql> desc employee;
+--------------------+-------------+------+-----+-------------------+-------+
| Field              | Type        | Null | Key | Default           | Extra |
+--------------------+-------------+------+-----+-------------------+-------+
| userid             | int(11)     | YES  |     | NULL              |       |
| jobTitleName       | varchar(20) | YES  |     | NULL              |       |
| firstName          | varchar(20) | YES  |     | NULL              |       |
| lastName           | varchar(20) | YES  |     | NULL              |       |
| preferrredFullName | varchar(20) | YES  |     | NULL              |       |
| employeeCode       | varchar(20) | YES  |     | NULL              |       |
| region             | varchar(20) | YES  |     | NULL              |       |
| phoneNumber        | varchar(20) | YES  |     | NULL              |       |
| emailAddress       | varchar(20) | YES  |     | NULL              |       |
| modifiedDate       | timestamp   | NO   |     | CURRENT_TIMESTAMP |       |
+--------------------+-------------+------+-----+-------------------+-------+
10 rows in set (0.00 sec)

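I am trying to load the data from HDFS into MySQL using sqoop export as follows, but it ends with an exception: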

sqoop export --connect jdbc:mysql://localhost/emp_scheme --username root --password adithyan --table employee --export-dir /user/adithyan/filesystem/employee.txt

Can someone help me out?

1 answer:

Answer 0 (score: -1):

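You may want to look at a few options. MySQL has JSON functions such as JSON_SET / JSON_REPLACE / JSON_INSERT, but sqoop may not support them directly yet.

Another option is to preprocess the data with Pig and stage the result in HDFS before sqooping it into the RDBMS.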
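The preprocessing step could look something like the sketch below. It uses plain Python instead of Pig, purely for illustration. By default, sqoop export parses the files in --export-dir as delimited text (comma-separated fields, one record per line), so a nested JSON document like the one in the question cannot be mapped onto table columns as it stands. The function name employees_to_csv and the FIELDS list are made up for this example; the field names are taken from the JSON in the question.

```python
import csv
import io
import json

# JSON keys in the same order as the MySQL columns from the question.
# modifiedDate is left out on purpose: MySQL fills it in through
# DEFAULT CURRENT_TIMESTAMP.
FIELDS = [
    "userId", "jobTitleName", "firstName", "lastName", "preferredFullName",
    "employeeCode", "region", "phoneNumber", "emailAddress",
]


def employees_to_csv(json_text):
    """Flatten {"Employees": [...]} into one comma-separated line per employee."""
    records = json.loads(json_text)["Employees"]
    out = io.StringIO()
    # csv.writer wraps a value in double quotes only if it contains a comma,
    # a quote, or a newline; none of the sample values do.
    writer = csv.writer(out, lineterminator="\n")
    for record in records:
        # Assumption: a missing key becomes an empty field.
        writer.writerow([record.get(field, "") for field in FIELDS])
    return out.getvalue()


# Small inline sample in the same shape as the data in the question.
SAMPLE = """
{"Employees": [
  {"userId": "rirani", "jobTitleName": "Developer", "firstName": "Romin",
   "lastName": "Irani", "preferredFullName": "Romin Irani",
   "employeeCode": "E1", "region": "CA", "phoneNumber": "408-1234567",
   "emailAddress": "romin.k.irani@gmail.com"}
]}
"""

print(employees_to_csv(SAMPLE), end="")
```

The flattened output would then be written to a separate HDFS directory and that directory passed to --export-dir, with --columns listing the nine target columns so that modifiedDate is skipped. Two things in the question's schema would probably still get in the way: userid is declared as int while the JSON values are strings such as "rirani", and emailAddress is varchar(20) while "romin.k.irani@gmail.com" is longer than 20 characters. If real data can contain commas, sqoop's --input-optionally-enclosed-by option is likely needed to read the quoted fields.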