使用加载数据将CSV导入MySQL

时间:2017-03-25 13:42:37

标签: mysql csv insert load

我希望使用MySQL loadfile将CSV的内容插入到我的表中,但是,每次使用以下命令执行此操作时,都会删除多行

LOAD DATA INFILE 'new.csv' INTO TABLE Example
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;

我希望插入的CSV数据是Companies House(http://download.companieshouse.gov.uk/en_output.html)提供的免费公司产品数据

我非常感谢任何帮助。 谢谢!

2 个答案:

答案 0 :(得分:3)

根据提供的数据规范here

,提供此表定义
create table companieshouse
(
CompanyName varchar(160),
CompanyNumber varchar(8),
RegAddressCareOf varchar(100),
RegAddressPOBox varchar(10),
RegAddressAddressLine1 varchar(300),
RegAddressAddressLine2 varchar(300),
RegAddressPostTown varchar(50),
RegAddressCounty varchar(60),
RegAddressCountry varchar(50),
RegAddressPostCode varchar(20),
CompanyCategory varchar(100),
CompanyStatus varchar(70),
CountryOfOrigin varchar(50),
DissolutionDate date,
IncorporationDate date,
AccountsAccountRefDay integer,
AccountsAccountRefMonth integer,
AccountsNextDueDate date,
AccountsLastMadeUpDate date,
AccountsAccountCategory varchar(30),
ReturnsNextDueDate date,
ReturnsLastMadeUpDate date,
MortgagesNumMortCharges integer,
MortgagesNumMortOutstanding integer,
MortgagesNumMortPartSatisfied integer,
MortgagesNumMortSatisfied integer,
SICCodeSicText_1 varchar(170),
SICCodeSicText_2 varchar(170),
SICCodeSicText_3 varchar(170),
SICCodeSicText_4 varchar(170),
LimitedPartnershipsNumGenPartners integer,
LimitedPartnershipsNumLimPartners integer,
URI varchar(47),
PreviousName1CONDATE date,
PreviousName1CompanyName varchar(160),
PreviousName2CONDATE date,
PreviousName2CompanyName varchar(160),
PreviousName3CONDATE date,
PreviousName3CompanyName varchar(160),
PreviousName4CONDATE date,
PreviousName4CompanyName varchar(160),
PreviousName5CONDATE date,
PreviousName5CompanyName varchar(160),
PreviousName6CONDATE date, 
PreviousName6CompanyName varchar(160),
PreviousName7CONDATE date,
PreviousName7CompanyName varchar(160),
PreviousName8CONDATE date,
PreviousName8CompanyName varchar(160),
PreviousName9CONDATE date, 
PreviousName9CompanyName varchar(160),
PreviousName10CONDATE date, 
PreviousName10CompanyName varchar(160),
ConfStmtNextDueDate date, 
ConfStmtLastMadeUpDate date
);

这会将提供的.csv文件中的数据加载到表中;

LOAD DATA INFILE '/var/lib/mysql-files/BasicCompanyData-2017-03-06-part1_5.csv' 
INTO TABLE companieshouse 
FIELDS TERMINATED BY ',' 
ENCLOSED BY '"'
ESCAPED BY ""
LINES TERMINATED BY '\n'
 IGNORE 1 LINES
(
CompanyName,
CompanyNumber,
RegAddressCareOf,
RegAddressPOBox,
RegAddressAddressLine1,
RegAddressAddressLine2,
RegAddressPostTown,
RegAddressCounty,
RegAddressCountry,
RegAddressPostCode,
CompanyCategory,
CompanyStatus,
CountryOfOrigin,
@DissolutionDate,
@IncorporationDate,
@AccountsAccountRefDay,
@AccountsAccountRefMonth,
@AccountsNextDueDate,
@AccountsLastMadeUpDate,
AccountsAccountCategory,
@ReturnsNextDueDate,
@ReturnsLastMadeUpDate,
@MortgagesNumMortCharges,
@MortgagesNumMortOutstanding,
@MortgagesNumMortPartSatisfied,
@MortgagesNumMortSatisfied,
SICCodeSicText_1,
SICCodeSicText_2,
SICCodeSicText_3,
SICCodeSicText_4,
@LimitedPartnershipsNumGenPartners,
@LimitedPartnershipsNumLimPartners,
URI,
@PreviousName1CONDATE,
PreviousName1CompanyName,
@PreviousName2CONDATE,
PreviousName2CompanyName,
@PreviousName3CONDATE,
PreviousName3CompanyName,
@PreviousName4CONDATE,
PreviousName4CompanyName,
@PreviousName5CONDATE,
PreviousName5CompanyName,
@PreviousName6CONDATE, 
PreviousName6CompanyName,
@PreviousName7CONDATE,
PreviousName7CompanyName,
@PreviousName8CONDATE,
PreviousName8CompanyName,
@PreviousName9CONDATE, 
PreviousName9CompanyName,
@PreviousName10CONDATE, 
PreviousName10CompanyName,
@ConfStmtNextDueDate, 
@ConfStmtLastMadeUpDate)
SET DissolutionDate = IF(@DissolutionDate = '', NULL, STR_TO_DATE(@DissolutionDate, '%d/%m/%Y')),
IncorporationDate = IF(@IncorporationDate = '', NULL, STR_TO_DATE(@IncorporationDate, '%d/%m/%Y')),
AccountsNextDueDate = IF(@AccountsNextDueDate = '', NULL, STR_TO_DATE(@AccountsNextDueDate, '%d/%m/%Y')),
AccountsLastMadeUpDate = IF(@AccountsLastMadeUpDate = '', NULL, STR_TO_DATE(@AccountsLastMadeUpDate, '%d/%m/%Y')),
ReturnsNextDueDate = IF(@ReturnsNextDueDate = '', NULL,  STR_TO_DATE(@ReturnsNextDueDate, '%d/%m/%Y')),
ReturnsLastMadeUpDate = IF(@ReturnsLastMadeUpDate = '', NULL, STR_TO_DATE(@ReturnsLastMadeUpDate, '%d/%m/%Y')),
PreviousName1CONDATE = IF(@PreviousName1CONDATE = '', NULL, STR_TO_DATE(@PreviousName1CONDATE, '%d/%m/%Y')),
PreviousName2CONDATE = IF(@PreviousName2CONDATE = '', NULL, STR_TO_DATE(@PreviousName2CONDATE, '%d/%m/%Y')),
PreviousName3CONDATE = IF(@PreviousName3CONDATE = '', NULL, STR_TO_DATE(@PreviousName3CONDATE, '%d/%m/%Y')),
PreviousName4CONDATE = IF(@PreviousName4CONDATE = '', NULL, STR_TO_DATE(@PreviousName4CONDATE, '%d/%m/%Y')),
PreviousName5CONDATE = IF(@PreviousName5CONDATE = '', NULL, STR_TO_DATE(@PreviousName5CONDATE, '%d/%m/%Y')),
PreviousName6CONDATE = IF(@PreviousName6CONDATE = '', NULL, STR_TO_DATE(@PreviousName6CONDATE, '%d/%m/%Y')),
PreviousName7CONDATE = IF(@PreviousName7CONDATE = '', NULL, STR_TO_DATE(@PreviousName7CONDATE, '%d/%m/%Y')),
PreviousName8CONDATE = IF(@PreviousName8CONDATE = '', NULL, STR_TO_DATE(@PreviousName8CONDATE, '%d/%m/%Y')),
PreviousName9CONDATE = IF(@PreviousName9CONDATE = '', NULL, STR_TO_DATE(@PreviousName9CONDATE, '%d/%m/%Y')),
PreviousName10CONDATE = IF(@PreviousName10CONDATE = '', NULL, STR_TO_DATE(@PreviousName10CONDATE, '%d/%m/%Y')),
AccountsAccountRefDay = NULLIF(@AccountsAccountRefDay, ''),
AccountsAccountRefMonth = NULLIF(@AccountsAccountRefMonth, '') ,
MortgagesNumMortCharges = NULLIF(@MortgagesNumMortCharges, ''),
MortgagesNumMortOutstanding = NULLIF(@MortgagesNumMortOutstanding, ''),
MortgagesNumMortPartSatisfied = NULLIF(@MortgagesNumMortPartSatisfied, ''),
MortgagesNumMortSatisfied = NULLIF(@MortgagesNumMortSatisfied, ''),
LimitedPartnershipsNumGenPartners = NULLIF(@LimitedPartnershipsNumGenPartners, ''),
LimitedPartnershipsNumLimPartners = NULLIF(@LimitedPartnershipsNumLimPartners, '')
;

...加载数据时没有任何错误或警告:

Query OK, 849999 rows affected (19.43 sec)
Records: 849999  Deleted: 0  Skipped: 0  Warnings: 0

答案 1 :(得分:0)

从一方面 - LOAD DATA INFILE,禁食方法,但现实生活总是需要一些检查,转换和其他业务逻辑。

正常做法 - 使用ETL工具而不是直接导入

或多阶段流程 - 清理数据(检查,记录错误,转换,在第1步添加计算列等),而不是导入最终结果。

现在有许多优秀的OpenSource工具:

转换您的导入/导出逻辑您可以实现所有步骤,例如: