I have a CSV file from the US Census that looks like this:
"ZIP5","ZIP4","ZIP9","STATE CODE","STATE","COUNTY CODE","COUNTY NAME","CBSA CODE","CBSA TITLE","CBSA LSAD","METRO DIVISION CODE","METRO DIVISION TITLE","METRO DIVISION LSAD","CSA CODE","CSA TITLE","CSA LSAD"
"04841",,"04841","23","ME","013","Knox County","40500","Rockland, ME","Micropolitan Statistical Area",,,,,,
"04843",,"04843","23","ME","013","Knox County","40500","Rockland, ME","Micropolitan Statistical Area",,,,,,
"04846",,"04846","23","ME","013","Knox County","40500","Rockland, ME","Micropolitan Statistical Area",,,,,,
"04847",,"04847","23","ME","013","Knox County","40500","Rockland, ME","Micropolitan Statistical Area",,,,,,
"04848",,"04848","23","ME","027","Waldo County",,,,,,,,,
"04849",,"04849","23","ME","027","Waldo County",,,,,,,,,
"04850",,"04850","23","ME","027","Waldo County",,,,,,,,,
"04851",,"04851","23","ME","013","Knox County","40500","Rockland, ME","Micropolitan Statistical Area",,,,,,
"04852",,"04852","23","ME","015","Lincoln County",,,,,,,,,
The file has over 2 million records. Most records do not have data in every field.
Here is the MySQL table layout I defined for the CSV file above:
+----------------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------------+------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| ZIP5 | varchar(5) | NO | | NULL | |
| ZIP4 | varchar(5) | NO | | NULL | |
| ZIP9 | varchar(10) | NO | | NULL | |
| STATE_CODE | varchar(2) | NO | | NULL | |
| STATE | varchar(2) | NO | | NULL | |
| COUNTY_CODE | varchar(3) | NO | | NULL | |
| COUNTY_NAME | varchar(50) | NO | | NULL | |
| CBSA_CODE | varchar(5) | NO | | NULL | |
| CBSA_TITLE | varchar(50) | NO | | NULL | |
| CBSA_LSAD | varchar(50) | NO | | NULL | |
| METRO_DIVISION_CODE | varchar(5) | NO | | NULL | |
| METRO_DIVISION_TITLE | varchar(50) | NO | | NULL | |
| METRO_DIVISION_LSAD | varchar(50) | NO | | NULL | |
| CSA_CODE | varchar(3) | NO | | NULL | |
| CSA_TITLE | varchar(50) | NO | | NULL | |
| CSA_LSAD | varchar(50) | NO | | NULL | |
+----------------------+------------------+------+-----+---------+----------------+
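(I just realized that maybe I should have defined ZIP5 as the primary key?)
I have read that if a CSV file has an empty field, you should change it to \N, but is there an easy way to do that? I could write a PHP program to do it, but with over 2 million records it would take a long time, and my server does not have much memory.
What is the simplest way to import this CSV file into MySQL successfully? Are there parameters to MySQL's LOAD command that can handle this? As it works now, it complains about data truncation on ZIP5, and when I look in MySQL, only the first 4 digits of the ZIP code are stored. Thanks!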
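Answer 0 (score: 1)
First, about the primary key: a table should always have one. Usually that is a column named id with AUTO_INCREMENT, which the table layout you posted already has. For ZIP codes and similar data, a composite key over 2-3 columns can also be handy. As always, it depends on the situation.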
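To illustrate the composite-key idea, here is a minimal sketch. It assumes the table is named zipcodes and that the (ZIP5, ZIP4) pair identifies a row, which should be checked against the data first. The index name uq_zip5_zip4 is made up for illustration.

```sql
-- Assumption: the table is called `zipcodes` and (ZIP5, ZIP4) is unique per row.
-- The auto-increment `id` stays the primary key; this adds a composite unique key.
ALTER TABLE zipcodes
  ADD UNIQUE KEY uq_zip5_zip4 (ZIP5, ZIP4);
```

Note that MySQL treats NULLs as distinct in a unique key, so if ZIP4 is later made nullable, rows with a NULL ZIP4 would not be checked against each other.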
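As for the import, you have a few options:
1. Run a script locally that generates SQL INSERT statements, then feed them to the MySQL server through whatever interface you have available.
2. Upload the CSV file to the server and import it with the command-line mysql client. MySQL has a built-in CSV importer, though I never liked it ;)
3. Run a script on the server and add one row at a time. In PHP you can read the CSV line by line and run an INSERT for each row (remember to adjust set_time_limit and memory_limit accordingly).
A reminder for option 3: if you run it through a browser rather than from the command line, your browser will most likely time out. Rest assured, the script itself will generally keep running until it finishes.
I think I have a CSV importer somewhere (for huge CSV files, such as geotag data). Let me know if you need it, and I may be able to find it and post it here.
Unfortunately I could not find my CSV importer. But take the first example from the PHP manual's fgetcsv entry and make a few modifications...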
set_time_limit(3600); // Allow up to 1 hour of execution time; adjust to your expectations.
if (($handle = fopen("test.csv", "r")) !== FALSE) {
    // The header row drives the column list, so first edit the CSV header so that
    // each name matches the actual column name in your table (e.g. STATE_CODE).
    $header = fgetcsv($handle, 1000, ",");
    while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
        $values = array();
        // Pair each header name with the field at the same position in the row.
        foreach ($header as $i => $key) {
            // Skip empty fields; compare with '' because empty() would also skip "0".
            if (isset($data[$i]) && $data[$i] !== '') {
                // addslashes() is only a stand-in; use your driver's escaping function.
                $values[$key] = addslashes($data[$i]);
            }
        }
        if (count($values) === 0) {
            continue; // Nothing to insert for a blank line.
        }
        $keys = "`" . implode("`, `", array_keys($values)) . "`";
        $vals = "'" . implode("', '", $values) . "'";
        $statement = "INSERT INTO `table_name` ({$keys}) VALUES ({$vals})";
        // Run $statement here. This form is for when you don't use PDO; with PDO,
        // bind $values (column_name => value pairs) to a prepared statement instead.
        // Columns that may be empty are left out of the INSERT, so give them
        // default values in your MySQL schema (table).
    }
    fclose($handle);
}
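The script relies on the skipped columns having defaults in the schema. A minimal sketch of that change follows, assuming the placeholder table name table_name from the INSERT above and the column types from the question. Only a few of the often-empty columns are shown; the same pattern would apply to the others.

```sql
-- Assumption: `table_name` is the placeholder used in the PHP snippet.
-- Let columns that are often empty in the CSV accept NULL and default to it.
ALTER TABLE `table_name`
  MODIFY ZIP4 varchar(5) NULL DEFAULT NULL,
  MODIFY CBSA_CODE varchar(5) NULL DEFAULT NULL,
  MODIFY CBSA_TITLE varchar(50) NULL DEFAULT NULL,
  MODIFY CBSA_LSAD varchar(50) NULL DEFAULT NULL;
```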
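I hope this helps. I have not tested it, but it looks fine ;)
Answer 1 (score: 0)
Try the following LOAD command after changing the file path, and adjust the line terminator if needed.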
LOAD DATA INFILE 'your_file.csv' IGNORE
INTO TABLE zipcodes
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
(ZIP5, ZIP4, ZIP9, STATE_CODE, STATE, COUNTY_CODE, COUNTY_NAME, CBSA_CODE,
CBSA_TITLE, CBSA_LSAD, METRO_DIVISION_CODE, METRO_DIVISION_TITLE,
METRO_DIVISION_LSAD, CSA_CODE, CSA_TITLE, CSA_LSAD);
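On the question of turning empty fields into NULL without editing the file: LOAD DATA can read a field into a user variable and assign the column in a SET clause. The variant below is a sketch of that approach, not a tested import. It assumes the optional columns have been made nullable, and the variable names are arbitrary. The truncation described in the question may come from the quotes being read as part of the value (a 5-character column would then hold `"0484`), which the ENCLOSED BY clause should address.

```sql
-- Assumption: the optional columns allow NULL; otherwise the NULLs cannot be stored.
-- OPTIONALLY is used because the empty fields in the sample are not quoted.
LOAD DATA INFILE 'your_file.csv' IGNORE
INTO TABLE zipcodes
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
(ZIP5, @zip4, ZIP9, STATE_CODE, STATE, COUNTY_CODE, COUNTY_NAME, @cbsa_code,
 @cbsa_title, @cbsa_lsad, @md_code, @md_title, @md_lsad,
 @csa_code, @csa_title, @csa_lsad)
-- NULLIF(x, '') returns NULL when the field was empty, else the value itself.
SET ZIP4                 = NULLIF(@zip4, ''),
    CBSA_CODE            = NULLIF(@cbsa_code, ''),
    CBSA_TITLE           = NULLIF(@cbsa_title, ''),
    CBSA_LSAD            = NULLIF(@cbsa_lsad, ''),
    METRO_DIVISION_CODE  = NULLIF(@md_code, ''),
    METRO_DIVISION_TITLE = NULLIF(@md_title, ''),
    METRO_DIVISION_LSAD  = NULLIF(@md_lsad, ''),
    CSA_CODE             = NULLIF(@csa_code, ''),
    CSA_TITLE            = NULLIF(@csa_title, ''),
    CSA_LSAD             = NULLIF(@csa_lsad, '');
```

If the file uses Unix line endings, '\n' would replace '\r\n' in LINES TERMINATED BY.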