I have 500 CSV files in this format:
IndicatorA_Name.csv
        1900  1901  1902  ...
Norway  3     2
Sweden  1     3     3
Denmark 5     2     3
...
IndicatorB_Name.csv
        1900  1901  1902  ...
Norway  1     3     4
Sweden  1     2
Iceland 1     6     3
...
I want to go through all of these files and build a flat table (a CSV file) with the following structure:
country, year, IndicatorA_Name, IndicatorB_Name, ...
Sweden, 1900, 1, 1
Sweden, 1901, 3, 2
Norway, 1900, 3, 1
...
Preferably in PHP or JavaScript, but I am willing to learn something new.
Answer 0 (score: 0)
Use
$lines = explode(PHP_EOL, $csv);
$data = array();
foreach ($lines as $line)
    $data[] = explode("\t", $line);
(if the file is tab-separated, as it appears to be in your example) and run through it with two loops.
Here is tested code:
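// Tabs are written as \t (heredocs interpret escapes); each header row starts with an empty cell
$csv1 = <<<TXT
\t1900\t1901\t1902
Norway\t3\t2
Sweden\t1\t3\t3
Denmark\t5\t2\t3
TXT;
$csv2 = <<<TXT
\t1900\t1901\t1902
Norway\t1\t3\t4
Sweden\t1\t2
Iceland\t1\t6\t3
TXT;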
$csvs = array(
    'IndicatorA_Name' => $csv1,
    'IndicatorB_Name' => $csv2);
/* of course, if you're pulling this from csv files,
   you need to modify it accordingly, e.g.
$files = array('IndicatorA_Name', 'IndicatorB_Name', ...);
$csvs = array();
foreach ($files as $f)
    $csvs[$f] = file_get_contents($path . '/' . $f . '.csv');
   (keyed by indicator name, because the keys become the column names)
or use file(), then you don't need the line-splitting step later */
$data = array();
foreach ($csvs as $indicator => $csv) {
    $lines = preg_split('/\R/', rtrim($csv, "\r\n")); // split on any line ending, not only PHP_EOL
    $header = explode("\t", array_shift($lines));     // $header[0] is the empty corner cell
    foreach ($lines as $line) {
        $fields = explode("\t", $line);               // $fields[0] is the country
        for ($i = 1; $i < count($fields); $i++) {
            $data[$fields[0]][$header[$i]][$indicator] = $fields[$i];
        }
    }
}
// mysql_real_escape_string() was removed in PHP 7; $link is assumed to be an open mysqli connection
$rows = array();
foreach ($data as $country => $years) {
    foreach ($years as $year => $values) {
        $str = sprintf(PHP_EOL."('%s', '%d'", $link->real_escape_string($country), intval($year));
        foreach (array_keys($csvs) as $indicator) {
            if (isset($values[$indicator]))
                $str .= sprintf(", '%s'", $link->real_escape_string(trim($values[$indicator])));
            else
                $str .= ", ''";
        }
        $rows[] = $str . ")";
    }
}
$sql = "INSERT INTO table_name (country,year,".implode(',', array_keys($csvs)).") VALUES ".
    implode(',', $rows);
$sql is now:
INSERT INTO table_name (country,year,IndicatorA_Name,IndicatorB_Name) VALUES
('Norway', '1900', '3', '1'),
('Norway', '1901', '2', '3'),
('Norway', '1902', '', '4'),
('Sweden', '1900', '1', '1'),
('Sweden', '1901', '3', '2'),
('Sweden', '1902', '3', ''),
('Denmark', '1900', '5', ''),
('Denmark', '1901', '2', ''),
('Denmark', '1902', '3', ''),
('Iceland', '1900', '', '1'),
('Iceland', '1901', '', '6'),
('Iceland', '1902', '', '3')
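Since the question asks for a flat CSV file rather than SQL, the same nested array can also be written out with fputcsv(). What follows is a minimal sketch, assuming the data has the country => year => indicator shape built in this answer. The function name flattenToCsv and the small sample array are illustrative only and are not part of the original answer.

```php
<?php
// Hypothetical helper: writes a country => year => indicator => value array
// as flat CSV rows to an open stream, leaving missing values empty.
function flattenToCsv(array $data, array $indicators, $out): void
{
    // Separator, enclosure and escape are passed explicitly (the classic defaults).
    fputcsv($out, array_merge(['country', 'year'], $indicators), ',', '"', '\\');
    foreach ($data as $country => $years) {
        foreach ($years as $year => $values) {
            $row = [$country, $year];
            foreach ($indicators as $indicator) {
                // Empty cell when this indicator has no value for this year.
                $row[] = $values[$indicator] ?? '';
            }
            fputcsv($out, $row, ',', '"', '\\');
        }
    }
}

// Illustrative sample data (a subset of the example tables).
$data = [
    'Norway' => [
        1900 => ['IndicatorA_Name' => '3', 'IndicatorB_Name' => '1'],
        1902 => ['IndicatorB_Name' => '4'],
    ],
];

// Write to an in-memory stream so the result can be inspected.
$out = fopen('php://memory', 'w+');
flattenToCsv($data, ['IndicatorA_Name', 'IndicatorB_Name'], $out);
rewind($out);
$csv = stream_get_contents($out);
fclose($out);
echo $csv;
```

To produce a real file, pass a handle such as fopen('flat.csv', 'w') instead of the in-memory stream.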
Answer 1 (score: 0)
You should write code along these lines:
$file = file_get_contents('file.csv');
$lines = preg_split('/\R/', rtrim($file, "\r\n")); // one entry per line
// first line holds the years after an empty first cell (use "\t" instead of ";" for tab-separated files)
$years = explode(";", array_shift($lines));
foreach ($lines as $line) {                    // remaining lines: one per country
    $lineData = explode(';', $line);           // array of fields from the line
    $country = $lineData[0];                   // first entry is the country
    for ($i = 1; $i < count($lineData); $i++) { // the rest are values, one per year
        // query() stands for whatever prepared-statement helper your database layer provides
        query('INSERT INTO data(country, year, IndicatorA_Name) VALUES(?,?,?)', $country, $years[$i], $lineData[$i]);
    }
}
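Answer 2 (score: 0)
I would suggest using fgetcsv (see the link for a usage example) or str_getcsv (with "\t" as the delimiter, as Czechnology suggested).
That way you automatically get support for edge cases such as embedded delimiters (for example, a comma inside a field in a comma-separated file). It is generally best not to reinvent the wheel.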
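As a rough illustration of the str_getcsv route, here is a short sketch. It assumes tab-separated input whose header row begins with an empty cell, as in the question. The function name parseIndicator and the sample string (including the quoted country name) are made up for the example.

```php
<?php
// Hypothetical helper: parses one tab-separated indicator table into
// country => year => value, using str_getcsv() for the field splitting.
function parseIndicator(string $text): array
{
    $lines = preg_split('/\R/', rtrim($text, "\r\n"));
    // Header row: empty first cell, then the years.
    $header = str_getcsv(array_shift($lines), "\t", '"', '\\');
    $result = [];
    foreach ($lines as $line) {
        if ($line === '') {
            continue; // skip blank lines
        }
        $fields = str_getcsv($line, "\t", '"', '\\');
        $country = $fields[0];
        for ($i = 1; $i < count($fields); $i++) {
            // Skip cells that have no matching year or no value.
            if (isset($header[$i]) && $fields[$i] !== '' && $fields[$i] !== null) {
                $result[$country][$header[$i]] = $fields[$i];
            }
        }
    }
    return $result;
}

// Illustrative input: the quoted name shows that enclosures are handled for you.
$sample = "\t1900\t1901\t1902\nNorway\t3\t2\n\"Korea, Rep.\"\t1\t\t4\n";
$parsed = parseIndicator($sample);
print_r($parsed);
```

Running this per file and merging the results under each indicator name gives the same nested structure that a manual explode() loop would build, without hand-written quoting logic.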