Building a flat table from CSV files

Time: 2011-02-26 17:51:48

Tags: javascript php csv etl

I have 500 CSV files in this format:

IndicatorA_Name.csv

        1900    1901    1902 ...
Norway  3      2       
Sweden  1      3       3
Denmark 5      2       3    
... 

IndicatorB_Name.csv

        1900    1901    1902 ...
Norway  1      3       4
Sweden  1      2       
Iceland 1      6       3    
... 
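  • Columns are years, rows are countries.
  • Note that the countries, years, and values may differ between files.

I want to go through all these files and build a flat table (a CSV file) with the following structure: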

country, year, IndicatorA_Name, IndicatorB_Name, ...
Sweden, 1900, 1, 1
Sweden, 1901, 3, 2
Norway, 1900, 3, 1
...

Preferably in PHP or JavaScript, but I am willing to learn something new.

3 Answers:

Answer 0: (score: 0)

Use

$lines = explode(PHP_EOL, $csv);
$data = array();
foreach ($lines as $line)
  $data[] = explode("\t", $line);

(if the file is tab-separated, as it appears to be in your example), and walk through the result with two nested loops.

Edit

Here is tested code:

$csv1 = <<<TXT
        1900    1901    1902
Norway  3   2   
Sweden  1   3   3
Denmark 5   2   3
TXT;
$csv2 = <<<TXT
        1900    1901    1902
Norway  1   3   4
Sweden  1   2   
Iceland 1   6   3    
TXT;

$csvs = array(
  'IndicatorA_Name' => $csv1,
  'IndicatorB_Name' => $csv2);
/* of course, if you're pulling this from csv files, 
   you need to modify it accordingly, e.g.

$files = array('IndicatorA_Name', 'IndicatorB_Name', ...);
$csvs = array();
foreach ($files as $f)
  $csvs[$f] = file_get_contents($path . '/' . $f . '.csv'); // key by indicator name; the loops below rely on it

   or use file(), then you don't need the first `explode` line later */


$data = array();
foreach ($csvs as $indicator => $csv) {
  $lines = explode(PHP_EOL, $csv);

  $header = explode("\t", array_shift($lines));
  foreach ($lines as $line) {
    $fields = explode("\t", $line);

    for ($i = 1; $i < count($fields); $i++) {
      $data[$fields[0]][$header[$i]][$indicator] = $fields[$i];
    }
  }
}

$rows = array();
foreach ($data as $country => $years) {
  foreach ($years as $year => $values) {
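    // note: the mysql_* functions were removed in PHP 7; on newer versions use mysqli or PDO for escaping
    $str = sprintf(PHP_EOL."('%s', '%d'", mysql_real_escape_string($country), intval($year));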

    foreach (array_keys($csvs) as $indicator) {
      if (isset($values[$indicator]))
        $str .= sprintf(", '%s'", mysql_real_escape_string(trim($values[$indicator])));
      else
        $str .= ", ''";
    }
    $rows[] = $str . ")";
  }
}

$sql = "INSERT INTO table_name (country,year,".implode(',', array_keys($csvs)).") VALUES ".
       implode(',', $rows);

$sql is now

INSERT INTO table_name (country,year,IndicatorA_Name,IndicatorB_Name) VALUES 
('Norway', '1900', '3', '1'),
('Norway', '1901', '2', '3'),
('Norway', '1902', '', '4'),
('Sweden', '1900', '1', '1'),
('Sweden', '1901', '3', '2'),
('Sweden', '1902', '3', ''),
('Denmark', '1900', '5', ''),
('Denmark', '1901', '2', ''),
('Denmark', '1902', '3', ''),
('Iceland', '1900', '', '1'),
('Iceland', '1901', '', '6'),
('Iceland', '1902', '', '3')
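
If the goal is the flat CSV file from the question rather than an SQL statement, the same two-pass idea can be sketched in JavaScript. This is only a sketch under stated assumptions: the function name buildFlatTable and its input shape (an object mapping each indicator name to the tab-separated text of its file) are made up for illustration, and empty cells are skipped rather than stored.

```javascript
// Sketch only: buildFlatTable and its input shape are illustrative assumptions.
// `sources` maps an indicator name to the tab-separated text of its file.
function buildFlatTable(sources) {
  const indicators = Object.keys(sources);
  const table = new Map(); // "country\tyear" -> { indicatorName: value }

  // Pass 1: collect every value under its (country, year) key.
  for (const indicator of indicators) {
    const lines = sources[indicator]
      .split(/\r?\n/)
      .filter((l) => l.trim() !== "");
    const header = lines[0].split("\t"); // header[0] is the empty corner cell
    for (const line of lines.slice(1)) {
      const fields = line.split("\t");
      const country = fields[0].trim();
      for (let i = 1; i < header.length; i++) {
        const value = (fields[i] || "").trim();
        if (value === "") continue; // missing value: the cell stays empty
        const key = country + "\t" + header[i].trim();
        if (!table.has(key)) table.set(key, {});
        table.get(key)[indicator] = value;
      }
    }
  }

  // Pass 2: emit one row per (country, year), one column per indicator.
  const rows = [["country", "year", ...indicators].join(",")];
  for (const [key, values] of table) {
    const [country, year] = key.split("\t");
    const cells = indicators.map((name) => (name in values ? values[name] : ""));
    rows.push([country, year, ...cells].join(","));
  }
  return rows.join("\n");
}
```

Fields are joined with plain commas here, so a country name that itself contains a comma would need quoting before the result is a valid CSV file.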

Answer 1: (score: 0)

You should write code along these lines:

    $file = file_get_contents('file.csv');
    $lines = explode("\n", trim($file)); // one entry per line; trim() drops the trailing newline
    $years = explode(';', $lines[0]); // first line holds the years; index 0 is the empty corner cell
    for ($i = 1, $c = count($lines); $i < $c; ++$i) { // iterate over the country lines (skipping the year header)
        $lineData = explode(';', $lines[$i]); // fields of this line
        $country = $lineData[0]; // first entry is the country
        for ($j = 1, $n = count($lineData); $j < $n; ++$j) { // the rest are values, one per year column
            // query() stands for your own prepared-statement helper
            query('INSERT INTO data(country, year, IndicatorA_Name) VALUES(?,?,?)', $country, $years[$j], $lineData[$j]);
        }
    }

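Answer 2: (score: 0)

I would suggest using fgetcsv (see the link for a usage example) or str_getcsv (with "\t" as the delimiter, as Czechnology suggested).

That way you automatically get support for edge cases such as embedded delimiters (for example, a comma inside a field of a comma-separated file). It is usually best not to reinvent the wheel.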
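
To make the embedded-delimiter case concrete, here is a small JavaScript sketch comparing a naive split with a quote-aware parse of a single line. parseCsvLine is a hypothetical helper written for this illustration, not a library function; it only understands double-quoted fields and "" as an escaped quote, and it does not handle quoted fields that span several lines.

```javascript
// Sketch only: parseCsvLine is a hypothetical helper, not a library function.
// It handles double-quoted fields and "" as an escaped quote, nothing more.
function parseCsvLine(line, delimiter = ",") {
  const fields = [];
  let current = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        current += '"'; // doubled quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        current += ch; // delimiters inside quotes are ordinary characters
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === delimiter) {
      fields.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

const line = '"Korea, Rep.",3,2';
console.log(line.split(","));    // naive split: 4 pieces, the name is cut in two
console.log(parseCsvLine(line)); // 3 fields, the name stays whole
```

In practice an existing parser is the safer choice, which is the point of this answer; the sketch only shows why a plain split is not enough.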