How can I check the rows of a CSV file for a specific value, use that row as the header row, and remove it wherever it is duplicated?

Time: 2012-01-23 09:22:15

Tags: php mysql csv

I use the following function to import a CSV file into MySQL:

function csv_2_mysql($source_file, $target_table, $max_line_length=10000) {
    if (($handle = fopen("$source_file", "r")) !== FALSE) {
        $columns = fgetcsv($handle, $max_line_length, ",");
        foreach ($columns as &$column) {
            $column = preg_replace('/[^a-z0-9]/i', '', $column);
        }
        $insert_query_prefix = "INSERT INTO $target_table (".join(",",$columns).")\nVALUES";
        while (($data = fgetcsv($handle, $max_line_length, ",")) !== FALSE) {
            while (count($data) < count($columns))
                array_push($data, NULL);
            $query = "$insert_query_prefix (".join(",",quote_all_array($data)).");";
            mysql_query($query);
        }
        fclose($handle);
    }
}

function quote_all_array($values) {
    foreach ($values as $key=>$value) {
        if (is_array($value))
            $values[$key] = quote_all_array($value);
        else
            $values[$key] = quote_all($value);
    }
    return $values;
}

function quote_all($value) {
    if (is_null($value))
        return "NULL";
    $value = "'" . mysql_real_escape_string($value) . "'";
    return $value;
}

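The problem is that, because the CSV files are cut and merged at the source, the header is sometimes not on the first line, so a file might look like this: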

value1,value2,value3,value4
value1,value2,value3,value4
value1,value2,value3,value4
header1,header2,header3,header4
value1,value2,value3,value4
value1,value2,value3,value4
value1,value2,value3,value4
value1,value2,value3,value4
header1,header2,header3,header4
value1,value2,value3,value4
value1,value2,value3,value4
value1,value2,value3,value4

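value1 is unique, so I know there will never be duplicate rows other than the header. How can I adapt the function so that duplicate header rows are removed (if present) and the remaining header row is used for $columns? I would just set the column names manually, except that each CSV may have a different number of columns (apart from header1 and value1, which are always present because that column is a unique timestamp).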

Update:

OK, I figured it out, but using both fopen and file_get_contents doesn't feel right. Will I run into problems with large CSV files?

function csv_2_mysql($source_file, $target_table, $uid, $nid, $max_line_length=10000) {
    if (($handle = fopen("$source_file", "r")) !== FALSE) {
        $handle2 = file_get_contents($source_file) or exit;
        $handle_row = explode("\n", $handle2);
        foreach ($handle_row as $key => $val) {
            $row_array = explode(',', $val);
            foreach ($row_array as $key => $val) {
                $row_array[$key] = trim(str_replace('"', '', $val));
            }
            if (!in_array('header1', $row_array)) {
                unset($row_array);
            }
            else {
                $columns = $row_array;
            }
        }
        foreach ($columns as &$column) {
            $column = preg_replace('/[^a-z0-9]/i', '', $column);
        }

        $insert_query_prefix = "INSERT INTO $target_table (".join(",",$columns).")\nVALUES";
        while (($data = fgetcsv($handle, $max_line_length, ",")) !== FALSE) {
            while (count($data) < count($columns))
                array_push($data, NULL);
            $query = "$insert_query_prefix (".join(",",quote_all_array($data)).");";
            mysql_query($query);
        }
        fclose($handle);
    }
}
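
On the memory question: file_get_contents reads the whole file into one string, and explode then builds a second copy as an array, so peak memory grows with the size of the CSV. A minimal sketch of finding the header by streaming over the handle that is already open might look like the following. The function name find_header_row and its $marker parameter are hypothetical, and the sketch assumes the marker ('header1') always sits in the first column.

```php
<?php
// Hypothetical helper (not part of the original code): scan an open CSV
// stream for the first row whose first field equals $marker and return it.
// Assumes the marker column (e.g. 'header1') is always the first column.
function find_header_row($handle, $marker)
{
    rewind($handle);                       // always scan from the start
    $header = null;
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if ($row === [null]) {
            continue;                      // fgetcsv returns [null] for a blank line
        }
        if (trim($row[0]) === $marker) {
            $header = array_map('trim', $row);
            break;                         // the first header row is enough
        }
    }
    rewind($handle);                       // leave the stream ready for the data pass
    return $header;                        // null when no header row was found
}
```

Only one row is held in memory at a time, and because the handle is rewound afterwards, the existing fgetcsv loop can run on it unchanged. That loop would still read the header rows themselves, so they need to be skipped there, for example by comparing the first field against the same marker.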


1 Answer:

Answer 0: (score: -1)

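I guess you could keep an array into which you push the value of the first column (since you say it is unique) and check each new row against it. If the value is a duplicate, ignore the row and continue.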
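
As a rough sketch of that idea, and not a tested drop-in replacement, the loop below remembers every first-column value it has seen and skips any row whose value repeats. The name unique_data_rows and the $marker parameter are hypothetical; the sketch assumes the header marker ('header1') is in the first column, and it uses a generator, which requires PHP 5.5 or later.

```php
<?php
// Hypothetical helper (not from the question): yield each data row once.
// Rows whose first field equals $marker are treated as header rows and
// skipped; rows whose first field was already seen are skipped as duplicates.
function unique_data_rows($handle, $marker)
{
    $seen = [];                            // first-column values seen so far
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if ($row === [null]) {
            continue;                      // blank line
        }
        $key = trim($row[0]);
        if ($key === $marker || isset($seen[$key])) {
            continue;                      // header row or repeated value
        }
        $seen[$key] = true;                // array key lookup is faster than in_array
        yield $row;
    }
}
```

Each yielded row could then be padded and passed to quote_all_array as in the question. The $seen array grows by one entry per data row, so for very large files where value1 is already known to be unique, dropping the duplicate check and keeping only the marker comparison would use less memory.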