Reading a CSV file with unescaped enclosures

Time: 2012-03-16 11:34:31

Tags: php csv fgetcsv

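I am reading a CSV file, but some of the values are not escaped, so PHP reads them incorrectly. Here is an example of a bad line:

" 635"," ","AUBREY R. PHILLIPS (1920- ) - Pastel depicting cottages in a steep sided river valley, possibly North Wales, signed and dated 2000, framed, 66cm by 48cm. another of a rural landscape, titled verso "Harvest Time, Somerset" signed and dated '87, framed, 69cm by 49cm. (2) NB - Aubrey Phillips is a Worcestershire artist who studied at the Stourbridge School of Art.","40","60","WAT","Paintings, prints and watercolours",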

You can see that Harvest Time, Somerset has quotes around it, which makes PHP think it is a new value.

When I call print_r() on each row, the broken line ends up looking like this:

Array
(
    [0] =>  635
    [1] =>  
    [2] => AUBREY R. PHILLIPS (1920- ) - Pastel depicting cottages in a steep sided river valley, possibly North Wales, signed and dated 2000, framed, 66cm by 48cm. another of a rural landscape, titled verso Harvest Time
    [3] => Somerset" signed and dated '87
    [4] => framed
    [5] => 69cm by 49cm. (2)  NB - Aubrey Phillips is a Worcestershire artist who studied at the Stourbridge School of Art."
    [6] => 40
    [7] => 60
    [8] => WAT
    [9] => Paintings, prints and watercolours
    [10] => 
)

This is obviously wrong, because the row now contains more array elements than the other, correct rows.

Here is the PHP I am using:

$i = 1;
if (($file = fopen($this->request->data['file']['tmp_name'], "r")) !== FALSE) {
    while (($row = fgetcsv($file, 0, ',', '"')) !== FALSE) {
        if ($i == 1){
            $header = $row;
        }else{
            if (count($header) == count($row)){
                $lots[] = array_combine($header, $row);
            }else{
                $error_rows[] = $row;
            }

        }
        $i++;
    }
    fclose($file);
}

Rows with the wrong number of elements go into $error_rows, and the rest go into the big $lots array.

What can I do to get around this? Thanks.

5 Answers:

Answer 0: (score: 1)

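If you know that you will always get entries 0 and 1, and that the last 5 entries in the array are always correct, so that only the descriptive entry is "corrupted" by the unescaped enclosure characters, then you can extract the first 2 and the last 5 entries with array_slice(), implode() the remainder back into a single string (restoring the lost quotes), and rebuild the array correctly.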

$testData = '" 635"," ","AUBREY R. PHILLIPS (1920- ) - Pastel depicting cottages in a steep sided river valley, possibly North Wales, signed and dated 2000, framed, 66cm by 48cm. another of a rural landscape, titled verso "Harvest Time, Somerset" signed and dated \'87, framed, 69cm by 49cm. (2) NB - Aubrey Phillips is a Worcestershire artist who studied at the Stourbridge School of Art.","40","60","WAT","Paintings, prints and watercolours",';

$result = str_getcsv($testData, ',', '"');

$hdr = array_slice($result,0,2);
$bdy = array_slice($result,2,-5);
$bdy = trim(implode('"',$bdy),'"');
$ftr = array_slice($result,-5);

$fixedResult = array_merge($hdr,array($bdy),$ftr);
var_dump($fixedResult);

The result is:

array
  0 => string ' 635' (length=4)
  1 => string ' ' (length=1)
  2 => string 'AUBREY R. PHILLIPS (1920- ) - Pastel depicting cottages in a steep sided river valley, possibly North Wales, signed and dated 2000, framed, 66cm by 48cm. another of a rural landscape, titled verso Harvest Time" Somerset" signed and dated '87" framed" 69cm by 49cm. (2) NB - Aubrey Phillips is a Worcestershire artist who studied at the Stourbridge School of Art.' (length=362)
  3 => string '40' (length=2)
  4 => string '60' (length=2)
  5 => string 'WAT' (length=3)
  6 => string 'Paintings, prints and watercolours' (length=34)
  7 => string '' (length=0)

Not perfect, but it may be good enough.
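To apply this repair only to the rows that the question's loop rejects, the slicing could be wrapped in a small helper. The following is a rough sketch under stated assumptions: repairRow() is a hypothetical name, the sample line is a shortened stand-in for the real data, and the pieces are joined with a comma (a guess, since the parser split them at commas) rather than with a quote as above. The quote that opened the inner quotation is still lost.

```php
<?php
// Hypothetical helper: rebuild a row whose middle field was split by stray quotes.
// $lead and $trail are the numbers of trustworthy fields at the start and end.
function repairRow(array $row, int $expected, int $lead, int $trail): array
{
    if (count($row) === $expected) {
        return $row; // already has the right shape, nothing to repair
    }
    $head = array_slice($row, 0, $lead);
    $tail = array_slice($row, -$trail);
    // Glue the broken middle pieces back together. The comma is an assumption
    // about what originally separated them.
    $middle = trim(implode(',', array_slice($row, $lead, -$trail)), '"');
    return array_merge($head, array($middle), $tail);
}

// Shortened, made-up line with the same defect as the one in the question.
$broken = str_getcsv('"1","","a "b, c" d, e.","40","60"', ',', '"', '\\');
$fixed  = repairRow($broken, 5, 2, 2);
print_r($fixed);
```

In the question's loop, a call such as repairRow($row, count($header), 2, 5) could replace the branch that appends to $error_rows, provided the assumption about the first 2 and last 5 fields holds.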

The alternative is to get whoever generates the CSV to escape their enclosures properly.
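For reference, here is a minimal sketch of what properly escaped output looks like, assuming the producer also uses PHP and can switch to fputcsv(). The field values are shortened from the question's example, and the in-memory stream is used only for illustration.

```php
<?php
// Fields as the exporter would hold them; the third contains quotes and a comma.
$fields = array(' 635', ' ', 'titled verso "Harvest Time, Somerset" signed and dated \'87', '40');

// Write the row to an in-memory stream so fputcsv() can apply its quoting rules.
$stream = fopen('php://memory', 'w+');
fputcsv($stream, $fields, ',', '"', '\\');

// Capture the raw CSV text, then read the same row back with fgetcsv().
rewind($stream);
$line = stream_get_contents($stream);
rewind($stream);
$roundTrip = fgetcsv($stream, 0, ',', '"', '\\');
fclose($stream);

echo $line;                        // the embedded quotes are written doubled ("")
var_dump($roundTrip === $fields);  // the row survives the round trip unchanged
```

With the inner quotes doubled, fgetcsv() returns the description as a single field, so the count check in the question's loop passes without any repair step.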

Answer 1: (score: 1)

If you can escape the " in the text like this: \"

and specify the escape character in fgetcsv:

fgetcsv($file, 0, ',', '"', '\\');
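A small sketch of how this would behave, assuming the producer really does write \" for inner quotes. The sample line is made up and shortened from the question. As far as I know, PHP leaves the backslash in the parsed value, so the sketch strips it afterwards; that cleanup step is harmless if the backslash is already gone.

```php
<?php
// Hypothetical line in which the inner quotes are backslash-escaped.
$line = '" 635","titled verso \"Harvest Time, Somerset\" signed","40"';

// Parse with the backslash declared as the escape character.
$row = str_getcsv($line, ',', '"', '\\');

// Remove the escape characters that remain in front of the quotes.
$row = array_map(function ($value) {
    return str_replace('\\"', '"', $value);
}, $row);

print_r($row);
```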

Answer 2: (score: 0)

This is a long shot, so don't take me too seriously.

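I see a pattern in the text: every ',' that you want to ignore is followed by a space. Search for ', ' (comma plus space) and replace it with 'FUU' or something else unique.

Now parse the CSV file. It will probably get the format right. You then only need to replace 'FUU' back with ', '.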

:)
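A minimal sketch of that idea, with assumptions stated up front: the sample line is a shortened, made-up version of the one in the question, 'FUU' is assumed never to occur in the real data, and no genuine field separator is assumed to be followed by a space. The field count comes out right, although the quote that opened the inner quotation may still be dropped by the parser.

```php
<?php
$placeholder = 'FUU'; // any token that never occurs in the data
$line = '" 635"," ","Pastel, titled verso "Harvest Time, Somerset" signed, framed.","40","60"';

// Hide every comma that is followed by a space so the parser cannot split on it.
$masked = str_replace(', ', $placeholder, $line);

// Parse the masked line, then put the hidden commas back into each field.
$row = str_getcsv($masked, ',', '"', '\\');
$row = array_map(function ($value) use ($placeholder) {
    return str_replace($placeholder, ', ', $value);
}, $row);

print_r($row);
```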

Answer 3: (score: 0)

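You are probably reading the contents of the CSV file as an array of lines and then splitting each line on the comma. That fails because some of the fields also contain commas. One trick that could help is to look for ",", which indicates a field separator and is unlikely (but not impossible) to occur inside a field.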

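<?php
  // Read the whole file and split it into lines (assumes Windows line endings).
  $csv = file_get_contents("yourfile.csv");
  $lines = explode("\r\n", $csv); // split() was removed in PHP 7
  echo "<pre>";
  foreach($lines as $line)
  {
    // Mark every "," field boundary with a token, then split on that token.
    $line = str_replace("\",\"", "\"@@@\"", $line);
    $fields = explode("@@@", $line);
    print_r($fields);
  }
  echo "</pre>";
?>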

Answer 4: (score: 0)

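// Split on spaces so that single words wrapped in straight quotes can be found.
$csv = explode(' ', $csv);
foreach ($csv as $k => $v) {
    // Skip empty chunks (from consecutive spaces) before indexing into the string.
    if ($v !== '' && $v[0] == '"' && substr($v, -1) == '"') {
        // Swap the straight quotes for typographic ones so they stop acting as enclosures.
        $csv[$k] = mb_convert_encoding('&ldquo;' . substr($v, 1, -1) . '&rdquo;', 'UTF-8', 'HTML-ENTITIES');
    }
}
$csv = implode(' ', $csv);
$csv = str_getcsv($csv);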