str_getcsv does not keep the first column wrapped in double quotes in a multi-line CSV

Date: 2016-04-14 01:00:20

Tags: php csv

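I've noticed that str_getcsv doesn't seem to keep the first value it receives in double quotes, even when the string data is passed that way.

In the example below, the first value in the 3rd row is "Small Box, But Smaller", but after running it through str_getcsv it becomes Small Box, But Smaller (without double quotes). Like this: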

// multi-line csv string
$csvString = <<<'CSV'
"Title","Description",Quantity
"Small Box","For storing magic beans.",2
"Small Box, But Smaller","Not sure why we need this.",0
CSV;

// split string into rows (don't use explode in case multi-line values exist)
$csvRows = str_getcsv($csvString, "\n"); // parse rows
echo '<pre>';
print_r($csvRows);
echo '</pre>';

Output:

Array
(
    [0] => Title,"Description",Quantity
    [1] => Small Box,"For storing magic beans.",2
    [2] => Small Box, But Smaller,"Not sure why we need this.",0
)

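The problem this causes is that if each row is now parsed with str_getcsv, the comma inside the first value splits it into two values. Continuing from the code above: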

foreach($csvRows as &$csvRow) {
    $csvRow = str_getcsv($csvRow); // parse each row into values and save over original array value
}
unset($csvRow); // clean up

// output
echo '<pre>';
print_r($csvRows);
echo '</pre>';

Output:

Array
(
    [0] => Array
        (
            [0] => Title
            [1] => Description
            [2] => Quantity
        )

    [1] => Array
        (
            [0] => Small Box
            [1] => For storing magic beans.
            [2] => 2
        )

    [2] => Array
        (
            [0] => Small Box
            [1] =>  But Smaller
            [2] => Not sure why we need this.
            [3] => 0
        )

)

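The problem is the last array value, which is an array with 4 keys instead of 3. It was split on the comma inside the value "Small Box, But Smaller".

On the other hand, parsing just a single row string works as expected: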

$csvRowData = '"Small Box, But Smaller","Not sure why we need this.",0';
$csvValues = str_getcsv($csvRowData);

echo '<pre>';
print_r($csvValues);
echo '</pre>';

Output:

Array
(
    [0] => Small Box, But Smaller
    [1] => Not sure why we need this.
    [2] => 0
)

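Why does this happen, and how do I solve the problem for multi-line CSV data? Is there a best practice for multi-line CSV data that is a string and isn't read directly from a file? Also, I need to handle multi-line values such as "foo \n bar", so I can't just use explode() in place of the first str_getcsv().

2 answers:

Answer 0 (score: 3)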

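After a lot of headaches I think I now understand the problem. According to the PHP folks, "str_getcsv() is designed to parse a single CSV record into fields" (see https://bugs.php.net/bug.php?id=55763). I found that using str_getcsv() on multiple rows causes these poorly documented problems:

  • Double quotes are not preserved (as I showed above).
  • A newline inside a value makes it think a new row has started. This can have many unintended consequences.

I solved this by creating a temporary file and writing the CSV content to it. I then read the file with fgetcsv(), which does not cause the 2 problems described above. Sample code: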
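The first point can be reproduced on a single line. This is a minimal sketch, assuming the behavior shown in the question's output also holds on your PHP version: with "\n" as the delimiter, the whole line is one field, and the quotes that open that field are consumed as its enclosure while the rest of the line is appended as-is. The `$line` value is only illustrative, and the enclosure and escape arguments are passed explicitly so the sketch does not rely on their defaults.

```php
<?php
// One CSV record; the first value is quoted and contains a comma.
$line = '"Small Box, But Smaller","Not sure why we need this.",0';

// With "\n" as the delimiter the whole line is treated as a single field.
// The quotes that open the field are read as its enclosure and dropped.
$asRow = str_getcsv($line, "\n", '"', "\\");

// With "," as the delimiter the same quotes protect the embedded comma.
$asRecord = str_getcsv($line, ",", '"', "\\");

print_r($asRow);
print_r($asRecord);
```

Comparing the two print_r() results shows why the second pass in the question sees an unprotected comma in the first value.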

// multi-line csv string
$csvString = <<<'CSV'
"Title","Description",Quantity
"Small Box","For storing magic beans.",2
"Small Box, But Smaller","This value
contains
multiple
lines.",0
CSV;
// ^ notice the multiple lines in the last row's value

// create a temporary file
$tempFile = tmpfile();
// write the CSV to the file
fwrite($tempFile, $csvString);
// go to first character
fseek($tempFile, 0);

// track CSV rows
$csvRows = array();
// read the CSV temp file line by line
while (($csvColumns = fgetcsv($tempFile)) !== false) {
    $csvRows[] = $csvColumns; // push columns to array (really it would be more memory-efficient to process the data here and not append to an array)
}

// Close and delete the temp file
fclose($tempFile);

// output
echo '<pre>';
print_r($csvRows);
echo '</pre>';

Result:

Array
(
    [0] => Array
        (
            [0] => Title
            [1] => Description
            [2] => Quantity
        )

    [1] => Array
        (
            [0] => Small Box
            [1] => For storing magic beans.
            [2] => 2
        )

    [2] => Array
        (
            [0] => Small Box, But Smaller
            [1] => This value
contains
multiple
lines.
            [2] => 0
        )

)
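The same approach can be sketched without a real temporary file, assuming the built-in `php://temp` stream wrapper is acceptable in your environment (it keeps data in memory and only spills to disk past a size threshold). The function name `parseCsvString` is hypothetical, and the delimiter, enclosure and escape arguments are passed explicitly so the sketch does not depend on their defaults.

```php
<?php
/**
 * Parse a multi-record CSV string with fgetcsv() via an in-memory stream.
 * parseCsvString() is an illustrative helper, not a built-in function.
 */
function parseCsvString($csvString)
{
    // Open a read/write temporary stream and load the CSV text into it.
    $stream = fopen('php://temp', 'r+');
    fwrite($stream, $csvString);
    rewind($stream);

    // fgetcsv() reads one record at a time, including quoted multi-line values.
    $rows = array();
    while (($columns = fgetcsv($stream, 0, ",", '"', "\\")) !== false) {
        $rows[] = $columns;
    }
    fclose($stream);

    return $rows;
}

$csvString = <<<'CSV'
"Title","Description",Quantity
"Small Box, But Smaller","This value
contains
multiple
lines.",0
CSV;

$rows = parseCsvString($csvString);
print_r($rows);
```

Wrapping the stream handling in one function keeps the call site as short as a str_getcsv() call while still letting fgetcsv() decide where each record ends.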

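I'll also add that I found some options on GitHub, including two major projects for PHP 5.4+ and PHP 5.5+. However, I'm still on PHP 5.3 and only saw options with limited activity. In addition, some of them handle CSV strings by writing them to a file and reading them back.

I should also note that PHP's documentation has some comments about str_getcsv() not being RFC-compliant: http://php.net/manual/en/function.str-getcsv.php. The same seems to be true of fgetcsv(), but the latter did meet my needs, at least in this case.

Answer 1 (score: 0)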

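I don't know why your PHP_EOL doesn't work properly on my server, but I have run into this problem before.

The approach I took is as follows:

First, I like to make sure every field is wrapped in double quotes regardless of its value, so using the example text (slightly modified):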

// multi-line csv string
$csvString = <<<CSV
"Title","Description","Quantity"
"Small Box","For storing magic beans.","2"
"Small Box, But Smaller","Not sure why we need this.","0"
"a","\n","b","c"

CSV;

$csvString .= '"a","' . "\n" . '","' . PHP_EOL . '","c"';

Second, I target any solo PHP_EOL that may be lingering inside values, so that I can then replace the remaining "PHP_EOL" strings with "\r\n":

// Clear any solo end of line characters that are within values
$csvString = str_replace('","' . PHP_EOL . '"', '",""',$csvString);
$csvString = str_replace('"' . PHP_EOL . '","', '"","',$csvString);

$csvString = str_replace('"' . PHP_EOL . '"', '"'. "\r\n" . '"',$csvString);

Finally, this allows me to use PHP's explode function and display the output:

$csvArr = explode("\r\n",$csvString);
foreach($csvArr as &$csvRow) {
    $csvRow = str_getcsv($csvRow); // parse each row into values and save over original array value
}
unset($csvRow); // clean up

// output
echo '<pre>';
print_r($csvArr);
echo '</pre>';

Which outputs:

Array
(
    [0] => Array
        (
            [0] => Title
            [1] => Description
            [2] => Quantity
        )

    [1] => Array
        (
            [0] => Small Box
            [1] => For storing magic beans.
            [2] => 2
        )

    [2] => Array
        (
            [0] => Small Box, But Smaller
            [1] => Not sure why we need this.
            [2] => 0
        )

    [3] => Array
        (
            [0] => a
            [1] => 

            [2] => b
            [3] => c
        )

    [4] => Array
        (
            [0] => a
            [1] => 

            [2] => 
            [3] => c
        )

)

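As you can see from the output, the newline character "\n" is not targeted, only PHP_EOL. Note that this only holds where PHP_EOL differs from "\n" (for example on Windows, where it is "\r\n").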