Question

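Hello community. I'm working on a CSV decoder in PHP (yes, I know there already is one, but it's a challenge for me, since I'm learning PHP in my spare time). Now the problem: the rows are split by PHP_EOL.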

In this line:

foreach(explode($sep, $str) as $line) {

where $sep is the variable that separates the rows and $str is the string I want to decode.

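But if I want to split the columns by semicolons, a semicolon may itself be part of a column's content. When I researched this problem, I found that it is solved by wrapping the whole column in quotes, like this: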

Input:

"0;0";1;2;3;4

Expected output:

0;0 | 1 | 2 | 3 | 4

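I have already thought about lookahead/lookbehind. But since I have never used them before (and this might be good practice), I don't know how to include them in the regex. My decode function returns a two-dimensional array (like a table...), and I was thinking of adding the rows to the array like this (yes, the regex is...):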

$res[] = preg_split("/(?<!\")". preg_quote($delim). "(?!\")/", $line);

Finally, my complete code:

function csv_decode($str, $delim = ";", $sep = PHP_EOL) {
    if($delim == "\"") $delim = ";";
    $res = [];

    foreach(explode($sep, $str) as $line) {
        $res[] = preg_split("/(?<!\")". preg_quote($delim). "(?!\")/", $line);
    }

    return $res;
}

Thanks in advance!

Answer 1

You can use the str_getcsv function here. It also lets you specify a custom delimiter (;).

Try this code snippet

<?php

$string='"0;0";1;2;3;4';
print_r(str_getcsv($string,";"));

Output:

Array
(
    [0] => 0;0
    [1] => 1
    [2] => 2
    [3] => 3
    [4] => 4
)
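
To get the two-dimensional array the question asks for, the same call can be applied to each line. This is only a sketch: the name csv_decode_lines is made up for illustration, and passing the enclosure and escape arguments explicitly is my assumption about the desired behaviour. Note that splitting on the line separator first will break fields that contain quoted line breaks.

```php
<?php

/**
 * Decode a multi-line CSV string into a list of rows.
 * Hypothetical helper, not part of PHP itself.
 */
function csv_decode_lines(string $str, string $delim = ";", string $sep = PHP_EOL): array
{
    $res = [];

    foreach (explode($sep, $str) as $line) {
        // Skip empty lines (for example after a trailing line break).
        if ($line === '') {
            continue;
        }
        // str_getcsv handles the quoted fields; enclosure and escape are
        // passed explicitly instead of relying on the defaults.
        $res[] = str_getcsv($line, $delim, '"', '\\');
    }

    return $res;
}
```

Calling csv_decode_lines("\"0;0\";1;2;3;4\na;b", ";", "\n") should give two rows, the first of which starts with the field 0;0.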

Answer 2

It's a bit counterintuitive, but the simplest way to split a string with a regex is often to use preg_match_all instead of preg_split:

preg_match_all('~("[^"]*"|[^;"]*)(?:;|$)~A', $line, $m);
$res[] = $m[1];

The A modifier ensures that the matches are contiguous, starting from the beginning of the string.

If you don't want the quotes in the result, you can use the branch reset feature (?|..(..)..|..(..)..):

preg_match_all('~(?|"([^"]*)"|([^;"]*))(?:;|$)~A', $line, $m);
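
Wrapped into a function, the branch reset version could look like the sketch below. The function name split_with_match_all is hypothetical. One thing I noticed: when the line does not end with a delimiter, the pattern can match one more time as an empty string at the very end, so the sketch drops that extra entry. Treat that clean-up step as my addition, not part of the pattern itself.

```php
<?php

// Hypothetical helper: split one line with the branch reset pattern.
function split_with_match_all(string $line): array
{
    preg_match_all('~(?|"([^"]*)"|([^;"]*))(?:;|$)~A', $line, $m);
    $fields = $m[1];

    // If the line does not end with ";", the last entry may be a spurious
    // empty match at the end of the string; remove it in that case.
    if ($line !== '' && substr($line, -1) !== ';' && end($fields) === '') {
        array_pop($fields);
    }

    return $fields;
}
```

A line that really ends with a delimiter, such as a;b; keeps its empty last field, because there the empty match is a genuine field.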

Another workaround, this time for preg_split: include the part you want to avoid before the delimiter, and use the \K feature to drop it from the whole match:

$res[] = preg_split('~(?:"[^"]*")?\K;~', $line);
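
Note that with this preg_split variant the quotes stay in the resulting fields, since \K only removes them from the delimiter match. A possible way to strip them afterwards is sketched here. The function name split_and_unquote is made up, and the unquoting rule (remove one pair of surrounding double quotes) is my assumption.

```php
<?php

// Hypothetical helper: split with the \K pattern, then strip the quotes.
function split_and_unquote(string $line): array
{
    $parts = preg_split('~(?:"[^"]*")?\K;~', $line);
    if ($parts === false) {
        return []; // preg_split reported an error
    }

    return array_map(function (string $field): string {
        // Remove one pair of surrounding double quotes, if present.
        if (strlen($field) >= 2 && $field[0] === '"' && substr($field, -1) === '"') {
            return substr($field, 1, -1);
        }
        return $field;
    }, $parts);
}
```

For the input from the question this should return the five fields with 0;0 as the first one.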

Answer 3

For CSV-type lines, split is not a good choice. You can use the good old \G anchor with a find-global type function.

Practical

Regex: '~\G(?:(?:^|;)\s*)(?|"([^"]*)"|([^;]*?))(?:\s*(?:(?=;)|$))~'

Info:

 \G                            # G anchor, start where last match left off
 (?:                           # leading BOL or ;
      (?: ^ | ; )
      \s*                           # optional whitespaces
 )
 (?|                           # branch reset
      " 
      ( [^"]* )                     # (1), double quoted string data
      "
   |                              # or
      ( [^;]*? )                    # (1), non-quoted field
 )
 (?:                           # trailing optional whitespaces
      \s* 
      (?:
           (?= ; )                       # lookahead for ;
        |  $                             # or EOL
      )
 )
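
In PHP, the "find global" function would be preg_match_all. A minimal sketch of how the regex above might be used, with split_with_g_anchor as a made-up name:

```php
<?php

// Hypothetical helper: collect the fields of one line via the \G regex.
function split_with_g_anchor(string $line): array
{
    $re = '~\G(?:(?:^|;)\s*)(?|"([^"]*)"|([^;]*?))(?:\s*(?:(?=;)|$))~';
    preg_match_all($re, $line, $m);

    // Group 1 holds the field content, without quotes and without the
    // whitespace around the field.
    return $m[1];
}
```

Because of the \s* parts, whitespace around a field is trimmed as well, so ' a ; "b;c" ;d' should come out as a, b;c and d.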

Split by semicolon, excluding semicolons inside quotes
