Splitting a string into an array (uneven column widths)

Time: 2011-02-25 16:41:55

Tags: php

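I'm reading data with file() and iterating over each line. I need to split each line into an array of "columns". The problem is that the columns have uneven widths (60 characters, 40 characters, and so on), and every function I've found for this seems to expect all columns to be the same fixed size.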

This will run regularly on large data files, so it needs to perform as well as possible.

Sample data:

XXXXXXXXXXXXXXXXXXXXXXXXXX                                  XXXXXXXXXXXXX           XX        XXXXXX
XXXXXXXXX                                                   XXX XXX                 X         XXX
XXXXXXXXXXXXXXX                                             XXXXXXXXXXXXX           XX        XXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXX                                  XXXXXXXXXXXXX           XX        XXXXXX
XXXXXXXXX                                                   XXX XXX                 X         XXX
XXXXXXXXXXXXXXX                                             XXXXXXXXXXXXX           XX        XXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXX                                  XXXXXXXXXXXXX           XX        XXXXXX
XXXXXXXXX                                                   XXX XXX                 X         XXX
XXXXXXXXXXXXXXX                                             XXXXXXXXXXXXX           XX        XXXXXX

4 answers:

Answer 0 (score: 1)

The straightforward approach is to cut out the columns with substr:

foreach (file($fn) as $i=>$line) {
    $rows[$i] = array(substr($line, 0, 60), substr($line, 60, 40), substr($line, 100, 30));
}
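
As a rough sketch of the same substr idea, the widths can be kept in one array so the offsets are computed rather than hard-coded. The helper name splitFixedWidth, the sample line, and the widths 60/40/30 are assumptions made for illustration; the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper: split one fixed-width line into right-trimmed columns.
// $widths lists each column's width in characters, from left to right.
function splitFixedWidth(string $line, array $widths): array
{
    $columns = [];
    $offset = 0;
    foreach ($widths as $width) {
        // substr() takes (start, length). The cast covers older PHP versions,
        // where substr() returns false if the line is shorter than $offset.
        $columns[] = rtrim((string) substr($line, $offset, $width));
        $offset += $width;
    }
    return $columns;
}

// Made-up sample line padded to 60 + 40 + 30 characters.
$line = str_pad('ABC', 60) . str_pad('DE FG', 40) . str_pad('H', 30);
print_r(splitFixedWidth($line, [60, 40, 30]));
```

Spaces inside a column (such as "DE FG") survive, because only the padding on the right is trimmed.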

Contrary to what you might expect, though, splitting the string with PCRE and a regular expression can be faster:

preg_match_all('/^(.{60})(.{40})(.{30})\K/m', file_get_contents($fn), $rows, PREG_SET_ORDER); 

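The downside is that every row also carries an empty [0] entry (the slot that would normally hold the whole matched line), so the data columns start at index [1].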
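
If the extra [0] entry gets in the way, it can be dropped after matching. The following is only a sketch on made-up sample text: the variable names are hypothetical, and this variant leaves out \K, so [0] holds the full matched text before array_shift removes it.

```php
<?php
// Build two made-up lines, each padded to 60 + 40 + 30 = 130 characters.
$text = '';
foreach ([['a', 'b', 'c'], ['d', 'e', 'f']] as $fields) {
    $text .= str_pad($fields[0], 60) . str_pad($fields[1], 40)
           . str_pad($fields[2], 30) . "\n";
}

// One match per line; each match is [full text, col1, col2, col3].
preg_match_all('/^(.{60})(.{40})(.{30})/m', $text, $matches, PREG_SET_ORDER);

$rows = [];
foreach ($matches as $match) {
    array_shift($match);                  // drop [0], reindexing columns from 0
    $rows[] = array_map('rtrim', $match); // strip the right-hand padding
}
print_r($rows);
```

The extra loop costs some time, so whether it is worth it depends on how the rows are consumed afterwards.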

Answer 1 (score: 0)

The only way to do this is if the file contains delimiters.

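explode() splits a string on a delimiter, so if you know the file's columns are tab-separated, you can call explode("\t", $string) to get an array of columns. Use double quotes here: in single quotes, '\t' is a literal backslash followed by t, not a tab.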

Beyond that, there is no reliable way to pull out variable-width columns without knowing their sizes.
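
A small sketch of the delimiter case, using a made-up tab-separated line. It also shows why the quoting of the delimiter matters in PHP:

```php
<?php
// Made-up sample line with real tab characters between the columns.
$line = "alpha\tbeta gamma\tdelta";

// Double quotes: "\t" is a tab, so the line splits into three columns.
$columns = explode("\t", $line);

// Single quotes: '\t' is a backslash followed by "t", which never occurs
// in $line, so explode() returns the whole line as a single element.
$unsplit = explode('\t', $line);

print_r($columns);
print_r($unsplit);
```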

Answer 2 (score: 0)

Following your comment on my previous answer, it sounds like substr() is all you need.

If you know the width of every column in each line, do something like this:

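// Example offsets and lengths only (the question mentions 60- and 40-character columns).
// Note that substr() takes (start, length), not (start, end).
$firstColStart   = 0;
$firstColLength  = 60;
$secondColStart  = 60;
$secondColLength = 40;

// $lines is the array of lines, e.g. as returned by file()
$rows = array();
foreach ($lines as $line) {
    $columns = array();
    $columns[] = substr($line, $firstColStart, $firstColLength);
    $columns[] = substr($line, $secondColStart, $secondColLength);
    // ...one more substr() call for each remaining column
    $rows[] = $columns;
}
// Do something with the $rows array (one array of columns per line)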

Answer 3 (score: -1)

Here's what I came up with. I assumed the column widths aren't known in advance.

<?php

$data = 'XXXXXXXXXXXXXXXXXXXXXXXXXX                                  XXXXXXXXXXXXX           XX        XXXXXX
XXXXXXXXX                                                   XXX XXX                 X         XXX
XXXXXXXXXXXXXXX                                             XXXXXXXXXXXXX           XX        XXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXX                                  XXXXXXXXXXXXX           XX        XXXXXX
XXXXXXXXX                                                   XXX XXX                 X         XXX
XXXXXXXXXXXXXXX                                             XXXXXXXXXXXXX           XX        XXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXX                                  XXXXXXXXXXXXX           XX        XXXXXX
XXXXXXXXX                                                   XXX XXX                 X         XXX
XXXXXXXXXXXXXXX                                             XXXXXXXXXXXXX           XX        XXXXXX';

$dataLines = explode("\n", $data);

// detect column breaks
$numDataLines = count($dataLines);
$colBreaks = array();
$c = 0;
while (true) {
    $rowEnds = 0; // count how many rows have terminated in the current col.
    $notSet = 0;  // a special case of $rowEnds, when the line no longer has
                  // chars.
    // run down each column. if there are no X's, then it is a col break.
    for ($r = 0; $r < $numDataLines; ++$r) {
        if (!isset($dataLines[$r][$c])) {
            ++$notSet;
            ++$rowEnds;
        } elseif ($dataLines[$r][$c] != 'X') {
            ++$rowEnds;
        }
    }
    // if no lines have chars left, end the while loop. this counts as a col
    // break.
    if ($notSet == $numDataLines) {
        $colBreaks[] = $c;
        break;
    }
    // if no X's were found at this position in any line, this is a col break.
    if ($rowEnds == $numDataLines) {
        $colBreaks[] = $c;
    }
    ++$c; // move on to the next col
}

// now that we have all the col breaks, we simply iterate over them and slice
// out the needed sections from each line to create the columns.
$dataCols = array();
$left = 0;
foreach ($colBreaks as $cb) {
    // skip empty cols
    if ($left == $cb) {
        $left = $cb + 1;
        continue;
    }
    $colLen = $cb - $left;
    $dataCol = array();
    echo "left: $left, len: $colLen, cb: $cb\n";
    foreach ($dataLines as $dl) {
        $dataCol[] = substr($dl, $left, $colLen);
    }
    $dataCols[] = implode("\n", $dataCol);
    $left += $colLen + 1;
}

// tada!
print_r($dataCols);