Split a string and remember where it was split

Time: 2012-02-13 16:58:20

Tags: php

Suppose I have the following string:

I have | been very busy lately and need to go | to bed early

Splitting it on "|" gives:

$arr = array(
  0 => 'I have',
  1 => 'been very busy lately and need to go',
  2 => 'to bed early',
);

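The first split comes after 2 words, and the second split comes 8 words after that. The number of words in each piece is stored as the split positions: array(2,8,3). The string is then imploded and passed to a custom string tagger: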

tag_string('I have been very busy lately and need to go to bed early');

I don't know what tag_string will output, except that the total number of words stays the same. Examples of its output:

I have-nn been-vb very-vb busy lately and-rr need to-r go to bed early-p
I-ee have been-vb very busy-df lately-nn and need-f to go to bed-uu early-yy

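This lengthens the string by an unknown number of characters. I have no control over tag_string. All I know is that (1) the number of words is the same as before, and (2) the array was split after 2 words and then again after 8 more words. I now need a way to break the tagged string into the same array as before: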

$string = "I have-nn been-vb very-vb busy lately and-rr need to-r go to bed early-p";
function split_string_again() {
  // split after 2nd, and thereafter after 8th word
}

Output:

$arr = array(
  0 => 'I have-nn',
  1 => 'been-vb very-vb busy lately and-rr need to-r go',
  2 => 'to bed early-p',
);

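So to be clear (I wasn't before): I can't split by remembering strpos offsets, because the offsets before and after the string goes through the tagger are not the same. I need to count words. I hope that makes things clearer :)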

3 answers:

Answer 0: (score: 3)

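You don't want to count words, you want to count string lengths (strlen). If it is the same string without the pipes, then you want to split it with substr after a certain number of characters.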

$strCounts = array();

foreach ($arr as $item) {
    $strCounts[] = strlen($item);
}

// Later on.
$arr = array();
$i = 0;
foreach ($strCounts as $count) {
    $arr[] = substr($string, $i, $count);
    $i += $count; // increment the start position by the length
}

I haven't tested this; it's just a "theory" and probably has a few kinks to work out. There may be a better way to do it, I just don't know of one.

Answer 1: (score: 1)

I'm not quite sure I understand what you are really trying to achieve. But here are a few things that might help you:

str_word_count() counts the number of words in a string. preg_match_all('/\p{L}[\p{L}\p{Mn}\p{Pd}\x{2019}]*/u', $string, $foo); does pretty much the same thing, but works on UTF-8 strings.
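
As a rough check of the two counting approaches just mentioned, the sketch below runs both on the question's example strings. The variable names are only illustrative. It relies on str_word_count() accepting "-" inside a word and on \p{Pd} matching the hyphen, so a token such as "have-nn" counts as one word; a tagger that appends digits or other punctuation would not be covered by this.

```php
<?php
$plain  = 'I have been very busy lately and need to go to bed early';
$tagged = 'I have-nn been-vb very-vb busy lately and-rr need to-r go to bed early-p';

// str_word_count() allows "'" and "-" inside a word, so "have-nn" is a single word.
$plainCount  = str_word_count($plain);
$taggedCount = str_word_count($tagged);

// The Unicode-aware pattern from the answer; preg_match_all() returns the number of matches.
$pattern   = '/\p{L}[\p{L}\p{Mn}\p{Pd}\x{2019}]*/u';
$utf8Count = preg_match_all($pattern, $tagged, $matches);

// All three counts should agree if tagging only appends "-xx" suffixes.
echo "$plainCount $taggedCount $utf8Count\n";
```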

strpos() finds the first occurrence of one string inside another. You can easily find all the positions with this:

$pos = -1;
$positions = array();
while (($pos = strpos($string, '|', $pos + 1)) !== false) {
  $positions[] = $pos;
}

I'm still not sure I understand why you can't just use explode() for this.

<?php
$string = 'I have | been very busy lately and need to go | to bed early';
$parts = explode('|', $string);
$words = array();
foreach ($parts as $s) {
  $words[] = str_word_count($s);
}
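
The same per-piece word counts can also be collected in one expression. This is only a compact sketch of the loop above, with $counts as an illustrative name; for the question's string it should give the array(2,8,3) described there.

```php
<?php
$string = 'I have | been very busy lately and need to go | to bed early';

// Count the words in each pipe-delimited piece; surrounding spaces do not affect the count.
$counts = array_map('str_word_count', explode('|', $string));

print_r($counts);
```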

Answer 2: (score: 1)

Interesting question. Although I think a rope data structure still applies, it may be a bit overkill since the word placement doesn't change. Here is my solution:

$str = "I have | been very busy lately and need to go | to bed early";

function get_breaks($str)
{
    $breaks = array();
    $arr = explode("|", $str);

    foreach($arr as $val)
    {
        $breaks[] = str_word_count($val);
    }

    return $breaks;
}

$breaks = get_breaks($str);

echo "<pre>" . print_r($breaks, 1) . "</pre>";

$str = str_replace("|", "", $str);

function rebreak($str, $breaks)
{
    $return = array();
    $old_break = 0;

    $arr = str_word_count($str, 1);

    foreach($breaks as $break)
    {
        $return[] = implode(" ", array_slice($arr, $old_break, $break));

        $old_break += $break;
    }

    return $return;
}

echo "<pre>" . print_r(rebreak($str, $breaks), 1) . "</pre>";

echo "<pre>" . print_r(rebreak("I have-nn been-vb very-vb busy lately and-rr need to-r go to bed early-p", $breaks), 1) . "</pre>";

Let me know if you have any questions, but it's pretty self-explanatory. There are surely ways to improve this.
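
One possible improvement, offered as a sketch rather than a tested drop-in: str_word_count() breaks a word at digits and most punctuation, so a tag like "to-r2" would be counted as more than one word. Splitting on whitespace instead treats every space-separated token as one word, whatever the tagger appended. The function name rebreak_by_whitespace is hypothetical, and the sketch assumes the tagger never adds or removes spaces.

```php
<?php
// Hypothetical helper: regroup a tagged string using the per-piece word counts.
function rebreak_by_whitespace(string $str, array $breaks): array
{
    // Every run of non-whitespace characters is one word, tags included.
    $words = preg_split('/\s+/', trim($str), -1, PREG_SPLIT_NO_EMPTY);

    $result = array();
    $offset = 0;
    foreach ($breaks as $count) {
        // Take the next $count words and join them back with single spaces.
        $result[] = implode(' ', array_slice($words, $offset, $count));
        $offset += $count;
    }

    return $result;
}

$tagged = 'I have-nn been-vb very-vb busy lately and-rr need to-r go to bed early-p';
$parts  = rebreak_by_whitespace($tagged, array(2, 8, 3));

print_r($parts);
```

The counts passed in are the ones produced by get_breaks() on the untagged string, so the two functions can be used together.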