Question

Suppose I have the following string:

I have | been very busy lately and need to go | to bed early

By splitting on "|", you get:

$arr = array(
  0 => 'I have',
  1 => 'been very busy lately and need to go',
  2 => 'to bed early',
);

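The first split comes after 2 words, and the second after a further 8 words. I store the number of words after which each split happens: array(2, 8, 3). The string is then imploded and passed to a custom string tagger: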

tag_string('I have been very busy lately and need to go to bed early');

I don't know what tag_string will output, except that the total number of words stays the same. Example outputs are:

I have-nn been-vb very-vb busy lately and-rr need to-r go to bed early-p
I-ee have been-vb very busy-df lately-nn and need-f to go to bed-uu early-yy

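This lengthens the string by an unknown number of characters. I have no control over tag_string. All I know is that (1) the number of words is the same as before, and (2) the array was split after 2 words and then after a further 8 words. I now need a way to split the tagged string back into the same array: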

$string = "I have-nn been-vb very-vb busy lately and-rr need to-r go to bed early-p";
function split_string_again() {
  // split after the first 2 words, then after a further 8 words
}

Desired output:

$arr = array(
  0 => 'I have-nn',
  1 => 'been-vb very-vb busy lately and-rr need to-r go',
  2 => 'to bed early-p',
);

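So to be clear (I wasn't before): I can't split by remembering strpos positions, because the positions in the string before and after it passes through the tagger are not the same. I need to count words. I hope I've made myself clearer :)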

Answer 1

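You don't want to count words, you want to count string lengths (strlen). If it's the same string without the pipes, then you want to split it with substr after a given number of characters.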

$strCounts = array();

foreach ($arr as $item) {
    $strCounts[] = strlen($item);
}

// Later on.
$arr = array();
$i = 0;
foreach ($strCounts as $count) {
    $arr[] = substr($string, $i, $count);
    $i += $count; // increment the start position by the length
}

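I haven't tested this; it's just a "theory", and there may be some issues to work out. There may well be a better way to do it that I'm not aware of.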
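A self-contained sketch of the same strlen/substr idea, so it can be run on its own. It assumes, as the answer does, that the string being split is the original with only the '|' characters removed; the function names record_lengths and split_by_lengths are hypothetical. Because it works on character lengths, it reproduces the pieces only while those lengths are unchanged, which the question says is not the case after tag_string has run.

```php
<?php
// Record the byte length of each piece produced by explode().
function record_lengths(array $parts): array {
    return array_map('strlen', $parts);
}

// Cut a string into consecutive pieces of the recorded lengths.
function split_by_lengths(string $string, array $lengths): array {
    $result = array();
    $offset = 0;
    foreach ($lengths as $length) {
        $result[] = substr($string, $offset, $length);
        $offset += $length; // next piece starts where this one ended
    }
    return $result;
}

$original = 'I have | been very busy lately and need to go | to bed early';
$parts    = explode('|', $original);              // pieces keep their surrounding spaces
$lengths  = record_lengths($parts);               // 7, 38, 13
$joined   = str_replace('|', '', $original);      // same string without the pipes
$again    = split_by_lengths($joined, $lengths);

var_dump($again === $parts); // bool(true)
```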

Answer 2

I'm not quite sure I understand what you actually want to achieve, but here are some things that might help:

str_word_count() counts the number of words in a string. preg_match_all('/\p{L}[\p{L}\p{Mn}\p{Pd}\x{2019}]*/u', $string, $foo); does almost exactly the same thing, but works with UTF-8 strings.
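A quick check of both counting methods on the tagged string from the question. This assumes tags are attached to words with a hyphen, as in the question's examples; str_word_count() allows '-' inside a word and the regex includes dash punctuation (\p{Pd}), so each tagged word such as "have-nn" counts as one word.

```php
<?php
$tagged = 'I have-nn been-vb very-vb busy lately and-rr need to-r go to bed early-p';

// Byte-based word count (locale dependent, fine for ASCII text).
$count1 = str_word_count($tagged);

// UTF-8 aware equivalent; preg_match_all() returns the number of matches.
$count2 = preg_match_all('/\p{L}[\p{L}\p{Mn}\p{Pd}\x{2019}]*/u', $tagged, $matches);

echo $count1, ' ', $count2, "\n"; // prints "13 13"
```

Both give 13, which equals 2 + 8 + 3, so the word count survives tagging.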

strpos() finds the first occurrence of one string inside another. With it you can easily find all the positions:

$pos = -1;
$positions = array();
while (($pos = strpos($string, '|', $pos + 1)) !== false) {
  $positions[] = $pos;
}
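The positions found this way are character offsets, which change after tagging, but they can be turned into per-segment word counts, which do not. A minimal sketch combining the loop above with str_word_count(), assuming '|' only ever appears as a separator:

```php
<?php
$string = 'I have | been very busy lately and need to go | to bed early';

// Collect every position of '|' (same loop as above).
$pos = -1;
$positions = array();
while (($pos = strpos($string, '|', $pos + 1)) !== false) {
    $positions[] = $pos;
}

// Count the words between consecutive separators.
$counts = array();
$start = 0;
foreach ($positions as $p) {
    $counts[] = str_word_count(substr($string, $start, $p - $start));
    $start = $p + 1; // skip the '|' itself
}
$counts[] = str_word_count(substr($string, $start)); // words after the last '|'

echo implode(', ', $positions), "\n"; // 7, 46
echo implode(', ', $counts), "\n";    // 2, 8, 3
```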

I'm still not sure I understand why you can't just use explode() for this.

<?php
$string = 'I have | been very busy lately and need to go | to bed early';
$parts = explode('|', $string);
$words = array();
foreach ($parts as $s) {
  $words[] = str_word_count($s);
}

Answer 3

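Interesting problem. Although I think a rope data structure would still apply, it's probably overkill here, since the word positions don't change. Here's my solution: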

$str = "I have | been very busy lately and need to go | to bed early";

function get_breaks($str)
{
    $breaks = array();
    $arr = explode("|", $str);

    foreach($arr as $val)
    {
        $breaks[] = str_word_count($val);
    }

    return $breaks;
}

$breaks = get_breaks($str);

echo "<pre>" . print_r($breaks, 1) . "</pre>";

$str = str_replace("|", "", $str);

function rebreak($str, $breaks)
{
    $return = array();
    $old_break = 0;

    $arr = str_word_count($str, 1);

    foreach($breaks as $break)
    {
        $return[] = implode(" ", array_slice($arr, $old_break, $break));

        $old_break += $break;
    }

    return $return;
}

echo "<pre>" . print_r(rebreak($str, $breaks), 1) . "</pre>";

echo "<pre>" . print_r(rebreak("I have-nn been-vb very-vb busy lately and-rr need to-r go to bed early-p", $breaks), 1) . "</pre>";

Let me know if you have any questions, but it's pretty self-explanatory. There are certainly ways to improve this.
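One possible refinement of rebreak(): str_word_count($str, 1) only keeps letters, apostrophes and hyphens, so a tag containing digits or other punctuation would be altered or split into extra words. The sketch below splits on whitespace instead, keeping each token exactly as the tagger produced it. It assumes the tagger separates words with whitespace only; the name rebreak_ws is hypothetical.

```php
<?php
// Re-split a string into groups of words, using the stored word counts.
function rebreak_ws(string $str, array $breaks): array
{
    // Tokens are whatever lies between runs of whitespace.
    $tokens = preg_split('/\s+/', trim($str), -1, PREG_SPLIT_NO_EMPTY);
    $result = array();
    $offset = 0;
    foreach ($breaks as $break) {
        $result[] = implode(' ', array_slice($tokens, $offset, $break));
        $offset += $break; // move past the words just consumed
    }
    return $result;
}

$tagged = 'I have-nn been-vb very-vb busy lately and-rr need to-r go to bed early-p';
foreach (rebreak_ws($tagged, array(2, 8, 3)) as $piece) {
    echo $piece, "\n";
}
// I have-nn
// been-vb very-vb busy lately and-rr need to-r go
// to bed early-p
```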

Split a string, remembering where it was split
