Suppose I have the following string:
I have | been very busy lately and need to go | to bed early
Splitting it on "|" gives:
$arr = array(
[0] => I have
[1] => been very busy lately and need to go
[2] => to bed early
)
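The first split comes after 2 words and the second after a further 8 words. The number of words in each segment is stored: array(2,8,3). The string is then imploded and passed to a custom string tagger: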
tag_string('I have been very busy lately and need to go to bed early');
I don't know what the output of tag_string will be, except that the total number of words stays the same. Examples of possible output:
I have-nn been-vb very-vb busy lately and-rr need to-r go to bed early-p
I-ee have been-vb very busy-df lately-nn and need-f to go to bed-uu early-yy
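This lengthens the string by an unknown number of characters. I have no control over tag_string. All I know is that (1) the number of words is the same as before, and (2) the array was split after 2 words and then after a further 8 words. I now need a way to explode the tagged string into the same array as before: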
$string = "I have-nn been-vb very-vb busy lately and-rr need to-r go to bed early-p";
function split_string_again() {
// split after 2nd, and thereafter after 8th word
}
Output:
$arr = array(
[0] => I have-nn
[1] => been-vb very-vb busy lately and-rr need to-r go
[2] => to bed early-p
)
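So, to be clear (I wasn't before): I can't split by remembering strpos values, because the positions before and after the string has gone through the tagger are not the same. I need to count words. I hope that makes it clearer :)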
Answer 0 (score: 3)
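You don't want to count words; you want to count the string length (strlen). If it is the same string without the pipes, you can then split it with substr after a given number of characters.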
$strCounts = array();
foreach ($arr as $item) {
$strCounts[] = strlen($item);
}
// Later on.
$arr = array();
$i = 0;
foreach ($strCounts as $count) {
$arr[] = substr($string, $i, $count);
$i += $count; // increment the start position by the length
}
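I haven't tested this; it is just a "theory", and there may be some kinks to work out. There may be a better way to do it, I just don't know of one.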
Answer 1 (score: 1)
I'm not quite sure I understand what you are really trying to achieve, but here are a few things that might help:
str_word_count() counts the number of words in a string. preg_match_all('/\p{L}[\p{L}\p{Mn}\p{Pd}\x{2019}]*/u', $string, $foo);
does almost exactly the same thing, but works with UTF-8 strings.
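As a quick check of the two calls above, here is a minimal sketch that runs both on one of the tagged sample strings from the question. The variable names are my own, and it assumes the tags are always joined to the word with a hyphen, which is all the question shows.

```php
<?php
// Sample tagged string taken from the question.
$tagged = 'I have-nn been-vb very-vb busy lately and-rr need to-r go to bed early-p';

// str_word_count() lets a word contain (but not start with) "-" and "'",
// so "have-nn" is counted as a single word.
$byteCount = str_word_count($tagged);

// preg_match_all() returns the number of full matches; \p{Pd} covers the
// hyphen, so the tag stays attached to its word here as well.
$unicodeCount = preg_match_all('/\p{L}[\p{L}\p{Mn}\p{Pd}\x{2019}]*/u', $tagged, $matches);

echo $byteCount, "\n";
echo $unicodeCount, "\n";
```

For this sample both counts agree with the untagged string, which is the property the question relies on. If the tagger ever emitted digits or other punctuation, the two counts could drift apart.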
strpos() finds the first occurrence of one string inside another. You can easily find all the positions of "|" like this:
$pos = -1;
$positions = array();
while (($pos = strpos($string, '|', $pos + 1)) !== false) {
$positions[] = $pos;
}
I'm still not sure I understand why you can't just use explode() for this.
<?php
$string = 'I have | been very busy lately and need to go | to bed early';
$parts = explode('|', $string);
$words = array();
foreach ($parts as $s) {
$words[] = str_word_count($s);
}
Answer 2 (score: 1)
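Interesting question. Although I think a rope data structure would still apply, it is probably overkill here because the word order doesn't change. Here is my solution: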
$str = "I have | been very busy lately and need to go | to bed early";
function get_breaks($str)
{
$breaks = array();
$arr = explode("|", $str);
foreach($arr as $val)
{
$breaks[] = str_word_count($val);
}
return $breaks;
}
$breaks = get_breaks($str);
echo "<pre>" . print_r($breaks, 1) . "</pre>";
$str = str_replace("|", "", $str);
function rebreak($str, $breaks)
{
$return = array();
$old_break = 0;
$arr = str_word_count($str, 1);
foreach($breaks as $break)
{
$return[] = implode(" ", array_slice($arr, $old_break, $break));
$old_break += $break;
}
return $return;
}
echo "<pre>" . print_r(rebreak($str, $breaks), 1) . "</pre>";
echo "<pre>" . print_r(rebreak("I have-nn been-vb very-vb busy lately and-rr need to-r go to bed early-p", $breaks), 1) . "</pre>";
Let me know if you have any questions, but it should be fairly self-explanatory. There are surely ways to improve on this.
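One possible improvement, sketched below, is to split on whitespace instead of relying on str_word_count(), so that tags containing digits or other symbols would not change the count. This assumes the tagger keeps words separated by whitespace and never inserts whitespace inside a word; the function names count_tokens_per_segment and regroup_tokens are hypothetical, not part of the code above.

```php
<?php
// Hypothetical helper: count whitespace-separated tokens in each "|" segment.
function count_tokens_per_segment(string $piped): array
{
    $counts = [];
    foreach (explode('|', $piped) as $segment) {
        $tokens = preg_split('/\s+/', trim($segment), -1, PREG_SPLIT_NO_EMPTY);
        $counts[] = count($tokens);
    }
    return $counts;
}

// Hypothetical helper: regroup the tagged string using the recorded counts.
function regroup_tokens(string $tagged, array $counts): array
{
    $tokens = preg_split('/\s+/', trim($tagged), -1, PREG_SPLIT_NO_EMPTY);
    $groups = [];
    $offset = 0;
    foreach ($counts as $count) {
        // Take the next $count tokens and join them back with single spaces.
        $groups[] = implode(' ', array_slice($tokens, $offset, $count));
        $offset += $count;
    }
    return $groups;
}

$counts = count_tokens_per_segment('I have | been very busy lately and need to go | to bed early');
$groups = regroup_tokens('I have-nn been-vb very-vb busy lately and-rr need to-r go to bed early-p', $counts);

print_r($counts);
print_r($groups);
```

Note that the regrouped segments are rejoined with single spaces, so any unusual spacing in the tagger's output is normalized rather than preserved.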