This question is a follow-up to my previous question.
I have text like this:
<ORGANIZATION>Head of Pekalongan Regency</ORGANIZATION>, Dra. Hj.. Siti Qomariyah , MA and her staff were greeted by <ORGANIZATION>Rector of IPB</ORGANIZATION> Prof. Dr. Ir. H. Herry Suhardiyanto , M.Sc. and <ORGANIZATION>officials of IPB</ORGANIZATION> in the guest room.
Using the code from the answer to my previous question, with PREG_OFFSET_CAPTURE
added, like this:
function get_text_between_tags($string, $tagname) {
    $pattern = "/<$tagname\b[^>]*>(.*?)<\/$tagname>/is";
    preg_match_all($pattern, $string, $matches, PREG_OFFSET_CAPTURE);
    if (!empty($matches[1]))
        return $matches[1];
    return array();
}
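I get this output:
Array (
[0] => Array ( [0] => Head of Pekalongan Regency [1] => 14 )
[1] => Array ( [0] => Rector of IPB [1] => 131 )
[2] => Array ( [0] => officials of IPB [1] => 222 ) )
14, 131, and 222 are the character offsets at which the pattern matched. Can I get the word index instead? I mean output like this:
Array (
[0] => Array ( [0] => Head of Pekalongan Regency [1] => 0 )
[1] => Array ( [0] => Rector of IPB [1] => 15 )
[2] => Array ( [0] => officials of IPB [1] => 27 ) )
Is there another way to do this besides PREG_OFFSET_CAPTURE,
or does it need more code? I have no idea.
Thanks for any help. :)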
Answer 0 (score: 1)
This will work, but it still needs a little finishing:
<?php
$raw = '<ORGANIZATION>Head of Pekalongan Regency</ORGANIZATION>, Dra. Hj.. Siti Qomariyah , MA and her staff were greeted by <ORGANIZATION>Rector of IPB</ORGANIZATION> Prof. Dr. Ir. H. Herry Suhardiyanto , M.Sc. and <ORGANIZATION>officials of IPB</ORGANIZATION> in the guest room.';
$result = getExploded($raw, '<ORGANIZATION>', '</ORGANIZATION>');
echo '<pre>';
print_r($result);
echo '</pre>';

function getExploded($data, $tagStart, $tagEnd) {
    // Split on the opening tag; every piece after the first begins with tagged text.
    $tmpData = explode($tagStart, $data);
    $result = array();
    $wordCount = 0;
    foreach ($tmpData as $k => $v) {
        // Everything before the closing tag is the tagged text.
        $tmp = explode($tagEnd, $v);
        $result[$k][0] = $tmp[0];
        $result[$k][1] = $wordCount;
        // Advance the word index by the number of spaces in this piece.
        $wordCount = $wordCount + (count(explode(' ', $v)) - 1);
    }
    return $result;
}
?>
The result is:
Array
(
[0] => Array
(
[0] =>
[1] => 0
)
[1] => Array
(
[0] => Head of Pekalongan Regency
[1] => 0
)
[2] => Array
(
[0] => Rector of IPB
[1] => 16
)
[3] => Array
(
[0] => officials of IPB
[1] => 28
)
)
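If you would rather keep the regex approach from the question, one possible way to finish this off is to convert each PREG_OFFSET_CAPTURE character offset into a word index: take the text before the match, strip the tags from it, and count the whitespace-separated tokens. This is only a minimal sketch. It assumes words are separated by whitespace (so a standalone "," counts as a word, as in the result above) and that the text has no stray "<" characters outside tags. The function name get_word_indexes_between_tags is made up for this example.

```php
<?php
// Hypothetical helper (not from the original posts): returns each tagged
// text together with the index of its first word in the tag-free text.
function get_word_indexes_between_tags($string, $tagname) {
    $tag = preg_quote($tagname, '/');
    $pattern = "/<$tag\b[^>]*>(.*?)<\/$tag>/is";
    preg_match_all($pattern, $string, $matches, PREG_OFFSET_CAPTURE);

    $result = array();
    foreach ($matches[1] as $match) {
        list($text, $offset) = $match;
        // Text preceding the match, with tags removed so markup is not counted.
        $prefix = strip_tags(substr($string, 0, $offset));
        // Number of whitespace-separated tokens before the match = word index.
        $words = preg_split('/\s+/', $prefix, -1, PREG_SPLIT_NO_EMPTY);
        $result[] = array($text, count($words));
    }
    return $result;
}

$raw = '<ORGANIZATION>Head of Pekalongan Regency</ORGANIZATION>, Dra. Hj.. Siti Qomariyah , MA and her staff were greeted by <ORGANIZATION>Rector of IPB</ORGANIZATION> Prof. Dr. Ir. H. Herry Suhardiyanto , M.Sc. and <ORGANIZATION>officials of IPB</ORGANIZATION> in the guest room.';
print_r(get_word_indexes_between_tags($raw, 'ORGANIZATION'));
```

For the sample text this should give word indexes 0, 16, and 28, the same as the explode version, but without the empty first entry because only real matches are returned.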