Get word index from text using PHP

Time: 2013-05-10 01:28:21

Tags: php text tags preg-match

This question is a continuation of my previous question:

  

Check tag & get the value inside tag using PHP

I have text like this:

<ORGANIZATION>Head of Pekalongan Regency</ORGANIZATION>, Dra. Hj.. Siti Qomariyah , MA and her staff were greeted by <ORGANIZATION>Rector of IPB</ORGANIZATION> Prof. Dr. Ir. H. Herry Suhardiyanto , M.Sc. and <ORGANIZATION>officials of IPB</ORGANIZATION> in the guest room.

Using the code from the answer to my previous question, with PREG_OFFSET_CAPTURE added:

function get_text_between_tags($string, $tagname) {
    $pattern = "/<$tagname\b[^>]*>(.*?)<\/$tagname>/is";
    preg_match_all($pattern, $string, $matches, PREG_OFFSET_CAPTURE);
    if(!empty($matches[1]))
        return $matches[1];
    return array();
}

I get this output:

  

Array (
  [0] => Array ( [0] => Head of Pekalongan Regency [1] => 14 )
  [1] => Array ( [0] => Rector of IPB [1] => 131 )
  [2] => Array ( [0] => officials of IPB [1] => 222 ) )
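A quick check of what these numbers are relative to: PREG_OFFSET_CAPTURE reports byte positions in the raw string, so the characters of the tags themselves are counted. This illustrative snippet only re-reads the sample text at the reported offsets.

```php
<?php
// Illustrative check: the offsets index into the raw string, markup included.
$raw = '<ORGANIZATION>Head of Pekalongan Regency</ORGANIZATION>, Dra. Hj.. Siti Qomariyah , MA and her staff were greeted by <ORGANIZATION>Rector of IPB</ORGANIZATION> Prof. Dr. Ir. H. Herry Suhardiyanto , M.Sc. and <ORGANIZATION>officials of IPB</ORGANIZATION> in the guest room.';

// The first capture starts right after the 14-byte opening tag.
var_dump(strlen('<ORGANIZATION>'));
// Reading at the other two offsets gives back the captured text.
var_dump(substr($raw, 131, 13));
var_dump(substr($raw, 222, 16));
```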

14, 131 and 222 are the character offsets of the text inside each tag. Can I get word indexes instead? I mean output like this:

  

Array (
  [0] => Array ( [0] => Head of Pekalongan Regency [1] => 0 )
  [1] => Array ( [0] => Rector of IPB [1] => 15 )
  [2] => Array ( [0] => officials of IPB [1] => 27 ) )

Is there another way besides PREG_OFFSET_CAPTURE, or does it need more code? I have no idea. Thanks for the help. :)
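One way to keep PREG_OFFSET_CAPTURE and still get word indexes is to count the words in the text that precedes each match. This is a minimal sketch, assuming a "word" is whatever PHP's str_word_count() counts (runs of letters), so "M.Sc." is two words and a stand-alone "," is none. Under that rule the sample text gives 0, 15 and 27, the values asked for above. The name get_words_between_tags() is hypothetical.

```php
<?php
// Sketch: keep PREG_OFFSET_CAPTURE, then turn each byte offset into a word
// index by counting the words in the text that comes before the match.
// Assumption: a "word" is what str_word_count() counts (runs of letters).
// get_words_between_tags() is a hypothetical helper, not a built-in.
function get_words_between_tags(string $string, string $tagname): array
{
    $tag = preg_quote($tagname, '/');
    $pattern = '/<' . $tag . '\b[^>]*>(.*?)<\/' . $tag . '>/is';
    $result = [];
    if (preg_match_all($pattern, $string, $matches, PREG_OFFSET_CAPTURE)) {
        foreach ($matches[1] as [$text, $offset]) {
            // Text before the match, with every tag replaced by a space so
            // that words on either side of a tag are not glued together.
            $before = preg_replace('/<[^>]+>/', ' ', substr($string, 0, $offset));
            $result[] = [$text, str_word_count($before)];
        }
    }
    return $result;
}

$raw = '<ORGANIZATION>Head of Pekalongan Regency</ORGANIZATION>, Dra. Hj.. Siti Qomariyah , MA and her staff were greeted by <ORGANIZATION>Rector of IPB</ORGANIZATION> Prof. Dr. Ir. H. Herry Suhardiyanto , M.Sc. and <ORGANIZATION>officials of IPB</ORGANIZATION> in the guest room.';

print_r(get_words_between_tags($raw, 'ORGANIZATION'));
```

Replacing tags with a space rather than using strip_tags() is a deliberate choice: with strip_tags(), text such as `by<ORGANIZATION>` would collapse into one word.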

1 answer:

Answer 0 (score: 1)

This works, but it still needs a bit of finishing:

<?php

$raw = '<ORGANIZATION>Head of Pekalongan Regency</ORGANIZATION>, Dra. Hj.. Siti Qomariyah , MA and her staff were greeted by <ORGANIZATION>Rector of IPB</ORGANIZATION> Prof. Dr. Ir. H. Herry Suhardiyanto , M.Sc. and <ORGANIZATION>officials of IPB</ORGANIZATION> in the guest room.';

$result = getExploded($raw, '<ORGANIZATION>', '</ORGANIZATION>');

echo '<pre>';
print_r($result);
echo '</pre>';

// Split the text on the opening tag and keep a running count of the
// space-separated tokens that precede each chunk.
function getExploded($data, $tagStart, $tagEnd) {
    $result = array();
    $tmpData = explode($tagStart, $data);
    $wordCount = 0;
    foreach ($tmpData as $k => $v) {
        // The part before the closing tag is the tag's contents. For $k == 0
        // it is whatever precedes the first tag (here an empty string).
        $tmp = explode($tagEnd, $v);
        $result[$k][0] = $tmp[0];
        $result[$k][1] = $wordCount;
        // Add the number of spaces in this chunk to the running count.
        $wordCount = $wordCount + (count(explode(' ', $v)) - 1);
    }
    return $result;
}

?>

The result is:

Array
(
    [0] => Array
        (
            [0] => 
            [1] => 0
        )

    [1] => Array
        (
            [0] => Head of Pekalongan Regency
            [1] => 0
        )

    [2] => Array
        (
            [0] => Rector of IPB
            [1] => 16
        )

    [3] => Array
        (
            [0] => officials of IPB
            [1] => 28
        )

    )
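The splitting idea in this answer can also be finished without character offsets at all. A sketch under stated assumptions: preg_split() with PREG_SPLIT_DELIM_CAPTURE returns the text around each tag block together with the captured tag contents, and a running total of str_word_count() gives each block's word index. Because only the captured pieces are recorded, there is no empty leading entry. Because it counts letter runs instead of spaces, it reports 0, 15 and 27 for the sample text, the values asked for in the question; the space count above differs on tokens such as a stand-alone "," and "M.Sc.". The helper name organizations_with_word_index() is hypothetical.

```php
<?php
// Sketch: one pass over the text, no byte offsets. Split on the tag blocks,
// keep the captured contents, and maintain a running word count.
// Assumptions: a "word" is what str_word_count() counts, and
// organizations_with_word_index() is a hypothetical helper name.
function organizations_with_word_index(string $text, string $tagname): array
{
    $tag = preg_quote($tagname, '/');
    // With PREG_SPLIT_DELIM_CAPTURE the captured group is returned too, so the
    // pieces alternate: outside, inside, outside, inside, ..., outside.
    $pieces = preg_split(
        '/<' . $tag . '\b[^>]*>(.*?)<\/' . $tag . '>/is',
        $text,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );

    $result = [];
    $words = 0;
    foreach ($pieces as $i => $piece) {
        if ($i % 2 === 1) {
            // Odd positions hold tag contents: record them with the count so far.
            $result[] = [$piece, $words];
        }
        // Any other markup is replaced by a space before counting words.
        $words += str_word_count(preg_replace('/<[^>]+>/', ' ', $piece));
    }
    return $result;
}

$raw = '<ORGANIZATION>Head of Pekalongan Regency</ORGANIZATION>, Dra. Hj.. Siti Qomariyah , MA and her staff were greeted by <ORGANIZATION>Rector of IPB</ORGANIZATION> Prof. Dr. Ir. H. Herry Suhardiyanto , M.Sc. and <ORGANIZATION>officials of IPB</ORGANIZATION> in the guest room.';

print_r(organizations_with_word_index($raw, 'ORGANIZATION'));
```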