Extracting information from text using regular expressions

Asked: 2013-08-06 18:27:52

Tags: php regex preg-match

I am trying to extract specific pieces of information from a long string. The text is:

  

Rating: Explicit Score: 17 Tags: apron blonde_hair brown_eyes itaru_chokusha kirigaya_kazuto long_hair nipples no_bra nopan nude sword_art_online yuuki_asuna User: openui

I want to extract it as:

  1. Rating: Explicit
  2. Score: 17
  3. Tags: apron blonde_hair brown_eyes itaru_chokusha kirigaya_kazuto long_hair sword_art_online yuuki_asuna
  4. User: openui

The code I tried extracts only the labels:

    $imageTitle = "Rating: Explicit Score: 17 Tags: apron blonde_hair brown_eyes itaru_chokusha kirigaya_kazuto long_hair nipples no_bra nopan nude sword_art_online yuuki_asuna User: openui";
    preg_match_all("/[a-z]{1,}\:\s/i", $imageTitle, $matches);
    var_dump($matches);
    

I also tried adding (.*) at the end, but that returned the entire text. This version extracts only one word after each label:

    preg_match_all("/[a-z]{1,}\:\s[a-z0-9]{1,}/i", $imageTitle, $matches);
    //Output
    array (size=1)
      0 => 
        array (size=4)
          0 => string 'Rating: Explicit' (length=16)
          1 => string 'Score: 17' (length=9)
          2 => string 'Tags: apron' (length=11)
          3 => string 'User: openui' (length=12)
    

How can I extract the rest of each value? If possible, I would also like the result as an array of keys and values.

1 answer:

Answer 0 (score: 0)

preg_match_all should work:

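    // Keep the subject on one line: "." does not match a newline without the /s flag.
    $s = 'Rating: Explicit Score: 17 Tags: apron blonde_hair brown_eyes itaru_chokusha kirigaya_kazuto long_hair sword_art_online yuuki_asuna User: openui';

    // Lazily capture each run of text up to the next "Label:" token or the end of the string.
    if (preg_match_all('#\s*(.+?(?=((^|\s)[A-Z][a-z]*:\s*|$)))#i', $s, $arr)) {
        print_r($arr[1]);
    }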

Output:

    Array
    (
        [0] => Rating: Explicit
        [1] => Score: 17
        [2] => Tags: apron blonde_hair brown_eyes itaru_chokusha kirigaya_kazuto long_hair sword_art_online yuuki_asuna
        [3] => User: openui
    )
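
The question also asked for the result as array keys and values. One possible way to get that is sketched below; it is not part of the original answer. The helper name `parseLabeledFields` and its pattern are hypothetical, and the sketch assumes that every label is a single word of ASCII letters followed by a colon, and that no value contains such a `Word:` token itself.

```php
<?php
// Hypothetical helper (an illustration, not the original answer's code):
// turn "Label: value Label: value ..." into an associative array.
// Assumption: labels are ASCII letters only and are followed by a colon.
function parseLabeledFields(string $text): array
{
    // Group 1: the label. Group 2: the value, captured lazily up to the
    // next "Label:" token or the end of the string (trailing space ignored).
    $pattern = '/([A-Za-z]+):\s*(.*?)(?=\s+[A-Za-z]+:|\s*$)/s';

    $fields = [];
    // PREG_SET_ORDER yields one [full match, label, value] array per field.
    if (preg_match_all($pattern, $text, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $match) {
            $fields[$match[1]] = $match[2];
        }
    }
    return $fields;
}

$s = 'Rating: Explicit Score: 17 Tags: apron blonde_hair brown_eyes itaru_chokusha kirigaya_kazuto long_hair sword_art_online yuuki_asuna User: openui';
print_r(parseLabeledFields($s));
```

With the sample string, this should give an array keyed by `Rating`, `Score`, `Tags` and `User`. All values are strings, so cast `Score` with `(int)` if a number is needed. If the same label appears twice, the later value overwrites the earlier one.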