Sentence splitting with a maximum size and boundary detection

Time: 2012-07-09 04:22:03

Tags: php regex size boundary sentence

Here is my problem: I have a large string (close to 8000 characters) and I want two things:

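  1. Detect sentence boundaries, such as '.', AND
  2. Keep each sentence to no more than 600 characters
  3. I know that in some cases both are not possible. In that case, find a space and split the sentence there.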

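    ridgerunner's solution for condition 1 works like a charm (see the original link: http://goo.gl/PqI6d), but it often outputs sentences longer than 600 characters. Any ideas? Thanks in advance!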
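The three conditions above can also be sketched without a regex. This is only a minimal illustration, not ridgerunner's solution: the function name `splitSentences` and the parameter `$max` are hypothetical, a period is assumed to be the only sentence terminator, and `strlen`/`substr` count bytes, so multibyte text would need the `mb_*` equivalents.

```php
<?php
// Hypothetical helper: one sentence per element, each at most $max bytes.
function splitSentences(string $text, int $max = 600): array
{
    $parts = [];
    $text = trim($text);
    while ($text !== '') {
        $dot = strpos($text, '.');
        if ($dot !== false && $dot + 1 <= $max) {
            // Condition 1: the sentence (period included) fits in the limit.
            $cut = $dot + 1;
        } elseif (strlen($text) <= $max) {
            // Trailing text without a period that already fits.
            $cut = strlen($text);
        } else {
            // Condition 3: cut at the last space that keeps the piece within $max.
            $space = strrpos(substr($text, 0, $max + 1), ' ');
            // No usable space at all: fall back to a hard cut at $max.
            $cut = ($space === false || $space === 0) ? $max : $space;
        }
        $parts[] = rtrim(substr($text, 0, $cut));
        $text = ltrim(substr($text, $cut));
    }
    return $parts;
}
```

Calling `splitSentences('One. Two. Three.')` returns the three sentences as separate elements. Abbreviations and decimal numbers would be split at their periods too, so real text may need a stricter boundary test.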

2 answers:

Answer 0: (score: 0)

You may be better off matching the string. The regex for such a match would look like this:

(.{0,600}?\.)|(.{0,600}(?=\ ))

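In short, you first look for the shortest possible string (up to 600 characters) that ends with a period. If there is none, you look for the longest possible string (again up to 600 characters) that is followed by a space. The next match then starts where the previous one left off.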

Note that this is a generic regex. Your PHP implementation may differ.
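One way the pattern above might be applied in PHP is with `preg_match_all()`. This is a sketch under assumptions: the function name `matchChunks` and the parameter `$max` are hypothetical, and the `s` modifier is added here (it is not part of the answer's regex) so that `.` also matches newlines.

```php
<?php
// Hypothetical helper wrapping the answer's pattern.
function matchChunks(string $text, int $max = 600): array
{
    // Alternative 1: shortest run of up to $max characters ending in a period.
    // Alternative 2: longest run of up to $max characters followed by a space.
    $pattern = '/(.{0,' . $max . '}?\.)|(.{0,' . $max . '}(?=\ ))/s';
    preg_match_all($pattern, $text, $matches);

    // Matches after the first usually start with the separating space,
    // and the second alternative can match an empty string: clean both up.
    $chunks = array_map('trim', $matches[0]);
    $chunks = array_filter($chunks, function ($chunk) {
        return $chunk !== '';
    });
    return array_values($chunks);
}
```

Two caveats with this pattern: a chunk matched by the first alternative can be 601 characters long (600 plus the period), and a run of more than 600 characters containing neither a period nor a space is partly skipped rather than returned.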

Answer 1: (score: 0)

Thanks, nhahtdh. Please check whether I am missing something. Below is an excerpt of my string and the output I get using your suggestion.

<?php 
    // Pattern: up to 600 non-period characters ending in a space or a period,
    // or a run of 600+ word characters optionally followed by one.
    $ptn = "/(?:[^.]{1,600}(?: |\.)|\w{600,}(?: |\.)?)/";
    // The inner double quotes around "lazy eye" are escaped so the string parses.
    $str = "Amblyopia occurs when the nerve pathway from one eye to the brain does not develop during childhood. This occurs because the abnormal eye sends a blurred image or the wrong image to the brain. This confuses the brain, and the brain may learn to ignore the image from the weaker eye. Strabismus is the most common cause of amblyopia. There is often a family history of this condition. The term \"lazy eye\" refers to amblyopia, which often occurs along with strabismus. However, amblyopia can occur without strabismus and people can have strabismus without amblyopia.First, any eye condition that is causing poor vision in the amblyopic eye (such as cataracts) needs to be corrected. Children with a refractive error (nearsightedness, farsightedness, or astigmatism) will need glasses. Next, a patch is placed on the normal eye. This forces the brain to recognize the image from the eye with amblyopia. Sometimes, drops are used to blur the vision of the normal eye instead of putting a patch on it. Children whose vision will not fully recover, and those with only good eye due to any disorder should wear glasses with protective polycarbonate lenses. Polycarbonate glasses are shatter- and scratch-resistant. Children who get treated before age 5 will usually recover almost completely normal vision, although they may continue to have problems with depth perception. Delaying treatment can result in permanent vision problems. After age 10, only a partial recovery of vision can be expected. Early recognition and treatment of the problem in children can help to prevent permanent visual loss. All children should have a complete eye examination at least once between ages 3 and 5. Special techniques are needed to measure visual acuity in a child who is too young to speak. Most eye care professionals can perform these techniques.";
    // Assign the return value so that print_r() has something to print.
    $result = preg_split($ptn, $str, -1, PREG_SPLIT_NO_EMPTY);
    print_r($result);
    ?>

Result (the sentences in my string are all shorter than 600 characters):

 Array
(
[0] => childhood.
[1] => brain.
[2] => eye.
[3] => amblyopia.
[4] => condition.
[5] => strabismus.
[6] => amblyopia.
[7] => corrected.
[8] => glasses.
[9] => eye.
[10] => amblyopia.
[11] => it.
[12] => lenses.
[13] => scratch-resistant.
[14] => perception.
[15] => problems.
[16] => expected.
[17] => loss.
[18] => 5.
[19] => speak.
[20] => techniques
)
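A possible reason for getting only fragments: `preg_split()` treats the pattern as a delimiter and discards whatever it matches, whereas `preg_match_all()` returns the matched text itself. The sketch below uses a deliberately simplified pattern and a made-up input, both chosen only for illustration, to show the difference.

```php
<?php
// Simplified pattern for illustration: a run of non-period characters plus a period.
$pattern = '/[^.]+\./';
$text = 'One. Two. Three';

// preg_split() removes everything the pattern matches and keeps the leftovers.
$split = preg_split($pattern, $text, -1, PREG_SPLIT_NO_EMPTY);

// preg_match_all() collects the matched sentences instead.
preg_match_all($pattern, $text, $m);
$matched = $m[0];

print_r($split);   // only the unmatched tail, ' Three'
print_r($matched); // 'One.' and ' Two.'
```

If the goal is to keep the sentences, matching them is usually the more direct route; the leading space on later matches can be removed with `trim()`.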