Regular expression - matching and extracting with complex conditions

Time: 2013-06-20 21:37:47

Tags: regex

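I am trying to write a regular expression that meets these conditions:

  • At most 8000 characters (any characters, including "\r\n")
  • At most 10 lines (separated by \r\n)
  • Extract only the first 4 lines from the matched text

I can't find a good way to do it... :/

Thanks!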

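2 answers:

Answer 0: (score: 1)

Regular expressions are not what you need here. They are meant for matching a pattern, not for checking a length. If you hold the data in a string, the character limit only needs a check such as myString.length <= 8000 (in the correct syntax for your language, of course). For the line limit, count the \r\n sequences in the string (this can be done iteratively); 10 lines contain at most 9 of them. To get the first four lines, find the 4th \r\n and use a substring method to take everything before it.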
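The steps in this answer can be sketched without any regular expression. The example below uses Python only because the question names no language; the function name `first_lines` and the constants are hypothetical, and it assumes lines are separated strictly by "\r\n", as stated in the question.

```python
from typing import Optional

MAX_CHARS = 8000   # character limit from the question
MAX_LINES = 10     # line limit from the question
KEEP_LINES = 4     # number of leading lines to extract
SEP = "\r\n"       # assumed line separator


def first_lines(text: str) -> Optional[str]:
    """Return the first KEEP_LINES lines, or None if a limit is exceeded."""
    if len(text) > MAX_CHARS:
        return None
    # N lines separated by SEP contain N - 1 separators.
    if text.count(SEP) > MAX_LINES - 1:
        return None
    # Walk forward to the KEEP_LINES-th separator.
    end = -len(SEP)
    for _ in range(KEEP_LINES):
        end = text.find(SEP, end + len(SEP))
        if end == -1:
            # Fewer than KEEP_LINES separators: the whole text qualifies.
            return text
    # Everything before the KEEP_LINES-th separator.
    return text[:end]
```

Returning None for invalid input is a design choice of this sketch; raising an exception would work just as well.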

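Answer 1: (score: 1)

Description

This expression does the following:

  • Validates that the input string is between zero and 8,000 characters long
  • Validates that there are at most 10 newline-delimited lines of text
  • Then captures the first 4 newline-delimited lines of text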

\A(?=.{0,8000}\Z)(?=(?:^.*?(?:\r|\n|\Z)){0,10}\Z)(?:^.*?[\r\n\Z]+){0,4}

This requires the options m (multiline) and s (dot matches all characters).

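Expanded

  • \A anchors to the start of the string. Unlike ^, it is not affected by the m option, so the match always begins at the very start of the input; the s option in turn lets . match carriage-return and line-feed characters
  • (?=.{0,8000}\Z) looks ahead and validates that the string has between zero and 8000 characters
  • (?=(?:^.*?(?:\r|\n|\Z)){0,10}\Z) looks ahead and verifies that there are no more than 10 newline-delimited lines
  • (?:^.*?[\r\n\Z]+){0,4} matches the first 4 lines of text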
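The same three-part structure (length lookahead, line-count lookahead, capture of the leading lines) can be sketched in Python's `re` module. This is an illustration under stated assumptions, not a port of the expression above: the names `PATTERN` and `first_four_lines` are hypothetical, lines are assumed to be separated strictly by "\r\n", and line content is written as `[^\r\n]*` instead of `.*?`. With the s option, `.` also matches line breaks, so a `.*?` repetition can stretch across several lines during backtracking; excluding "\r" and "\n" from line content keeps each repetition to a single line.

```python
import re
from typing import Optional

# Assumption: "\r\n" is the only line separator; a lone "\r" or "\n" makes the match fail.
PATTERN = re.compile(
    r"""
    \A
    (?=[\s\S]{0,8000}\Z)                   # at most 8000 characters in total
    (?=[^\r\n]*(?:\r\n[^\r\n]*){0,9}\Z)    # at most 10 lines (at most 9 separators)
    ([^\r\n]*(?:\r\n[^\r\n]*){0,3})        # capture up to the first 4 lines
    """,
    re.VERBOSE,
)


def first_four_lines(text: str) -> Optional[str]:
    """Return the first four lines if text passes both checks, else None."""
    match = PATTERN.match(text)
    return match.group(1) if match else None
```

Unlike the expression above, the captured group here stops before the line break that follows the fourth line.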

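PHP code example:

You did not specify a language, so I am including this PHP example to show how it works, along with sample output.

Input text

This test input is a string of 8 newline-delimited lines. It contains only 1779 characters.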

Far far away, behind the word mountains, far from the countries Vokalia and Consonantia, there live the blind texts. Separated they live in Bookmarksgrove right at the coast of the Semantics, a large language ocean. A small 
river named Duden flows by their place and supplies it with the necessary regelialia. It is a paradisematic country, in which roasted parts of sentences fly into your mouth. Even the all-powerful Pointing has no control about 
the blind texts it is an almost unorthographic life One day however a small line of blind text by the name of Lorem Ipsum decided to leave for the far World of Grammar. The Big Oxmox advised her not to do so, because there were 
thousands of bad Commas, wild Question Marks and devious Semikoli, but the Little Blind Text didn’t listen. She packed her seven versalia, put her initial into the belt and made herself on the way. When she reached the first hills of 
the Italic Mountains, she had a last view back on the skyline of her hometown Bookmarksgrove, the headline of Alphabet Village and the subline of her own road, the Line Lane. Pityful a rethoric question ran over her cheek, then 
she continued her way. On her way she met a copy. The copy warned the Little Blind Text, that where it came from it would have been rewritten a thousand times and everything that was left from its origin would be the word "and" 
and the Little Blind Text should turn around and return to its own, safe country. But nothing the copy said could convince her and so it didn’t take long until a few insidious Copy Writers ambushed her, made her drunk with Longe 
and Parole and dragged her into their agency, where they abused her for their projects again and again. And if she hasn’t been rewritten, then they are still using her.

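Code

<?php
// Replace this placeholder with the input text to test.
$sourcestring="your source string";
// Options: i = case-insensitive (not needed by this pattern), m = multiline, s = dot matches newlines.
preg_match('/\A(?=.{0,8000}\Z)(?=(?:^.*?(?:\r|\n|\Z)){0,10}\Z)(?:^.*?[\r\n\Z]+){0,4}/ims',$sourcestring,$matches);
// $matches[0] holds the matched text, i.e. the leading lines.
echo "<pre>".print_r($matches,true);
?>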

Matches

$matches Array:
(
    [0] => Far far away, behind the word mountains, far from the countries Vokalia and Consonantia, there live the blind texts. Separated they live in Bookmarksgrove right at the coast of the Semantics, a large language ocean. A small 
river named Duden flows by their place and supplies it with the necessary regelialia. It is a paradisematic country, in which roasted parts of sentences fly into your mouth. Even the all-powerful Pointing has no control about 
the blind texts it is an almost unorthographic life One day however a small line of blind text by the name of Lorem Ipsum decided to leave for the far World of Grammar. The Big Oxmox advised her not to do so, because there were 
thousands of bad Commas, wild Question Marks and devious Semikoli, but the Little Blind Text didn’t listen. She packed her seven versalia, put her initial into the belt and made herself on the way. When she reached the first hills of 

)