正则表达å¼åŒ¹é…特定字符串之åŽçš„所有行

Time: 2016-10-14 10:59:52

Tags: regex

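This may be a duplicate of "Regex - find all lines after a match", but my requirements are slightly different.

I want to parse a plain text file that contains several series of date/value data separated by specific strings. I want to skip the first part of the file, up to a specific line, and only match results after that line.

Here is a sample of the file in question (including the mess of tabs and spaces):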

   I dont want to capture the following measures. This  text is     on a single line and        contains tabs and spaces    is also ends with this token : Token1
05/01/1969         0.01846  
15/01/1969         0.16730  
25/01/1969         0.33988  
05/04/1969         0.81319  
15/04/1969         0.76973  
25/11/2011             0.24210
05/12/2011             0.25220
15/12/2011             0.31160
25/12/2011             0.36845
            End :  bla bla bla
   This text        is also on a single line        and marks the beginning of a new series of      results. These are the results that I want. it also ends with the following         token : Token2
05/01/1969       109.46333  
15/01/1969       110.06998       118.18000
25/01/1969       110.82954  
05/02/1969       111.51394       118.83000
25/02/1969       112.36483  
05/10/2011       114.38798       114.31000
05/10/2011           114.31000       114.38798       114.38798       114.38798       114.38798       114.38798       114.38798
25/12/2011           112.64000       112.41261       112.86301       113.25494       114.06421       115.93219       116.38780
05/01/2012               112.22834       112.92301       113.40561       114.78823       116.62931       117.43421
05/09/2012               110.01410       112.16391       112.88199       115.23640       117.04756       118.04632
15/09/2012               109.97572       112.00809       112.70266       114.91247       116.65256       117.57412
25/09/2012               109.93967       111.87272       112.53305       114.60381       116.26935       117.12756 
            End :  Marks the    end of          the      file

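What I want is to match every data line that comes after the line ending with Token2. I tried the solutions from other similar questions, without success. I end up matching all the results in the file, and I am considering splitting the file before applying the pattern below. Is there a pure regex solution?

Here is the pattern that works on the whole file. It uses named capture groups: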

(?P<date>\d\d\/\d\d\/\d\d\d\d)\s*(?P<simul>\d+\.*\d*)[\t ]*(?P<observ>\d+\.*\d*){0,1}[\t ]*(?P<prev_no_rain>\d+\.*\d*){0,1}[\t ]*(?P<prev_10_dry>\d+\.*\d*){0,1}[\t ]*(?P<prev_20_dry>\d+\.*\d*){0,1}[\t ]*(?P<prev_50>\d+\.*\d*){0,1}[\t ]*(?P<prev_20_wet>\d+\.*\d*){0,1}[\t ]*(?P<prev_10_wet>\d+\.*\d*){0,1}

Regex101链接:https://regex101.com/r/a0mCZ2/3
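
To make the problem concrete, here is a minimal Python sketch (not part of the original question) that applies the pattern to a shortened, hypothetical stand-in for the sample file. The names ROW and SAMPLE are illustrative, and the pattern is cut down to its first three groups for readability. It only shows that rows from both series are matched when the pattern is applied to the whole text.

```python
import re

# The question's pattern, shortened to its first three groups (an assumption
# made for readability; the full pattern has seven optional value groups).
ROW = re.compile(
    r"(?P<date>\d\d/\d\d/\d{4})\s*"
    r"(?P<simul>\d+\.*\d*)[\t ]*"
    r"(?P<observ>\d+\.*\d*)?"
)

# Hypothetical, shortened stand-in for the sample file.
SAMPLE = (
    "unwanted header, token : Token1\n"
    "05/01/1969         0.01846  \n"
    "15/01/1969         0.16730  \n"
    "            End :  bla bla bla\n"
    "wanted header, token : Token2\n"
    "05/01/1969       109.46333  \n"
    "15/01/1969       110.06998       118.18000\n"
    "            End :  Marks the    end of          the      file\n"
)

# finditer scans the whole text, so rows before Token2 are captured as well.
rows = [(m["date"], m["simul"], m["observ"]) for m in ROW.finditer(SAMPLE)]
for row in rows:
    print(row)
```

Four rows are printed here, two of which belong to the unwanted Token1 series.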

1 answer:

Answer 0: (score: 2)

You can use the \G operator, which matches at the start of the string (this can be excluded with a negative lookahead) and at the end of the previous successful match. With (?:\G(?!\A)|\bToken2[\r\n]+), we tell the regex engine to find the whole word Token2 at the end of a line (together with the line break symbols), and then to match the subsequent subpatterns only if they follow immediately.

A regex you can use:

(?:\G(?!\A)[\r\n]*|Token2[\r\n]+)\K(?P<date>\d\d\/\d\d\/\d{4})\s*(?P<simul>\d+\.*\d*)[\t ]*(?P<observ>\d+\.*\d*)?[\t ]*(?P<prev_no_rain>\d+(?:\.\d+)*)?[\t ]*(?P<prev_10_dry>\d+\.*\d*)?[\t ]*(?P<prev_20_dry>\d+\.*\d*)?[\t ]*(?P<prev_50>\d+\.*\d*)?[\t ]*(?P<prev_20_wet>\d+\.*\d*)?[\t ]*(?P<prev_10_wet>\d+\.*\d*)?

See the regex demo. Note that I replaced {0,1} with ? to shorten the pattern.

The part you are interested in is (?:\G(?!\A)[\r\n]*|Token2[\r\n]+)\K.

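  • (?:\G(?!\A)[\r\n]*|Token2[\r\n]+) - either of the two alternatives:
    • \G(?!\A)[\r\n]* - the end of the previous successful match, followed by 0+ line break symbols
    • | - or
    • Token2[\r\n]+ - Token2 followed by 1+ CR or LF symbols. (If you need to match Token2 as a whole word, you can add \b in front of it.)
  • \K - omits the text matched so far from the reported match.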

(?P<date>\d\d\/\d\d\/\d{4})\s*(?P<simul>\d+\.*\d*)[\t ]*(?P<observ>\d+\.*\d*)?[\t ]*(?P<prev_no_rain>\d+(?:\.\d+)*)?[\t ]*(?P<prev_10_dry>\d+\.*\d*)?[\t ]*(?P<prev_20_dry>\d+\.*\d*)?[\t ]*(?P<prev_50>\d+\.*\d*)?[\t ]*(?P<prev_20_wet>\d+\.*\d*)?[\t ]*(?P<prev_10_wet>\d+\.*\d*)? is your pattern, which I did not modify much, and it matches a line with the specific data. Note that because it matches a single line, the line breaks between rows have to be consumed elsewhere, which justifies the [\r\n]* after \G(?!\A).
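
Python's built-in re module supports neither \G nor \K, so the regex above needs a PCRE-style engine. As a rough sketch of the same idea in plain Python (an adaptation, not the answer's own code), the chaining can be emulated by locating Token2 once and then calling Pattern.match(text, pos), which only succeeds at the given position, much like \G. The names START, OPTIONAL, ROW and rows_after_token are hypothetical, and every value group uses the \d+\.*\d* form from the question.

```python
import re

# Entry point: the word Token2 followed by one or more line breaks.
START = re.compile(r"\bToken2[\r\n]+")

# Optional value groups, in the order used by the question's pattern.
OPTIONAL = ["observ", "prev_no_rain", "prev_10_dry", "prev_20_dry",
            "prev_50", "prev_20_wet", "prev_10_wet"]

# One data row, preceded by 0+ line breaks (the [\r\n]* after \G(?!\A)).
ROW = re.compile(
    r"[\r\n]*(?P<date>\d\d/\d\d/\d{4})\s*(?P<simul>\d+\.*\d*)"
    + "".join(rf"[\t ]*(?P<{name}>\d+\.*\d*)?" for name in OPTIONAL)
)


def rows_after_token(text):
    """Yield one dict per data row that directly follows the Token2 line."""
    found = START.search(text)
    if found is None:
        return
    pos = found.end()
    while True:
        # match() is anchored at pos, so rows must be contiguous, as with \G.
        m = ROW.match(text, pos)
        if m is None:
            break  # the first non-row line ends the chain
        yield m.groupdict()
        pos = m.end()
```

Calling list(rows_after_token(text)) on the file contents would return only the rows between the Token2 line and the next line that is not a data row, such as the End line. Groups that did not take part in a match are None in each dict.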