PHP regex: extracting fields

Time: 2017-11-24 15:11:02

Tags: php regex

This text has 4 rows and 5 columns:

   Compliance: 7-Day RN Waiver Indicator             1    443   443  VARCHAR2
   Related Provider Number                           10   686   695  CHAR
   Services: Speech Pathology Off-Site Residents     1    834   834  VARCHAR2
   Staff Count: Food Service Worker - Contract       25   1022  1029 NUMBER

What regex would extract columns 1, 2, and 5? The result should look like:

Compliance: 7-Day RN Waiver Indicator|1|VARCHAR2
Related Provider Number|10|CHAR
Services: Speech Pathology Off-Site Residents|1|VARCHAR2
Staff Count: Food Service Worker - Contract|25|NUMBER

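This is the regex I am working on: \s{4}([\w\s]*) https://regex101.com/r/uQxRzA/1/

Update

The only assumption that can help is that no name in column 1 contains 2 or more consecutive spaces.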

4 answers:

Answer 0: (score: 0)

You need to normalize the lines first; then you can split each one on runs of 2 or more spaces.

$string = 'Compliance: 7-Day RN Waiver Indicator             1    443   443  VARCHAR2
   Related Provider Number                           10   686   695  CHAR
   Services: Speech Pathology Off-Site Residents     1    834   834  VARCHAR2
   Staff Count: Food Service Worker - Contract       25   1022  1029 NUMBER';
$bits = explode(PHP_EOL, $string);
foreach($bits as $bit) {
    print_r(preg_split('/\h{2,}/', trim($bit)));
}

Demo: https://3v4l.org/uIpq2

Or, in your case, change

print_r(preg_split('/\h{2,}/', trim($bit)));

to

$columns = preg_split('/\h{2,}/', trim($bit));

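Then $columns[0] is column 1, $columns[1] is column 2, and $columns[4] is column 5, provided every pair of columns is separated by at least two spaces. In the last sample row, 1029 and NUMBER are separated by a single space, so they end up together in $columns[3] and $columns[4] is not set.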
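One way to make the split tolerate single-space gaps between the later columns is sketched below. It relies on the asker's assumption that column 1 never contains 2 or more consecutive spaces, and on the further assumption (true for the sample, not stated in the question) that columns 2 to 5 contain no spaces at all. The variable names $rows, $name, $rest, and $fields are illustrative.

```php
<?php
// Sample rows copied from the question; nowdoc keeps the leading spaces as-is.
$string = <<<'TEXT'
   Compliance: 7-Day RN Waiver Indicator             1    443   443  VARCHAR2
   Related Provider Number                           10   686   695  CHAR
   Services: Speech Pathology Off-Site Residents     1    834   834  VARCHAR2
   Staff Count: Food Service Worker - Contract       25   1022  1029 NUMBER
TEXT;

$rows = [];
// \R matches any newline sequence, so the split does not depend on PHP_EOL.
foreach (preg_split('/\R/', $string) as $bit) {
    // Limit 2: the first run of 2+ spaces ends column 1; the rest stays in one piece.
    $parts = preg_split('/\h{2,}/', trim($bit), 2);
    if (count($parts) < 2) {
        continue; // blank or malformed line
    }
    [$name, $rest] = $parts;

    // Assumption: columns 2-5 hold no spaces, so any whitespace run separates them.
    $fields = preg_split('/\h+/', $rest);
    if (count($fields) < 4) {
        continue; // fewer than 5 columns in total
    }

    // Columns 1, 2, and 5 are $name, $fields[0], and $fields[3].
    $rows[] = implode('|', [$name, $fields[0], $fields[3]]);
}

echo implode(PHP_EOL, $rows), PHP_EOL;
```

With the sample input this prints the four pipe-delimited lines requested in the question, including Staff Count: Food Service Worker - Contract|25|NUMBER for the last row.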

Answer 1: (score: 0)

<?php
$input = <<<INPUT
   Compliance: 7-Day RN Waiver Indicator             1    443   443  VARCHAR2
   Related Provider Number                           10   686   695  CHAR
   Services: Speech Pathology Off-Site Residents     1    834   834  VARCHAR2
   Staff Count: Food Service Worker - Contract       25   1022  1029 NUMBER
INPUT;

preg_match_all("/(.*?)([^\s]+)\s+([^\s]+)\s+([^\s]+)\s+([^\s]+)(\n|$)/", $input, $m);

print_r($m);

/* output:
Array
(
    [0] => Array
        (
            [0] =>    Compliance: 7-Day RN Waiver Indicator             1    443   443  VARCHAR2

            [1] =>    Related Provider Number                           10   686   695  CHAR

            [2] =>    Services: Speech Pathology Off-Site Residents     1    834   834  VARCHAR2

            [3] =>    Staff Count: Food Service Worker - Contract       25   1022  1029 NUMBER
        )

    [1] => Array
        (
            [0] =>    Compliance: 7-Day RN Waiver Indicator             
            [1] =>    Related Provider Number                           
            [2] =>    Services: Speech Pathology Off-Site Residents     
            [3] =>    Staff Count: Food Service Worker - Contract       
        )

    [2] => Array
        (
            [0] => 1
            [1] => 10
            [2] => 1
            [3] => 25
        )

    [3] => Array
        (
            [0] => 443
            [1] => 686
            [2] => 834
            [3] => 1022
        )

    [4] => Array
        (
            [0] => 443
            [1] => 695
            [2] => 834
            [3] => 1029
        )

    [5] => Array
        (
            [0] => VARCHAR2
            [1] => CHAR
            [2] => VARCHAR2
            [3] => NUMBER
        )

    [6] => Array
        (
            [0] => 

            [1] => 

            [2] => 

            [3] => 
        )

)
*/
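The dump above groups the captures by column. To get the pipe-delimited lines asked for in the question, the same pattern can be run with PREG_SET_ORDER so that each match holds one row. This is a minimal sketch rather than part of the original answer; $pattern, $matches, and $lines are illustrative names, and the pattern is written in single quotes, which leaves \s and \n for PCRE to interpret.

```php
<?php
$input = <<<'INPUT'
   Compliance: 7-Day RN Waiver Indicator             1    443   443  VARCHAR2
   Related Provider Number                           10   686   695  CHAR
   Services: Speech Pathology Off-Site Residents     1    834   834  VARCHAR2
   Staff Count: Food Service Worker - Contract       25   1022  1029 NUMBER
INPUT;

// Same pattern as in the answer: a lazy prefix, then four whitespace-separated tokens.
$pattern = '/(.*?)([^\s]+)\s+([^\s]+)\s+([^\s]+)\s+([^\s]+)(\n|$)/';

// PREG_SET_ORDER yields one array per matched row instead of one array per group.
preg_match_all($pattern, $input, $matches, PREG_SET_ORDER);

$lines = [];
foreach ($matches as $m) {
    // Group 1 still carries the padding around the name, so trim it.
    // Groups 2 and 5 are columns 2 and 5.
    $lines[] = implode('|', [trim($m[1]), $m[2], $m[5]]);
}

echo implode(PHP_EOL, $lines), PHP_EOL;
```

The trim() call is needed because group 1 captures the leading indentation and the spaces before column 2, as the dump shows.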

Answer 2: (score: 0)

  

Extract columns 1, 2, and 5

Use the preg_split and preg_match functions:

$text = 'Compliance: 7-Day RN Waiver Indicator             1    443   443  VARCHAR2
   Related Provider Number                           10   686   695  CHAR
   Services: Speech Pathology Off-Site Residents     1    834   834  VARCHAR2
   Staff Count: Food Service Worker - Contract       25   1022  1029 NUMBER';

$lines = preg_split('/\s*\n\s*/', $text);

foreach ($lines as $line) {
    preg_match('/^(.+\S+)\s+(\S+)\s+\S+\s+\S+\s+(\S+)$/', $line, $m);
    array_shift($m);
    echo implode('|', $m) . PHP_EOL;
}

Output:

Compliance: 7-Day RN Waiver Indicator|1|VARCHAR2
Related Provider Number|10|CHAR
Services: Speech Pathology Off-Site Residents|1|VARCHAR2
Staff Count: Food Service Worker - Contract|25|NUMBER

Answer 3: (score: 0)

Code

See regex in use here

^\h{2,}((?:(?!\h{2})[\s\S])*)\h*(\S+)(?:\h*\S+){2}\h*(\S+)

Replacement

$1|$2|$3

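Explanation

  • ^ asserts position at the start of a line (this needs the m flag when the input spans several lines)
  • \h{2,} matches 2 or more horizontal whitespace characters
  • ((?:(?!\h{2})[\s\S])*) captures the following into capture group 1
    • (?:(?!\h{2})[\s\S])* matches the following any number of times
      • (?!\h{2}) negative lookahead ensuring that what follows is not two horizontal whitespace characters
      • [\s\S] matches any character
  • \h* matches any number of horizontal whitespace characters
  • (\S+) captures one or more non-whitespace characters into capture group 2
  • (?:\h*\S+){2} matches the following exactly twice
    • \h* matches any number of horizontal whitespace characters
    • \S+ matches one or more non-whitespace characters
  • \h* matches any number of horizontal whitespace characters
  • (\S+) captures one or more non-whitespace characters into capture group 3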
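
The pattern and replacement above can be applied in PHP with preg_replace. The sketch below assumes the m (multiline) flag, so that ^ matches at the start of every line, and it assumes every row starts with at least two spaces, as the rows in the question do. $pattern and $result are illustrative names.

```php
<?php
$input = <<<'INPUT'
   Compliance: 7-Day RN Waiver Indicator             1    443   443  VARCHAR2
   Related Provider Number                           10   686   695  CHAR
   Services: Speech Pathology Off-Site Residents     1    834   834  VARCHAR2
   Staff Count: Food Service Worker - Contract       25   1022  1029 NUMBER
INPUT;

// Pattern from the answer, wrapped in delimiters; the m flag makes ^ match at each line start.
$pattern = '/^\h{2,}((?:(?!\h{2})[\s\S])*)\h*(\S+)(?:\h*\S+){2}\h*(\S+)/m';

// Each matched row is replaced by column 1, column 2, and column 5 joined with pipes.
// The newlines between rows are outside the match and are left untouched.
$result = preg_replace($pattern, '$1|$2|$3', $input);

echo $result, PHP_EOL;
```

Because group 1 stops at the first run of two horizontal whitespace characters, the name comes out without trailing padding and no trim() is needed.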