正则表达式/解析从双引号中挖掘输出/提取文本

时间:2009-06-24 21:52:33

标签: php regex quotes

我需要帮助找出一些正则表达式。我正在运行dig命令,我需要使用它的输出。我需要解析它并使用php将它整齐地排列为数组。

挖出这样的东西:

m0.ttw.mydomain.tel.    60      IN      TXT     ".tkw" "1" "20090624-183342" "Some text here1"
m0.ttw.mydomain.tel.    60      IN      TXT     ".tkw" "1" "20090624-183341" "Some text here2"

我想得到这个:

Array
(
    [0] => Array
        (
            [0] => .tkw
            [1] => 1
            [2] => 20090624-183342
            [3] => Some text here1
        )
    [1] => Array
...
)

我只需要双引号内的内容。我可以逐行解析挖掘输出,但我认为如果我只运行所有的正则表达式模式匹配会更快...

思想?

4 个答案:

答案 0 :(得分:2)

我不确定PHP正则表达式,但在Perl中,RE很简单:

my $c = 0;
print <<EOF;
Array
(
EOF
foreach (<STDIN>) {
    if (/[^"]*"([^"]*)"\s+"([^"]*)"\s+"([^"]*)"\s+"([^"]*)"/) {
        print <<EOF;
    [$c] => Array
        (
            [0] = $1
            [1] = $2
            [2] = $3
            [3] = $4
        )
EOF
        $c++;
    }
}

print <<EOF;
)
EOF

这有一些限制,即:

  • 如果引号中的文字可以使用转义引号(例如\"
  • ,则不起作用
  • 硬编码仅支持四个引用值。

答案 1 :(得分:0)

代码:

<?php
    $str = 'm0.ttw.mydomain.tel.    60      IN      TXT     ".tkw" "1" "20090624-183342" "Some text here1"
m0.ttw.mydomain.tel.    60      IN      TXT     ".tkw" "1" "20090624-183341" "Some text here2"';

    header('Content-Type: text/plain');
    $matches = array();
    preg_match_all('/(".*").*(".*").*(".*").*(".*")/U', $str, $matches, PREG_SET_ORDER);
    print_r($matches);
?>

输出:

Array
(
    [0] => Array
        (
            [0] => ".tkw" "1" "20090624-183342" "Some text here1"
            [1] => ".tkw"
            [2] => "1"
            [3] => "20090624-183342"
            [4] => "Some text here1"
        )

    [1] => Array
        (
            [0] => ".tkw" "1" "20090624-183341" "Some text here2"
            [1] => ".tkw"
            [2] => "1"
            [3] => "20090624-183341"
            [4] => "Some text here2"
        )

)

答案 2 :(得分:0)

这一点接近一行

preg_match_all( '/"([^"]+)"\s*"([^"]+)"\s*"([^"]+)"\s*"([^"]+)"/', $text, $matches, PREG_SET_ORDER );

print_r( $matches );

但是,由于preg_match *函数的工作原理,完全匹配包含在每个匹配组的索引0处。如果你真的想要,你可以解决这个问题。

array_walk( $matches, create_function( '&$array', 'array_shift( $array );return $array;' ) );

答案 3 :(得分:0)

完全不是你要求的,但确实有用,可以用于任何引号的字符串,并且具有比普通正则表达式更易读的好处(代价是代码更多)

class GetQuotedText {   
    const STATE_OUTSIDE = 'STATE_OUTSIDE';
    const STATE_INSIDE  = 'STATE_INSIDE';

    static private $input;
    static private $counter;
    static private $state;
    static private $results;

    static private $current;
    static private $full;
    static private $all;

    static private function setInput($string) {
        $this->input = $string;

    }

    static private function init($string) {
        self::$current  = array();
        self::$full         = array();      
        self::$input    = $string;
        self::$state    = self::STATE_OUTSIDE;
    }


    static public function getStrings($string) {
        self::init($string);
        for(self::$counter=0;self::$counter<strlen(self::$input);self::$counter++){
            self::parse(self::$input[self::$counter]);
        }
        self::saveLine();
        return self::$all;
    }

    static private function parse($char) {
        switch($char){
            case '"':
                self::encounteredToken($char);
                break;      
            case "\n":  //deliberate fall through for "\n" and "\r"
            case "\r":
                self::encounteredToken($char);
                break;
            default:
                if(self::$state == self::STATE_INSIDE) {
                    self::action($char);
                }
        }
    }

    static private function encounteredToken($token) {
        switch($token) {
            case '"':
                self::swapState();
                break;
            case "\n":  //deliberate fall through for "\n" and "\r"
            case "\r":
                self::saveArray();
                self::saveLine();
                break;
        }
        return;
    }

    static private function swapState() {
        if(self::$state == self::STATE_OUTSIDE) {
            self::$state = self::STATE_INSIDE;
        }
        else {
            self::$state = self::STATE_OUTSIDE;             
            self::saveArray();
        }               
    }
    static public function saveLine() {
        self::$all[] = self::$full;
        self::$full = array();
        //reset state when line ends
        self::$state = self::STATE_OUTSIDE;
    }

    static private function saveArray() {
        if(count(self::$current) > 0) {
            self::$full[]   = implode ('',self::$current);
            self::$current  = array();
        }
    }

    static private function action($char) {
        self::$current[] = $char;
    }
}

$input = 'm0.ttw.mydomain.tel.    60      IN      TXT     ".tkw" "1" "20090624-183342" "Some text here1"' . "\n" .
         'm0.ttw.mydomain.tel.    60      IN      TXT     ".tkw" "1" "20090624-183341" "Some text here2"';
$strings = GetQuotedText::getStrings($input);
print_r($strings);