Extracting plain-text XHTML data in PHP (regular expressions)?

Date: 2016-07-16 14:31:59

Tags: php regex parsing

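I am using Zapier to automate Remember The Milk tasks; the zap triggers whenever anything changes in the Atom feed. The problem is that Zapier sends the XHTML-formatted data as plain text, which I read from php://input: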
<?php
$xhtml = file_get_contents('php://input');
?>

The raw data looks like this:

@class: rtm_due
span: [{'#text': 'Due:', '@class': 'rtm_due_title'}, {'#text': 'Sat 16 Jul 16', '@class': 'rtm_due_value'}]

@class: rtm_priority
span: [{'#text': 'Priority:', '@class': 'rtm_priority_title'}, {'#text': '1', '@class': 'rtm_priority_value'}]

@class: rtm_tags
span: [{'#text': 'Tags:', '@class': 'rtm_tags_title'}, {'#text': 'gcal-work, github', '@class': 'rtm_tags_value'}]

@class: rtm_location
span: [{'#text': 'Location:', '@class': 'rtm_location_title'}, {'#text': 'none', '@class': 'rtm_location_value'}]

@class: rtm_list
span: [{'#text': 'List:', '@class': 'rtm_list_title'}, {'#text': 'Work', '@class': 'rtm_list_value'}]

Suppose I want to extract the due date, Sat 16 Jul 16, under @class: rtm_due. How can I extract it? Would a regular expression (preg_match) be of any help? If so, how?

2 Answers:

Answer 0: (score: 1)

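Perhaps you would want to do it in a twisted & complex way (i.e. fetch the required data inside a function that combines regular expressions with looping constructs). Consider the functions below. Note that, although the code may look twisted and convoluted, you are not limited to fetching only the date value: you also get access to every key-value pair in that file, should you need them at some point...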

    <?php
        $file   = __DIR__ . "/file.txt";   //<== THE NAME OF THE FILE CONTAINING YOUR DATA


        /*************** BEGIN FUNCTIONS ***************/
        function parseFile($file){
            $arrFileContent    = [];

            // IF THE FILE DOES NOT EXIST RETURN NULL 
            if(!file_exists($file)){
                return null;
            }
            // GET THE DATA FROM THE FILE & STORE IT IN A VARIABLE
            $strFileDataContent = file_get_contents($file);

            // IF THE FILE CONTAINS NOTHING RETURN NULL AS WELL  
            if(empty($strFileDataContent)){
                return null;
            }

            // SPLIT THE CONTENTS OF THE FILE (STRING) AT THE END OF EACH LINE
            // (\R MATCHES \n, \r\n AND \r), THUS CREATING AN ARRAY OF LINES OF TEXT-DATA
            $arrFileDataLines   = preg_split('/\R/', $strFileDataContent);

            // LOOP THROUGH THE ARRAY PRODUCED ABOVE & PERFORM SOME PATTERN MATCHING
            // AND TEXT EXTRACTION WITHIN THE LOOP

            foreach($arrFileDataLines as $iKey=>$lineData){
                $arrSubLines   = explode("\n", $lineData);

                foreach($arrSubLines as $intKey=>$strKeyInfo){
                    $rxClass    = "#(^@class:)(\s*)(.*$)#i";
                    $rxSpan     = "#(^span:)(\s*)?(.+$)#si";

                    preg_match($rxClass, $strKeyInfo, $matches);
                    preg_match($rxSpan,  $strKeyInfo, $matches2);

                    if($matches) {
                        list(, $key, $null, $val) = $matches;
                        $keyA   = str_replace("rtm_", "", $val);
                        if (!array_key_exists($keyA, $arrFileContent)) {
                            $arrFileContent[$keyA] = $val;
                        }
                    }
                    if($matches2) {
                        list(, $key2, $null, $val2) = $matches2;
                        $keyB   = $keyA ."Data";
                        if (!array_key_exists($keyB, $arrFileContent)) {
                            $arrFileContent[$keyB] = parseSpanValues($val2, str_replace("rtm_", "", $keyA));
                        }
                    }
                }
            }
            return $arrFileContent;
        }

        function parseSpanValues($spanData, $prefix){
            $arrSpanData    = explode(", ",  preg_replace("#[\{\}\[\]\"\'\#\@]#", "", $spanData));
            $objSpanData    = new stdClass();
            $cleanVal       = "";

            if($prefix == "tags"){
                $cnt = 0;
                foreach($arrSpanData as $i=>$val){
                    if(!stristr($val, ":")){
                        $cleanVal  .= ", " . $val ;
                        $cnt++;
                    }
                }
                $arrSpanData[2] = $arrSpanData[2] . $cleanVal;
                array_splice($arrSpanData, 3, $cnt);
            }

            foreach($arrSpanData as $iKey=>&$spanVal){
                $arrSplit   = preg_split("#\:\s#", $cleanVal . $spanVal);
                $key        = "text";

                if($iKey == 0){
                    $key    = "{$prefix}Text";
                }else if($iKey == 1){
                    $key    = "{$prefix}TextClass";
                }else if($iKey == 2){
                    $key    = "{$prefix}Value";
                }else if($iKey == 3){
                    $key    = "{$prefix}ValueClass";
                }
                if(isset($arrSplit[1])){
                    $objSpanData->$key  = $arrSplit[1];
                }
            }
            return $objSpanData;
        }
        /*************** END OF FUNCTIONS ***************/



        var_dump(parseFile($file));
        // PRODUCES SOMETHING LIKE: 
        array (size=10)
          'due' => string 'rtm_due' (length=7)
          'dueData' => 
            object(stdClass)[1]
              public 'dueText' => string 'Due' (length=3)
              public 'dueTextClass' => string 'rtm_due_title' (length=13)
              public 'dueValue' => string 'Sat 16 Jul 16' (length=13)
              public 'dueValueClass' => string 'rtm_due_value' (length=13)
          'priority' => string 'rtm_priority' (length=12)
          'priorityData' => 
            object(stdClass)[2]
              public 'priorityText' => string 'Priority' (length=8)
              public 'priorityTextClass' => string 'rtm_priority_title' (length=18)
              public 'priorityValue' => string '1' (length=1)
              public 'priorityValueClass' => string 'rtm_priority_value' (length=18)
          'tags' => string 'rtm_tags' (length=8)
          'tagsData' => 
            object(stdClass)[3]
              public 'tagsText' => string 'Tags' (length=4)
              public 'tagsTextClass' => string 'rtm_tags_title' (length=14)
              public 'tagsValue' => string 'gcal-work, github, stack-overflow' (length=33)
              public 'text' => string 'rtm_tags_value' (length=14)
          'location' => string 'rtm_location' (length=12)
          'locationData' => 
            object(stdClass)[4]
              public 'locationText' => string 'Location' (length=8)
              public 'locationTextClass' => string 'rtm_location_title' (length=18)
              public 'locationValue' => string 'none' (length=4)
              public 'locationValueClass' => string 'rtm_location_value' (length=18)
          'list' => string 'rtm_list' (length=8)
          'listData' => 
            object(stdClass)[5]
              public 'listText' => string 'List' (length=4)
              public 'listTextClass' => string 'rtm_list_title' (length=14)
              public 'listValue' => string 'Work' (length=4)
              public 'listValueClass' => string 'rtm_list_value' (length=14)

As it stands, if you want the date from the first instance in the array (the dueData element), you can simply do something like this:

    <?php
        $data          = parseFile($file);  
        $dateDateValue = $data['dueData']->dueValue;        

        var_dump($dateDateValue);  // PRODUCES:: 'Sat 16 Jul 16'
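
Because the question reads the payload from php://input rather than from a file, a string-based variant may be handy. The following is only a minimal sketch of the same "collect every key-value pair" idea, not a replacement for the functions above: `rtmValues()` is a hypothetical helper name, nullable and scalar type hints assume PHP 7.1 or later, and the sample payload is copied from the question.

```php
<?php
// Sample payload copied from the question. In the real script this would
// come from file_get_contents('php://input').
$raw = <<<'TXT'
@class: rtm_due
span: [{'#text': 'Due:', '@class': 'rtm_due_title'}, {'#text': 'Sat 16 Jul 16', '@class': 'rtm_due_value'}]

@class: rtm_tags
span: [{'#text': 'Tags:', '@class': 'rtm_tags_title'}, {'#text': 'gcal-work, github', '@class': 'rtm_tags_value'}]

@class: rtm_location
span: [{'#text': 'Location:', '@class': 'rtm_location_title'}, {'#text': 'none', '@class': 'rtm_location_value'}]
TXT;

/**
 * Hypothetical helper: maps each block's short name (e.g. "due")
 * to the text of its rtm_*_value span.
 */
function rtmValues(string $raw): array
{
    // Group 1: the name after "rtm_" on the @class line.
    // Group 2: the '#text' that is paired with 'rtm_<name>_value' on the span line.
    // \\1 is written with two backslashes because "\1" would be an octal escape in PHP.
    $pattern = "/^@class:\h*rtm_(\w+)\h*\R"
             . "span:.*'#text':\s*'(.*)',\s*'@class':\s*'rtm_\\1_value'/m";

    preg_match_all($pattern, $raw, $matches, PREG_SET_ORDER);

    $values = [];
    foreach ($matches as $m) {
        $values[$m[1]] = $m[2];
    }
    return $values;
}

$values = rtmValues($raw);
var_dump($values['due']); // string(13) "Sat 16 Jul 16"
```

A value that itself contains commas, such as the tags, is kept whole here because the pattern anchors on the closing `'rtm_<name>_value'` class instead of splitting on ", ".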

Hopefully this attempt gives you, if nothing else, a rough idea of how to improvise on your own.

Cheers & good luck!!!

Answer 1: (score: 0)

I think this regex will work:

@class:\s*rtm_due\nspan:\s*\[{.*}, {'#text':\s*(.*),\s*'@class':\s*'rtm_due_value'}]

Demo here, but it only works for the due date.
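
For reference, this is roughly how such a pattern could be applied with preg_match in PHP. It is a sketch under a few assumptions that are not part of the pattern above: `\R` is used instead of `\n` so that `\r\n` line endings also match, the first `.*` is made lazy, and the quotes are moved outside the capture group so that group 1 holds the bare value. The sample payload is copied from the question.

```php
<?php
// Sample payload copied from the question. In the real script this would
// come from file_get_contents('php://input').
$raw = <<<'TXT'
@class: rtm_due
span: [{'#text': 'Due:', '@class': 'rtm_due_title'}, {'#text': 'Sat 16 Jul 16', '@class': 'rtm_due_value'}]
TXT;

// Variation on the pattern above (the tweaks are assumptions, see the lead-in):
//   \R      accepts \n as well as \r\n line endings
//   .*?     is lazy, so it stops at the first closing brace
//   '(.*?)' keeps the surrounding quotes out of group 1
$pattern = "/@class:\s*rtm_due\R"
         . "span:\s*\[\{.*?\},\s*\{'#text':\s*'(.*?)',\s*'@class':\s*'rtm_due_value'\}\]/";

// preg_match returns 1 on a match; group 1 is then available as $m[1].
$dueDate = preg_match($pattern, $raw, $m) ? $m[1] : null;

var_dump($dueDate); // string(13) "Sat 16 Jul 16"
```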

If you need the location, change the regex to:

@class:\s*rtm_location\nspan:\s*\[{.*}, {'#text':(.*),\s*'@class':\s*'rtm_location_value'}]

Group 1 should give the required value.
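
Since the two patterns differ only in the field name, they could be folded into one parameterised helper. This is a sketch, not a tested library function: `rtmValue()` is a hypothetical name, the `?string` return type assumes PHP 7.1 or later, and, unlike the patterns above, the quotes are placed outside the capture group so group 1 is the bare value.

```php
<?php
// Sample payload copied from the question.
$raw = <<<'TXT'
@class: rtm_due
span: [{'#text': 'Due:', '@class': 'rtm_due_title'}, {'#text': 'Sat 16 Jul 16', '@class': 'rtm_due_value'}]

@class: rtm_location
span: [{'#text': 'Location:', '@class': 'rtm_location_title'}, {'#text': 'none', '@class': 'rtm_location_value'}]
TXT;

/**
 * Hypothetical helper: returns the value of one field ("due", "location", ...)
 * or null when the field is not present in the payload.
 */
function rtmValue(string $raw, string $field): ?string
{
    // Escape the field name so it is matched literally inside the pattern.
    $f = preg_quote($field, '/');

    $pattern = "/@class:\s*rtm_{$f}\R"
             . "span:\s*\[\{.*?\},\s*\{'#text':\s*'(.*?)',\s*'@class':\s*'rtm_{$f}_value'\}\]/";

    return preg_match($pattern, $raw, $m) ? $m[1] : null;
}

var_dump(rtmValue($raw, 'due'));      // string(13) "Sat 16 Jul 16"
var_dump(rtmValue($raw, 'location')); // string(4) "none"
var_dump(rtmValue($raw, 'priority')); // NULL (not in this sample)
```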

This is the output I got from one of the PHP regex testers available online:

    [0] => @class: rtm_due
    span: [{'#text': 'Due:', '@class': 'rtm_due_title'}, {'#text': 'Sat 16 Jul 16', '@class': 'rtm_due_value'}]
    [1] => 'Sat 16 Jul 16'