Extracting specific information from a text file using PHP

Date: 2013-10-19 07:57:07

Tags: php arrays text extract

Question:

I want to use PHP to extract specific information from a text file and build two arrays from that information.

Sample text file:

[Image of the sample text file, not reproduced here; the answer below includes an input file in this format.]

Desired output:

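  1. The keywords after /Perspective: should go into the first array: skincover, intelligence and legs
  2. The column keywords should go into the second array: legs, skincover, weight, intelligence, speed
  3. Column keywords that start with / should be ignored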
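Rule 3 can be sketched on its own for a single header row. This is a minimal illustration, not code from the question or the answer: the function name parse_header_columns is hypothetical, and it assumes the row is tab-separated, so a leading tab yields an empty first cell, which is skipped.

```php
<?php
// Hypothetical helper: turn one tab-separated header row into the list of
// column keywords, skipping empty cells and any name that starts with '/'.
function parse_header_columns(string $headerLine): array
{
    $cells = explode("\t", trim($headerLine, "\r\n")); // Keep the leading tab, drop the newline
    $kept  = array_filter($cells, function ($name) {
        return $name !== '' && $name[0] !== '/';
    });
    return array_values($kept); // Re-index 0..n-1 after filtering
}

$header = "\tlegs\tskincover\tweight\tintelligence\tspeed\t/something\t/else";
print_r(parse_header_columns($header)); // legs, skincover, weight, intelligence, speed
```

Because each name is checked individually, a / column anywhere in the row is skipped, not only trailing ones.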

What I have so far:

Code:

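    <?php
        // Read the whole file into an array of lines
        $file = file('Creatures_rich.txt');
        $perspective = array(); // Stays empty if no /Perspective line is found

        foreach ($file as $line_num => $line)
        {
            // Case-insensitive check for a line starting with /Perspective
            // (preg_match() replaces eregi(), which was removed in PHP 7)
            if (preg_match('/^\/Perspective/i', $line))
                $perspective = explode(' ', trim(str_replace('/Perspective:', '', $line)));
        }

        echo "<xmp>" . print_r($perspective, true) . "</xmp>";
    ?>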
    

Output so far:

    Array
    (
        [0] => skincover
        [1] => intelligence
        [2] => legs
    )
    

How do I get started on the second array? Any ideas are welcome, and code examples even more so.
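One possible starting point, staying close to the loop in the question: keep the /Perspective branch and add a second branch for the header row. This sketch assumes, like the answer, that the header row is the first line beginning with a tab. The function name build_arrays is hypothetical, and it takes an array of lines (what file('Creatures_rich.txt', FILE_IGNORE_NEW_LINES) would return) so it can be tried without the real file.

```php
<?php
// Hypothetical function: build both arrays from the lines of the file.
// $lines is what file($path, FILE_IGNORE_NEW_LINES) would return.
function build_arrays(array $lines): array
{
    $perspective = array();
    $columns     = array();

    foreach ($lines as $line) {
        if (preg_match('/^\/Perspective:/', $line)) {
            // First array: the words after "/Perspective:"
            $perspective = explode(' ', trim(str_replace('/Perspective:', '', $line)));
        } elseif (strpos($line, "\t") === 0) {
            // Second array: tab-separated column names, skipping "/..." names
            foreach (explode("\t", trim($line)) as $name) {
                if ($name !== '' && $name[0] !== '/') {
                    $columns[] = $name;
                }
            }
            break; // Assumption: nothing after the header row is needed
        }
    }
    return array($perspective, $columns);
}

$lines = array(
    '/Rawdata: Unknown',
    '/Perspective: skincover intelligence legs',
    "\tlegs\tskincover\tweight\tintelligence\tspeed\t/something\t/else",
    "dog\t1\t1\t1\t1\t1\t1\t1",
);
list($perspective, $columns) = build_arrays($lines);
print_r($columns);
```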

1 answer:

Answer 0 (score: 1)

Solution

Assuming there is only one line marked /Perspective:, and that the first line beginning with \t is the column-header row...

Heavily commented (for clarity)

    $perspectives = array();              // Initialise perspectives array
    $columns      = array();              // Initialise column names array
    $text_file    = fopen('./file', 'r'); // Open the file for reading

    while (($line = fgets($text_file)) !== false) {   // Read the file line by line
        $line = rtrim($line, "\r\n");                 // fgets() keeps the newline; strip it
        if (strpos($line, '/Perspective:') === 0) {   // Check if '/Perspective:' is at the start of the line
            $perspectives = explode(' ', substr($line, 14)); // Remove the first 14 characters: "/Perspective: "
            continue;
        }
        elseif (strpos($line, "\t") === 0) {          // Check if the first char in the line is \t
            $columns = explode("\t",
                                preg_replace("#\t/.+#", '', substr($line, 1)) // Remove commented column names and the first \t
                              );
            break; // Stop reading after the column-names row
        }
    }
    fclose($text_file); // Close the file handle
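To try this approach without a real ./file, it can be wrapped in a function that takes a path and run against a temporary file. This is a sketch under assumptions: the wrapper name read_perspectives_and_columns and the sample contents are hypothetical, and rtrim() is applied because fgets() keeps the trailing newline, which would otherwise end up in the last element of each array.

```php
<?php
// Hypothetical wrapper around the fgets()-based approach from the answer.
function read_perspectives_and_columns(string $path): array
{
    $perspectives = array();
    $columns      = array();
    $text_file    = fopen($path, 'r');
    if ($text_file === false) {
        return array($perspectives, $columns); // Could not open the file
    }

    while (($line = fgets($text_file)) !== false) {
        $line = rtrim($line, "\r\n"); // fgets() keeps the newline; drop it
        if (strpos($line, '/Perspective:') === 0) {
            $perspectives = explode(' ', substr($line, 14)); // Skip "/Perspective: "
            continue;
        } elseif (strpos($line, "\t") === 0) {
            // Remove the first \t and everything from the first "\t/" onwards
            $columns = explode("\t", preg_replace("#\t/.+#", '', substr($line, 1)));
            break;
        }
    }
    fclose($text_file);
    return array($perspectives, $columns);
}

// Write a small hypothetical sample to a temporary file and read it back
$path = tempnam(sys_get_temp_dir(), 'creatures');
file_put_contents($path,
    "/Rawdata: Unknown\n" .
    "/Perspective: skincover intelligence legs\n" .
    "\tlegs\tskincover\tweight\tintelligence\tspeed\t/something\t/else\n" .
    "dog\t1\t1\t1\t1\t1\t1\t1\n");

list($perspectives, $columns) = read_perspectives_and_columns($path);
unlink($path);
print_r($columns);
```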

Uncommented code

    $perspectives = array();
    $columns      = array();
    $text_file    = fopen('./file', 'r');
    while (($line = fgets($text_file)) !== false) {
        $line = rtrim($line, "\r\n");
        if (strpos($line, '/Perspective:') === 0) {
            $perspectives = explode(' ', substr($line, 14));
            continue;
        }
        elseif (strpos($line, "\t") === 0) {
            $columns = explode("\t",
                                preg_replace("#\t/.+#", '', substr($line, 1))
                              );
            break;
        }
    }
    fclose($text_file);
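A caveat about the #\t/.+# pattern: it deletes everything from the first /-prefixed column to the end of the row, so it only gives the right result when all such columns come last, as they do in the sample file. This sketch compares it with filtering each name individually, on a hypothetical header row (without the leading tab) that has a / column in the middle.

```php
<?php
// Hypothetical header row where a '/' column sits between normal columns
$row = "legs\t/something\tweight\tspeed";

// Regex approach from the answer: removes "\t/something" AND everything after it
$byRegex = explode("\t", preg_replace("#\t/.+#", '', $row));

// Filtering approach: removes only the names that start with '/'
$byFilter = array_values(array_filter(explode("\t", $row), function ($name) {
    return $name !== '' && $name[0] !== '/';
}));

print_r($byRegex);  // contains only: legs
print_r($byFilter); // contains: legs, weight, speed
```

With the sample file both approaches agree, so the regex is fine as long as the / columns are known to come last.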

Input file (columns are tab-separated, and the header row starts with a tab):

    /Purpose: Lorem ipsum dolor sit amet, consectetur adipisicing elit. Nam, suscipit incidunt doloribus voluptatum dicta maxime accusantium animi eum vero eaque odit quae non quaerat possimus enim ad numquam consequuntur beatae.
    /Origin: Lorem ipsum dolor sit amet, consectetur adipisicing elit. Minima, animi minus perspiciatis laudantium? Nostrum, aspernatur, sequi ratione assumenda fuga similique architecto deleniti sint recusandae voluptatibus numquam obcaecati ducimus eaque nisi.
    /Rawdata: Unknown
    /Perspective: skincover intelligence legs
    /Lorem ipsum dolor sit amet, consectetur adipisicing elit. Porro, libero, accusamus laboriosam modi voluptatem facere quod unde atque perferendis laborum nisi omnis nihil cum minima quaerat. Quia, quaerat ipsa molestiae.
        legs    skincover   weight  intelligence    speed   /something  /else
    dog 1   1   1   1   1   1   1
    pig 1   1   1   1   1   1   1
    human   1   1   1   1   1   1   1
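The listing above shows the tabs as spaces, so copying it will not reproduce the file the code expects. This sketch rebuilds a comparable sample with real tab characters. The shortened Lorem ipsum lines are an assumption (the answer's code ignores those lines anyway), and writing the result out with file_put_contents('./file', $sample) is left as a comment.

```php
<?php
// Rebuild the sample input with real tab characters between columns.
// The long Lorem ipsum lines are shortened here (assumption: their text does not matter).
$rows = array(
    array('', 'legs', 'skincover', 'weight', 'intelligence', 'speed', '/something', '/else'),
    array('dog',   '1', '1', '1', '1', '1', '1', '1'),
    array('pig',   '1', '1', '1', '1', '1', '1', '1'),
    array('human', '1', '1', '1', '1', '1', '1', '1'),
);

$lines = array(
    '/Purpose: Lorem ipsum dolor sit amet.',
    '/Origin: Lorem ipsum dolor sit amet.',
    '/Rawdata: Unknown',
    '/Perspective: skincover intelligence legs',
    '/Lorem ipsum dolor sit amet.',
);
foreach ($rows as $row) {
    // The header row starts with "\t" because its first cell is empty
    $lines[] = implode("\t", $row);
}
$sample = implode("\n", $lines) . "\n";

// file_put_contents('./file', $sample); // creates the file the answer's code opens
echo $sample;
```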

Side note

Out of curiosity, I benchmarked your code (Code A) against mine (Code B) to see which performs better.

Results

Execution time:

Code A: 0.000108
Code B: 0.000044

Code B is about 2.45 times faster (0.000108 / 0.000044 ≈ 2.45), and it handles the whole operation: both the perspectives and the column names.

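Without analysing the two pieces of code too deeply, I'd suggest the main reason for the difference is the way we handle the file: Code A reads the whole file into an array with file() and runs a regular expression on every line, while Code B reads it line by line with fgets() and stops as soon as it reaches the column-names row.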

N.B.

I ran the comparison several times, and the difference ranged roughly from 2.2x to 2.7x.

Also, the times are all very small, so it isn't really a big deal...
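Single measurements this small are noisy, which fits the 2.2x to 2.7x spread noted above. A sketch of a harness that averages many runs follows. It is not the answerer's benchmark and will not reproduce the figures above: code_a and code_b are simplified, hypothetical stand-ins for the two approaches, timing uses microtime(true), and the 1000 extra data rows in the temporary file are an assumption chosen to make the effect of the early break visible.

```php
<?php
// Code A style (hypothetical stand-in): read the whole file with file() and test every line.
function code_a(string $path): array
{
    $perspective = array();
    foreach (file($path) as $line) {
        if (preg_match('/^\/Perspective/i', $line)) {
            $perspective = explode(' ', trim(str_replace('/Perspective:', '', $line)));
        }
    }
    return $perspective;
}

// Code B style (hypothetical stand-in): read line by line and stop at the header row.
function code_b(string $path): array
{
    $perspectives = array();
    $handle = fopen($path, 'r');
    while (($line = fgets($handle)) !== false) {
        if (strpos($line, '/Perspective:') === 0) {
            $perspectives = explode(' ', trim(substr($line, 14)));
        } elseif (strpos($line, "\t") === 0) {
            break; // Data rows after the header are never read
        }
    }
    fclose($handle);
    return $perspectives;
}

// Average wall-clock time of one call to $fn over $runs calls, in seconds.
function average_seconds(callable $fn, string $path, int $runs): float
{
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        $fn($path);
    }
    return (microtime(true) - $start) / $runs;
}

$path = tempnam(sys_get_temp_dir(), 'bench');
file_put_contents($path, "/Perspective: skincover intelligence legs\n"
    . "\tlegs\tskincover\t/else\n"
    . str_repeat("dog\t1\t1\t1\n", 1000)); // Assumed extra data rows

$a = average_seconds('code_a', $path, 200);
$b = average_seconds('code_b', $path, 200);
unlink($path);
printf("Code A: %.6f s\nCode B: %.6f s\nRatio: %.2fx\n", $a, $b, $b > 0 ? $a / $b : 0);
```

Averaging over many calls smooths out per-run noise, but the ratio will still depend on the file size and the machine.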