Question

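I have a 1.2 GB file that contains a single one-line string. I need to search the whole file to find the position of another string (at the moment I have a list of strings to search for). What I'm doing now is opening the big file and reading it in 4 KB chunks: after each chunk I move the pointer back X positions in the file and read another 4 KB.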

My problem is that the longer the string I search for, the longer it takes to find it.

Can you give me some ideas for optimizing the script to get better search times?

This is my implementation:

function busca($inici){
        $limit = 4096;

        $big_one    = fopen('big_one.txt','r');
        $options    = fopen('options.txt','r');

        while(!feof($options)){
            $search = trim(fgets($options));
            $retro  = strlen($search);//maybe setting this position absolute? (like 12 or 15)

            $punter = 0;
            while(!feof($big_one)){
                $ara = fgets($big_one,$limit);

                $pos = strpos($ara,$search);
                $ok_pos = $pos + $punter;

                if($pos !== false){
                    echo "$pos - $punter - $search : $ok_pos <br>";
                    break;
                }

                $punter += $limit - $retro;
                fseek($big_one,$punter);
            }
            fseek($big_one,0);
        }
    }

Thanks in advance!

Answer 1

Why not use exec + grep -b?

exec('grep "new" ext-all-debug.js -b', $result);
// here we have looked for "new" substring entries in the extjs debug src file
var_dump($result);

Sample result:

array(1142) {
    [0]=>  string(97) "3398: * insert new elements. Revisiting the example above, we could utilize templating this time:"
    [1]=>  string(54) "3910:var tpl = new Ext.DomHelper.createTemplate(html);"
    ...
}

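Each item consists of the byte offset of the matching line from the start of the file and the line itself, separated by a colon. After that, you have to look inside the particular line and add the position of the match within it to the line offset. For example: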

[0]=>  string(97) "3398: * insert new elements. Revisiting the example above, we could utilize templating this time:"

This means the "new" occurrence was found at byte 3408 (3398 is the line offset and 10 is the position of "new" within that line).
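
The offset arithmetic described above could be sketched roughly as follows. The function name absoluteOffsets() is hypothetical and not part of the original answer; the sketch assumes each entry has the "<line offset>:<line text>" shape shown in the sample result and only reports the first occurrence in each line.

```php
<?php
/**
 * Convert `grep -b` output entries ("<line offset>:<line text>") into the
 * absolute byte offset of the first occurrence of $needle in each line.
 * Hypothetical helper for illustration only.
 */
function absoluteOffsets(array $grepLines, string $needle): array
{
    $offsets = [];
    foreach ($grepLines as $entry) {
        // Split at the first colon only; the line text may contain more colons.
        $parts = explode(':', $entry, 2);
        if (count($parts) !== 2 || !ctype_digit($parts[0])) {
            continue; // not in the expected "<offset>:<text>" shape
        }
        [$lineOffset, $line] = $parts;
        $pos = strpos($line, $needle);
        if ($pos !== false) {
            // absolute position = offset of the line + position inside the line
            $offsets[] = (int) $lineOffset + $pos;
        }
    }
    return $offsets;
}

// Possible usage, assuming grep is available on the PATH:
// exec('grep -b -F -e ' . escapeshellarg('new') . ' ext-all-debug.js', $result);
// print_r(absoluteOffsets($result, 'new'));
```

Note that for a file consisting of one single line, plain -b would report offset 0 and return the entire line. With GNU grep, adding -o makes -b print the offset of each match itself, which is probably the better fit for that case.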

Answer 2

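// Load the whole file into memory; memory_limit must exceed the file size
$big_one    = file_get_contents('big_one.txt');
$options    = fopen('options.txt','r');

while(($line = fgets($options)) !== false)
{
  $option = trim($line);
  if($option === '')
    continue; // skip blank lines

  // strpos() returns the byte offset of the first match, or false
  $position = strpos($big_one,$option);

  if($position !== false) // 0 is a valid position
    return $position; //exit loop
}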

The file is very large, though, so you may want to consider storing the data in a database. If you absolutely can't, use the grep solution posted here.
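
If neither a database nor grep is available, another option is to stay in PHP but scan the big file only once, checking every search string against each chunk instead of rescanning the file per string. The following is a minimal sketch under stated assumptions, not a tuned implementation: findFirstOffsets() and the 1 MB default chunk size are hypothetical choices, and only the first occurrence of each string is reported.

```php
<?php
/**
 * Scan a file once in fixed-size chunks and return the first byte offset
 * of every needle that occurs in it. Hypothetical helper for illustration.
 */
function findFirstOffsets(string $path, array $needles, int $chunkSize = 1048576): array
{
    // Drop empty and duplicate search strings.
    $needles = array_values(array_unique(array_filter($needles, fn($n) => $n !== '')));
    $found = [];
    if (!$needles) {
        return $found;
    }

    // Carry over (longest needle - 1) bytes from the previous buffer so a
    // match that straddles a chunk boundary is still seen in full.
    $overlap = max(array_map('strlen', $needles)) - 1;

    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $tail = '';
    $bufferStart = 0; // file offset of the first byte in $buffer
    while (count($found) < count($needles)) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break; // end of file or read error
        }
        $buffer = $tail . $chunk;
        foreach ($needles as $needle) {
            if (isset($found[$needle])) {
                continue; // first occurrence already recorded
            }
            $pos = strpos($buffer, $needle);
            if ($pos !== false) {
                $found[$needle] = $bufferStart + $pos;
            }
        }
        $tail = $overlap > 0 ? substr($buffer, -$overlap) : '';
        $bufferStart += strlen($buffer) - strlen($tail);
    }
    fclose($handle);
    return $found;
}
```

A call such as findFirstOffsets('big_one.txt', file('options.txt', FILE_IGNORE_NEW_LINES)) would return an array keyed by search string. Strings that never occur are simply missing from the result, and memory use stays around one chunk plus the overlap rather than the full 1.2 GB.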

Speed string search in PHP
