我有以下条件:
if(in_array($needle, $haystack) ||
in_array($needle . "somePostfix", $haystack) ||
in_array($needle . "someOtherPostfix", $haystack) ||
// and some more) {
// do something
}
我的草堆包含超过10k个元素,此检查大约需要400毫秒。我知道in_array
必须多次迭代整个数组。在我的情况下,常见的情况是找不到元素。我尝试通过创建以下方法来改进这一点,该方法只在haystack上迭代一次:
function wildcardInArray($needle, $haystack) {
foreach ($haystack as $value) {
if (true === fnmatch($needle . '*', $haystack)) {
return true;
}
}
return false;
}
但这会降低我的表现,在我看来fnmatch
是瓶颈。
这种阵列搜索的情况有什么改进吗?
答案 0 :(得分:3)
您可以将数组用作'键,即
$arr = ['a', 'b', 'c', … ];
至$arr = ['a' => true, 'b' => true, …]
。
您将消耗更多内存,但您将获得isset($arr[$key]);
的即时结果。
内存中速度最快但最大,您可以使用stdClass和isset($obj->$key);
$obj = new stdClass();
$obj->{'a'} = true;
$obj->{'b'} = true;
$obj->{'…'} = true;
如果您无法更改阵列结构,请告诉我们您是否可以手动对阵列内容进行排序?
// generic
$length = strlen($needle);
$char = $needle[0];
$found = false;
$suffixes = [ false, 'somePostfix', 'someOtherPostfix' ];
foreach($haystack as $entry) {
if ($char === $entry[0] && $needle === substr($entry, 0, $length)) {
$suffix = substr($entry, $length);
if (in_array($suffix, $suffixes, true)) {
$found = true;
break;
}
}
}
答案 1 :(得分:2)
这是一个非常有趣的问题,似乎没有一个很好的答案。我做了一些非常不科学的基准测试,对于一个有100000个元素的in_array
,我无法获得比$haystack
更快的速度。
PHP 5.5.9-1ubuntu4.14 (cli) (built: Oct 28 2015 01:34:46)
Copyright (c) 1997-2014 The PHP Group
Zend Engine v2.5.0, Copyright (c) 1998-2014 Zend Technologies
with Zend OPcache v7.0.3, Copyright (c) 1999-2014, by Zend Technologies
with Xdebug v2.2.3, Copyright (c) 2002-2013, by Derick Rethans
Sorting Time*: 0.19367408752441
Imploding Time**: 0.0207359790802
preg_match: 0.10927486419678
needle ===: 0.083639144897461
in_array: 0.019428968429565
array_flip: 0.028955936431885
array_intersect: 0.15198707580566
array_diff: 0.15532493591309
//*sort without search (binary search wouldn't add much time)
//**time it took to implode the array
// (no search was performed, this search WOULD take significant time if implemented)
正如您所看到的,这些方法中只有三种花费的时间不到100毫秒,needle ===
,in_array
和array_flip
。在这三个中,in_array
显然是最快的。现在的问题是你有多少postfix-es? in_array
上的运行时间为O(n*m)
(n
是大海捞针的大小,m
是后缀的数量),如果m
则会出现问题也很大。如果m
非常大,则对数据进行排序一次并对排序列表执行二进制搜索将是O(m*log(n))
,其增长速度要慢得多,但初始开销较高,如上面的排序时间所示。更好的是,如果您有一个非常大的m
可能是array_flip
,因为每次搜索都应该在初始翻转后进行O(1)
查找。
干草堆创作
$haystack = array();
function getRandomWord($len = 10) {
$len = rand(3,10);
$word = array_merge(range('a', 'z'), range('A', 'Z'));
shuffle($word);
return substr(implode($word), 0, $len);
}
$numWords = 100000;
for($i = 0; $i < $numWords; $i++) {
$haystack[] = getRandomWord();
}
测试
//*Sorting*
$copy = $haystack;
sort($copy);
//implode
$copy = implode($haystack, " ");
//*preg_match_test*
function preg_match_test($regex, $haystack) {
$matches = false;
foreach($haystack as $value) {
if (preg_match($regex, $value)) {
$matches = true;
break;
}
}
return $matches;
}
//needle ===
function equalsNeedle($needles, $haystack) {
$matches = false;
foreach ($haystack as $value) {
foreach($needles as $needle) {
if ($needle === $value) {
$matches = true;
break 2;
}
}
}
return $matches;
}
//in_array
function baseCase($needles, $haystack) {
$matches = false;
foreach($needles as $needle) {
if (in_array($needle, $haystack)) {
$matches = true;
break;
}
}
return $matches;
}
//array_flip
function arrayFlipping($needles, $haystack) {
$matches = false;
$copy = array_flip($haystack);
foreach ($needles as $needle) {
if (array_key_exists($needle, $copy)) {
$matches = true;
}
}
return $matches;
}
//array_intersect
function arrayIntersect($needles, $haystack) {
if (count(array_intersect($needles, $haystack)) > 0) {
return true;
}
return false;
}
//array_diff
function arrayDiff($needles, $haystack) {
if (count(array_diff($needles, $haystack)) !== count($needles)) {
return true;
}
return false;
}
致电代码
$array = array("foo","foobar","foobazz","foobuzz");
$base = "foo";
$regex = "/^$base(bizz|bazz|buzz|)$/";
echo "preg_match: ";
preg_match_test($regex, $haystack);
echo "needle === ";
equalsNeedle($array, $haystack);
echo "in_array: ";
baseCase($array, $haystack);
echo "array_flip: ";
arrayFlipping($array, $haystack);
echo "array_intersect: ";
arrayIntersect($array, $haystack);
echo "array_diff: ";
arrayDiff($array, $haystack);
所有测试都使用microtime(true)
包装时序代码。