算法查找匹配布尔查询的大型int数组的子集

时间:2011-08-25 10:05:49

标签: arrays algorithm optimization boolean-operations

假设我有一个M 32位整数的大数组,其中每个值设置不超过N位。现在我想返回与查询Target AND Value == Target匹配的子集,即目标位出现在数组值中的值。

蛮力很容易,只需迭代数组并提取目标& value == target。如果M变得非常大,这变得太慢了。任何人都知道如何将数组转换为更适合搜索的数据结构?

一种方法是为每个位存储数组或值(因此对于32位数组,您需要其中的32个),然后仅搜索与目标值中的每个位匹配的值。除非N接近32或目标接近N位设置,否则这有点帮助。由于我所寻找的基本上是部分匹配,散列或排序似乎没有帮助。

确切的正确结果是必需的。这必须在不访问并行硬件(如GPU或使用SIMD)的情况下工作。

我将使用C ++,但只是一些指向算法或想法的指针很好。最可能的情况是M = 100000和N = 8并且经常被调用。

重申:我需要部分匹配(例如item = 011000匹配目标= 001000)不是完全匹配。虽然M项是提前知道的,但目标的可能值可以是任何值。

我终于决定坚持使用蛮力。对于80,000件物品,不值得做其他任何事情。我想如果数据集的大小更像是800,000,000,那么它可能是值得的。

5 个答案:

答案 0 :(得分:2)

您可以构建bitwise trie

遍历trie时,对于目标中的每个0,您需要遍历两个分支。

编辑在测试快速实施后,我不推荐这种方法。蛮力方法比这个方法快100倍。

答案 1 :(得分:2)

如何从另一个视点查看此问题?将整数集视为一维图片的集合。组织它们的方法之一是将每张图片分成两部分AB,并按类别对所有图片进行排序:

  1. A仅包含零,B包含一些位(至少一个)
  2. A包含一些位集,B仅包含零
  3. AB包含一些位集(超集1和2)
  4. AB仅包含零
  5. 现在,您将目标/蒙版的相同分割放入相同的部分,并以相同的方式进行分类。之后你可以推断下一个(按目标/面具类别):

    1. 您可以跳过第2类和第4类的图片
    2. 您可以跳过第1类和第4类的图片
    3. 您可以跳过第4类
    4. 中的图片
    5. 所有图片均符合此目标/面具
    6. 在下一个级别,部分AB会再次被拆分(因此您有4个部分),依此类推。

      当然我不希望它能加快速度。但对于某些数据集,其中没有设置那么多位(与基于位的树的变体相反),它可能会更好。

      更新:我在Haskell变体中获得了34%的加速:

          benchmarking burte-force list search
          mean: 14.67350 ms, lb 14.65103 ms, ub 14.71614 ms, ci 0.950
          std dev: 153.6920 us, lb 95.70642 us, ub 246.6497 us, ci 0.950
      
          benchmarking tree-lookup search
          mean: 9.592271 ms, lb 9.564509 ms, ub 9.667668 ms, ci 0.950
          std dev: 216.6084 us, lb 100.3315 us, ub 455.2730 us, ci 0.950
      

      源代码:

      {-# LANGUAGE GeneralizedNewtypeDeriving #-}
      {-# LANGUAGE TypeFamilies #-}
      {-# LANGUAGE FlexibleContexts #-}
      {-# LANGUAGE UndecidableInstances #-}
      
      import Control.Arrow (first)
      import Control.DeepSeq
      import Data.Word
      import Data.Bits
      import Data.List
      
      import Criterion.Main
      import Criterion.Config
      import System.Random
      
      class BitmapsCollection a where
          type BitmapOf a
          bitmapsCollection :: [BitmapOf a] -> a
          findMaskedPattern :: a -> BitmapOf a -> [BitmapOf a]
      
      -- Plain lookup through array
      newtype BitmapsList p = BitmapsList [p]
          deriving (Show, NFData)
      
      instance Bits p => BitmapsCollection (BitmapsList p) where
          type BitmapOf (BitmapsList p) = p
          bitmapsCollection = BitmapsList
          findMaskedPattern (BitmapsList xs) m = filter (\e -> e .&. m == m) xs
      
      -- Tree of bitmap partitions
      data Bits p => BitmapsCoverTree p = EmptyBitmapsCoverNode
                                        | BitmapsCoverNode (p,p) (BitmapsCoverTree p) (BitmapsCoverTree p) [p] [p]
                                        | LeafBitmapsCoverNode [p]
          deriving Show
      
      instance (Bits p, NFData p) => NFData (BitmapsCoverTree p) where
          rnf EmptyBitmapsCoverNode = ()
          rnf (LeafBitmapsCoverNode xs) = rnf xs
          rnf (BitmapsCoverNode mask node1 node2 category3 category4) = mask `deepseq` node1 `deepseq` node2 `deepseq` category3 `deepseq` rnf category4
      
      data BitmapCoverCategory = CoverA | CoverB | CoverAB | CoverZero
      
      coverCategory :: Bits a => (a, a) -> a -> BitmapCoverCategory
      coverCategory (maskA, maskB) x = case (x .&. maskA, x .&. maskB) of
                                           (0, 0) -> CoverZero
                                           (0, _) -> CoverB
                                           (_, 0) -> CoverA
                                           _ -> CoverAB
      
      coverCategorize :: Bits a => (a, a) -> [a] -> ([a], [a], [a], [a])
      coverCategorize mask = walk (id, id, id, id) where
          category = coverCategory mask
          walk (a, b, ab, z) [] = (a [], b [], ab [], z [])
          walk (a, b, ab, z) (x:xs) = case (category x) of
                                          CoverA -> walk (a . (x:), b, ab, z) xs
                                          CoverB -> walk (a, b . (x:), ab, z) xs
                                          CoverAB -> walk (a, b, ab . (x:), z) xs
                                          CoverZero -> walk (a, b, ab, z . (x:)) xs
      
      suffixMask, prefixMask :: Bits a => Int -> a
      suffixMask n = complement 0 `shiftL` n
      prefixMask n = complement (suffixMask n)
      
      rangeMask :: Bits a => (Int, Int) -> a
      rangeMask (n, m) = suffixMask n .&. prefixMask m
      
      instance Bits p => BitmapsCollection (BitmapsCoverTree p) where
          type BitmapOf (BitmapsCoverTree p) = p
          bitmapsCollection bms = buildCoverNode (0, bitSize (head bms)) bms where
              splitBoundary = 4
              buildCoverNode :: Bits a => (Int, Int) -> [a] -> BitmapsCoverTree a
              buildCoverNode _ [] = EmptyBitmapsCoverNode
              buildCoverNode (n, m) xs | (m - n) < splitBoundary = LeafBitmapsCoverNode xs -- too small
              buildCoverNode (n, m) xs = BitmapsCoverNode mask node1 node2 category3 category4 where
                  mm = (n+m) `div` 2
      
                  mask = (rangeMask (n, mm), rangeMask (mm, m))
      
                  (category1, category2, category3, category4) = coverCategorize mask xs
      
                  node1 = buildCoverNode (n, mm) category1
                  node2 = buildCoverNode (mm, m) category2
      
          findMaskedPattern EmptyBitmapsCoverNode _ = []
          findMaskedPattern (LeafBitmapsCoverNode ps) m = filter (\e -> e .&. m == m) ps
      
          findMaskedPattern (BitmapsCoverNode _ node1 node2 category3 category4) 0 = flatten where
              flatten = findMaskedPattern node1 0 ++ findMaskedPattern node2 0 ++ category3 ++ category4
      
          findMaskedPattern (BitmapsCoverNode mask node1 node2 category3 category4) m = result where
              targetCategory = coverCategory mask m
              filterTarget = filter (\p -> p .&. m == m)
              result = case targetCategory of
                           CoverA -> findMaskedPattern node1 m ++ filterTarget category3
                           CoverB -> findMaskedPattern node2 m ++ filterTarget category3
                           CoverAB -> filterTarget category3
                           CoverZero -> category1 ++ category2 ++ category3 ++ category4
      
              category1 = findMaskedPattern node1 0
              category2 = findMaskedPattern node2 0
      
      
      main = do
          gen <- getStdGen
          let size = 1000000
              bitmaps :: [Word32]
              (bitmap, genm) = first fromIntegral (random gen :: (Int, StdGen))
              bitmaps = map fromIntegral (take size (randoms genm) :: [Int])
              bitmapsList = bitmapsCollection bitmaps :: BitmapsList Word32
              bitmapsTree = bitmapsCollection bitmaps :: BitmapsCoverTree Word32
      
          bitmapsList `deepseq` bitmapsTree `deepseq` return ()
      
          defaultMainWith defaultConfig (return ()) [
              bench "burte-force list search" $ nf (findMaskedPattern bitmapsList) bitmap,
              bench "tree-lookup search" $ nf (findMaskedPattern bitmapsTree) bitmap
              ]
      

      更新:C ++ 11代码的种类。它为蛮力提供10.9444 ms,对此算法提供8.69286 ms。我通过使开启比特的分布输出更加稀疏来欺骗。

      #include <iostream>
      #include <vector>
      #include <list>
      #include <random>
      #include <functional>
      #include <cassert>
      #include <memory>
      
      
      #include <sys/time.h>
      #include <sys/resource.h>
      
      // benchmark boiler plate code
      double cputime()
      {
          struct rusage usage;
          int check = getrusage( RUSAGE_SELF, &usage );
          assert(check == 0);
          return (usage.ru_utime.tv_sec + usage.ru_utime.tv_usec*1.0e-6);
          //return (((double)clock())/((double)CLOCKS_PER_SEC));
      }
      
      double measure(std::function<void()> func, size_t iterations)
      {
          double t1, t2;
          size_t i;
          t1 = cputime();
          for(i = 0; i < iterations; ++i) func();
          t2 = cputime();
          return (t2 - t1);
      }
      
      std::pair<std::string, double> human(double value)
      {
          static const std::vector<std::pair<std::string, double>> prefixes = {
              { "pico",  1e-12 },
              { "nano",  1e-9  },
              { "micro", 1e-6  },
              { "milli", 1e-3  },
              { "",      1     },
              { "kilo",  1e3   },
              { "mega",  1e6   },
              { "giga",  1e9   },
              { "tera",  1e12  }
          };
      
          for(auto it = prefixes.begin(); it != prefixes.end(); ++it)
          {
              if (it->second > value) 
              {
                  auto prev = *(--it);
                  return std::pair<std::string, double>(prev.first, value/prev.second);
              }
          }
          auto last = *prefixes.rbegin();
          return std::pair<std::string, double>(last.first, value/last.second);
      }
      
      void bench(std::string name, std::function<void()> func, double bench_seconds = 10)
      {
          const double accurate_seconds = 0.1;
      
          std::cout << "benchmarking " << name << std::endl
                    << "estimating iterations" << std::endl;
          size_t base_iterations = 1;
          double base_seconds = measure(func, base_iterations);
          while(base_seconds < accurate_seconds)
          {
              base_iterations *= 2;
              base_seconds = measure(func, base_iterations);
          }
      
          const size_t iterations = bench_seconds * base_iterations / base_seconds;
          const double estimated_seconds = iterations * base_seconds / base_iterations;
          std::cout << "estimated time " << estimated_seconds << " seconds (" << iterations << " iterations)" << std::endl;
      
          const double seconds = measure(func, iterations);
          const auto ips = human(iterations / seconds);
          const auto spi = human(seconds / iterations);
          std::cout << "benchmark took " << seconds << " seconds" << std::endl
                    << "average speed " << ips.second  << ' ' << ips.first << " iterations per second" << std::endl
                    << "average time " << spi.second << ' ' << spi.first << " seconds per iteration" << std::endl;
      }
      
      // plain brute-force lookup
      template<class iterator>
      std::list<typename iterator::value_type> brute_lookup(const typename iterator::value_type pattern, iterator begin, const iterator &end)
      {
          typedef typename iterator::value_type value_type;
          std::list<value_type> result;
      
          for(;begin != end; ++begin)
          {
              if ((*begin & pattern) == pattern) result.push_back(*begin);
          }
      
          return result;
      }
      
      // tree-traversing lookup
      
      template<class _value_type>
      struct cover_node
      {
          typedef _value_type value_type;
      
          value_type mask_a, mask_b;
          std::auto_ptr<cover_node<value_type>> node_a, node_b;
          std::vector<value_type> category_ab, category_zero;
      };
      
      
      template<class _value_type>
      std::ostream &pprint(std::ostream &s, const std::auto_ptr<cover_node<_value_type>> &node, const std::string indent = "")
      {
          if (!node.get())
          {
              s << indent << "cover_node: (null)" << std::endl;
              return s;
          }
      
          s << indent
            << "cover_node: mask = " << std::hex << node->mask_a << "/" << node->mask_b
            << ", leafs = " << std::dec << node->category_ab.size() << "/" << node->category_zero.size() << std::endl;
      
          const std::string sub = indent + "  ";
          pprint(s, node->node_a, sub);
          return pprint(s, node->node_b, sub);
      }
      
      enum class cover_category { a, b, ab, zero };
      
      template<class vt>
      cover_category
      identify_cover(const vt mask_a, const vt mask_b, const vt x)
      {
          const auto a = (x & mask_a) != 0;
          const auto b = (x & mask_b) != 0;
      
          if (!a)
          {
              if (!b) return cover_category::zero;
              else return cover_category::b;
          }
          else
          {
              if (!b) return cover_category::a;
              else return cover_category::ab;
          }
      }
      
      template<class vt>
      vt bitmask(const size_t n, const size_t m)
      {
          return (~0 << n) & ~(~0 << m);
      }
      
      template<class iterator>
      std::auto_ptr<cover_node<typename iterator::value_type>>
      build_cover_node(size_t n, size_t m, iterator begin, const iterator &end)
      {
          const size_t split_boundary = 4;
      
          typedef typename iterator::value_type value_type;
          std::auto_ptr<cover_node<value_type>> node(new cover_node<value_type>);
      
          if ((m - n) < split_boundary) // too small group
          {
              // overlapped mask for simplification of sub-tree into list
              node->mask_a = ~0;
              node->mask_b = ~0;
              node->category_ab.insert(node->category_ab.end(), begin, end);
              return node;
          }
      
          std::list<value_type> category_a, category_b;
      
          const size_t h = (n + m) / 2;
      
          node->mask_a = bitmask<value_type>(n, h);
          node->mask_b = bitmask<value_type>(h, m);
          auto &category_ab = node->category_ab;
          auto &category_zero = node->category_zero;
      
          // categorize
          for(;begin != end; ++begin)
          {
              switch(identify_cover(node->mask_a, node->mask_b, *begin))
              {
              case cover_category::a:
                  category_a.push_back(*begin);
                  break;
              case cover_category::b:
                  category_b.push_back(*begin);
                  break;
              case cover_category::ab:
                  category_ab.push_back(*begin);
                  break;
              case cover_category::zero:
                  category_zero.push_back(*begin);
                  break;
              }
          }
      
          // build sub-nodes
          if (!category_a.empty()) node->node_a = build_cover_node(n, h, category_a.begin(), category_a.end());
          if (!category_b.empty()) node->node_b = build_cover_node(h, m, category_b.begin(), category_b.end());
      
          return node;
      }
      
      template<class _value_type>
      struct cover_walker
      {
          typedef _value_type value_type;
          typedef cover_node<value_type> node_type;
      
          cover_walker(value_type target_pattern, const node_type &root_node) :
              target(target_pattern)
          { 
              walk(root_node);
          }
      
          const std::list<value_type> &get_result() const
          {
              return result;
          }
      
      private:
          value_type target;
      
          std::list<value_type> result;
      
          template<class Container>
          void filtered_add(const Container &xs)
          {
              for(auto it = xs.begin(); it != xs.end(); ++it)
              {
                  const auto &x = *it;
                  if ((x & target) == target) result.push_back(x);
              }
          }
      
          template<class Container>
          void add(const Container &xs)
          {
              result.insert(result.end(), xs.begin(), xs.end());
          }
      
          void flatout(const node_type &node)
          {
              if (node.node_a.get()) flatout(*node.node_a);
              if (node.node_b.get()) flatout(*node.node_b);
              add(node.category_ab);
              add(node.category_zero);
          }
      
          void walk(const node_type &node)
          {
              const auto &mask_a = node.mask_a;
              const auto &mask_b = node.mask_b;
      
              if (mask_a == mask_b)
              {
                  filtered_add(node.category_ab);
                  return;
              }
      
              switch(identify_cover(mask_a, mask_b, target))
              {
              case cover_category::a:
                  if (node.node_a.get()) walk(*node.node_a);
                  filtered_add(node.category_ab);
                  break;
      
              case cover_category::b:
                  if (node.node_b.get()) walk(*node.node_b);
                  filtered_add(node.category_ab);
                  break;
      
              case cover_category::ab:
                  filtered_add(node.category_ab);
                  break;
      
              case cover_category::zero:
                  flatout(node);
                  break;
              }
          }
      
      };
      
      
      int main()
      {
          std::mt19937 rng;
          std::uniform_int_distribution<uint32_t> uint_dist;
      
          const auto bitmap = uint_dist(rng);
          //const uint32_t bitmap = 0;
      
          std::vector<uint32_t> bitmaps;
          bitmaps.resize(10000000);
      
          //for(auto it = bitmaps.begin(); it < bitmaps.end(); ++it) *it = uint_dist(rng);
          for(auto it = bitmaps.begin(); it < bitmaps.end(); ++it) *it = uint_dist(rng) & uint_dist(rng) & uint_dist(rng); // sparse
      
          const auto brute = [&bitmaps, bitmap](){
              brute_lookup(bitmap, bitmaps.begin(), bitmaps.end());
          };
      
          std::auto_ptr<cover_node<uint32_t>> cover_tree = build_cover_node<std::vector<uint32_t>::const_iterator>(0, 32, bitmaps.begin(), bitmaps.end());
          pprint(std::cout, cover_tree);
      
          const auto traversal = [&cover_tree, bitmap]() {
              cover_walker<uint32_t>(bitmap, *cover_tree).get_result();
          };
      
          bench("brute-force array search", brute);
          bench("tree-traversal search", traversal);
      
      
          return 0;
      }
      

答案 2 :(得分:1)

此解决方案将占用与M中“1”位数成比例的存储器, 但应该合理地运行。我在假设 集合M是静态的,具有许多目标匹配请求。

<强>预处理:

给定集合M,将其按升序排序。接下来构造一个包含一个数组 每位插槽。您使用的是32位数字,因此需要32个插槽的数组。调用此数组:MBit [0..31]。 每个插槽包含 指向链表的指针(称之为:MPtr)。链表包含M中的数字 相应的位置位。对于 例如,M中第3位的所有数字都可以在链表中找到:MBit [3] .MPtr。

基本算法是处理每个MBit列表 其中相应的目标号码设置为“1”位。只有所有已处理列表共有的数字 被选中。由于每个MPtr列表包含已排序的数字,我们可以向前扫描,直到我们要查找的数字 找到(匹配),找到一个更大的数字(不匹配)或列表已用尽(不再匹配)。

这种方法的主要缺点是来自M的相同数字将出现在尽可能多的数字中 链表,因为它有'1'位。 这有点 记忆命中,但你必须在某处给点东西!

<强>概要

如上所述构建MBit数组。

为目标号码构建另一个数组数据结构。该数组包含1 目标中的每位插槽(称为:TBit [0..31])。每个插槽 包含一个链表指针(称之为:MPtr)和一个布尔值(称之为:BitSet)。 BitSet表示是否对应 设置了一点目标。

鉴于新目标:

/* Initialize each slot of TBit to the head of the corresponding MBit Linked list */
if Target == 0 then goto Done      /* Target contains only zero bits - no matches */
for (i = 0; i < 32; i++) {         /* Bit 0 is LSB, Bit 31 is MSB */
   TBit[i].MPtr = MBit[i].MPtr     /* List of numbers with bit i set */
   TBit[i].BitSet = (Target && 1)  /* Target bit i set? */
   Target = Target >> 1            /* Shift 1 bit right */
}

/* Iterate until one of the linked lists in TBit is exhausted */
for(;;) {
   First1Bit = False          /* Found first '1' bit in Target for this iteration */
   AcceptCandidate = True     /* Assume Candidate number matches all '1' bits in Target */
   for (i = 0; i < 32 & AcceptCandidate; i++) { /* For each bit in TBit Array... */
      if !TBit[i].BitSet then iterate          /* Target bit is zero, nothing to add */
      if !First1Bit then {                     /* First Target '1' bit, initialize for iteration */
         if TBit[i].MPtr == Nil then goto Done /* List exhausted, no more matches possible */
         Candidate = value(TBit[i].MPtr)       /* Candidate Number from linked list */
         TBit[i].MPtr = next(TBit[i].MPtr)     /* setup for next cycle */
         First1Bit = True                      /* First 1 bit for this cycle completed */
      } else {
         /* Scan list until Candidate or larger number found... */
         while (TBit[i].MPtr != Nil & value(TBit[i].MPtr) < Candidate) {
            TBit[i].MPtr = next(TBit[i].MPtr)
         }
         if TBit[i].MPtr = Nil then goto Done  /* List exhausted, no more matches possible */
         AcceptCandidate = (value(TBit[i].MPtr) == Candidate)
      }
   }
   if AcceptCandidate then {
      /* Candidate contains a '1' bit in the same positions Target contains a '1' bit */
      /* Do what you need to do with Candidate */
   }
}
Done: /* No further matches on Target are possible */ 

我可以看到对上述大纲的一些优化,但认为这将是一个良好的开端。

答案 3 :(得分:0)

这似乎是SQL数据库擅长的东西。如果你在(MSB,BitsSet,Value)上放置一个复合索引,你的结果应该非常快。

IntegerList:
    Value INT
    BitsSet INT
    MSB INT

INSERT INTO IntegerList(Value, BitsSet, MSB) VALUES(@Value, GetBitsSet(@Value), GetMSB(@Value)

SELECT Value FROM IntegerList WHERE MSB = GetMSB(@Target) AND BitsSet >= GetBitsSet(@Target) AND (Value & @Target) = @Target

---GetMSB
DECLARE @b BIGINT
DECLARE @c INT
SELECT @b = 0x80000000
SELECT @c = 32
WHILE (@b <> 0)
BEGIN
    IF (@b & @value) = @b
    BEGIN
        RETURN @c
    END
    SELECT @b = @b / 2
    SELECT @c = @c - 1
END

---GetBitsSet
DECLARE @b BIGINT
DECLARE @c INT
SELECT @b = 0x80000000
SELECT @c = 0
WHILE (@b <> 0)
BEGIN
    IF (@b & @value) = @b
    BEGIN
        SELECT @c = @c + 1
    END
    SELECT @b = @b / 2
END
RETURN @c

如果您必须使用直接C ++,我建议尝试模拟SQL方法。

  

使用int Value,BitsSet,MSB

创建结构或类      

创建2个节点数组,一个为MSB排序,另一个为BitsSet。

     

在MSB(匹配目标的MSB)阵列和BitsSet(匹配所有BitsSet&gt; =目标)阵列上使用二进制搜索。

     

获得这两个结果的联合,然后执行你的Target&amp;值==目标检查。

答案 4 :(得分:0)

一般方法。

按位构建树。一级节点是fisrt位,而二级节点是第二位,......

当你获得面具时,你只需要否定它,你知道你必须排除哪些树的部分 只能快速遍历相关的低谷节点。

N_bits空间解决方案*
只需将此整数排序到位并使用二进制搜索来遍历此树。

复杂度O(N_results * N_bits))

与bruteforce O(N)相比,它看起来比第3因子运行得更快。但这是我在c ++中的第一个代码,所以我可能会错过一些东西。任何关于代码的评论都会很酷。

代码如何运作?
它只使用的数据结构是输入的排序数组 在每个步骤中,它使用二进制搜索白色std::lower_bound();基于绑定将数组拆分为两个部分 如果mask [depth]为1,则不需要在该树的左侧部分进行操作 无论如何,它必须正确。

如果您将掩码设置为0xFFFFFFFF,它将始终正确并且将在log(n)时间内执行 如果你输入掩码0x00000000它将返回所有解决方案,因此它将在左右两个步骤中进行,并且将比天真循环执行更糟糕。一旦数组大小小于10(可以更改),它就会使用朴素的方法返回输出向量中的所有解。

在长度为100k且掩码 0x11111111(8位)的随机输入向量上,它比天真循环运行速度快两倍。

#include <iostream>
#include <algorithm>
#include <vector>
using namespace std;

void find_masks(const int mask,const int bound,const int depth,const vector<int>::iterator begin,const vector<int>::iterator end, vector<int> &output )
{
    vector<int>::iterator i,split;
    if( ( distance(begin,end)<10 ) | (depth==0) ) //if less than 10 we just bruteforce it is also stopping condition
    {
        for(i=begin; i!=end; i++)
        {
            if(mask == (int)(mask & (*i)))
            {
                output.push_back(*i);
            }
        }
        return;
    }

    int bitmask =  (1<<depth) ;
    split=lower_bound(begin,end,bound | bitmask );

    if( !(mask & bitmask) ) //go left if mask == 0 at this point
    {
        find_masks(mask,bound,depth-1,begin,split, output );
    }
    find_masks(mask,bound | bitmask ,depth-1,split, end, output );
}


int main ()
{
    vector<int> result,v,bruteforce;
    vector<int>::iterator i;

    //100k random vector
    for(int i=0; i<100000; i++)
    {
        int r=0;
        for(int j=0; j<4; j++)
        {
            r=r<<8;
            r=r^rand();
        }
        v.push_back(r);
    }

    sort(v.begin(),v.end());

    int mask=0xF0F;
    //use sorted vector and binary search for traversing tree
    find_masks(mask,0,31,v.begin(),v.end(), result );

    //use naive loop
    bruteforce.erase(bruteforce.begin(),bruteforce.end());
    for(i=v.begin(); i!=v.end(); i++)
    {
        if(mask == (int)(mask & (*i)))
        {
            bruteforce.push_back(*i);
        }
    }

    cout<<"n solutions binary search " << distance(result.begin(),result.end())<<endl;
    cout<<"n solutions loop "  << distance(bruteforce.begin(),bruteforce.end())<<endl;
    cout<<"are solutions same => " << equal(result.begin(),result.end(),bruteforce.begin());

    return 0;
}