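Suppose I have a large array of M 32-bit integers, each of which has no more than N bits set. I want to return the subset that matches the query Target AND Value == Target, i.e. the values in which every bit of the target is set.
Brute force is easy: just iterate over the array and extract the values where target & value == target. This becomes too slow if M gets very large. Does anyone know how to convert the array into a data structure that is better suited to this search?
One approach is to store an array of values for each bit (so for 32-bit integers you need 32 of them) and then search only the values that match each bit set in the target. This helps a little, unless N gets close to 32 or the target has close to N bits set. Since what I am looking for is essentially a partial match, hashing or sorting does not seem to help.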
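A minimal sketch of the per-bit array idea, assuming one bucket per bit position and a query that scans only the smallest bucket among the target's set bits. The names `BitBuckets`, `build_buckets` and `query` are hypothetical, chosen for illustration only.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical helper type: one bucket per bit position.
struct BitBuckets {
    std::vector<std::uint32_t> all;                     // every value, used when target == 0
    std::array<std::vector<std::uint32_t>, 32> by_bit;  // by_bit[i]: values with bit i set
};

BitBuckets build_buckets(const std::vector<std::uint32_t>& values) {
    BitBuckets b;
    b.all = values;
    for (std::uint32_t v : values)
        for (int i = 0; i < 32; ++i)
            if (v & (std::uint32_t(1) << i)) b.by_bit[i].push_back(v);
    return b;
}

// Any match must appear in the bucket of every bit set in target, so it is
// enough to scan the smallest such bucket and apply the full test to it.
std::vector<std::uint32_t> query(const BitBuckets& b, std::uint32_t target) {
    const std::vector<std::uint32_t>* smallest = &b.all;
    for (int i = 0; i < 32; ++i)
        if ((target & (std::uint32_t(1) << i)) && b.by_bit[i].size() < smallest->size())
            smallest = &b.by_bit[i];
    std::vector<std::uint32_t> out;
    for (std::uint32_t v : *smallest)
        if ((v & target) == target) out.push_back(v);
    return out;
}
```

Each value is stored once per set bit, so with N = 8 the buckets hold up to 8 copies of the data in total. How much scanning this saves depends on how unevenly the bits are distributed.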
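Exactly correct results are required. This must work without access to parallel hardware (such as a GPU or SIMD).
I will be using C++, but pointers to algorithms or ideas are fine. The most likely case is M = 100,000 and N = 8, and the query is called frequently.
To reiterate: I need a partial match (e.g. item = 011000 matches target = 001000), not an exact match. Although the M items are known in advance, the target can take any value.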
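For reference, the partial-match predicate and the brute-force scan can be sketched as follows. The function names `contains_bits` and `brute_force` are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// True when every bit set in target is also set in value (partial match).
inline bool contains_bits(std::uint32_t value, std::uint32_t target) {
    return (value & target) == target;
}

// Linear scan over all values: O(M) per query, no preprocessing.
std::vector<std::uint32_t> brute_force(const std::vector<std::uint32_t>& values,
                                       std::uint32_t target) {
    std::vector<std::uint32_t> out;
    for (std::uint32_t v : values)
        if (contains_bits(v, target)) out.push_back(v);
    return out;
}
```

With the example above, `contains_bits(0x18, 0x08)` holds (item 011000, target 001000), while the reverse does not.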
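I finally decided to stick with brute force. For 80,000 items it is not worth doing anything else. I imagine that if the data set were more like 800,000,000 items, it might be worthwhile.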
Answer 0 (score: 2)
Answer 1 (score: 2)
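How about looking at this problem from another point of view? Consider the set of integers as a collection of one-dimensional pictures. One way to organize them is to split each picture into two parts, A and B, and sort all pictures by category:
1. A contains only zeroes and B contains some bits set (at least one)
2. A contains some bits set and B contains only zeroes
3. A and B both contain some bits set (a superset of 1 and 2)
4. A and B both contain only zeroes
Now split the target/mask into the same two parts and categorize it in the same way. From the target's category you can then deduce which pictures can possibly match:
1. Target in category 1 (bits only in B): only pictures in categories 1 and 3 can match.
2. Target in category 2 (bits only in A): only pictures in categories 2 and 3 can match.
3. Target in category 3 (bits in both A and B): only pictures in category 3 can match.
4. Target in category 4 (all zeroes): every picture matches.
At the next level, parts A and B are split again (so you have 4 parts), and so on.
Of course I don't expect this to give a large speed-up. But for some data sets where not many bits are set (as opposed to the variant with a bit-based tree), it may work better.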
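A minimal single-level sketch of this categorization, assuming a fixed split into the low 16 bits (part A) and the high 16 bits (part B). The names `cover_class`, `OneLevelCover`, `build_cover` and `query_cover` are hypothetical, and the category is encoded as a 2-bit number (bit 0: A is non-zero, bit 1: B is non-zero) rather than with the numbering used in the list above.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// 0 = all zero, 1 = bits only in A, 2 = bits only in B, 3 = bits in both.
inline int cover_class(std::uint32_t x, std::uint32_t mask_a, std::uint32_t mask_b) {
    return ((x & mask_a) != 0 ? 1 : 0) | ((x & mask_b) != 0 ? 2 : 0);
}

struct OneLevelCover {
    std::uint32_t mask_a = 0x0000FFFFu;  // part A: low half (assumed split)
    std::uint32_t mask_b = 0xFFFF0000u;  // part B: high half (assumed split)
    std::array<std::vector<std::uint32_t>, 4> bucket;  // indexed by cover_class
};

OneLevelCover build_cover(const std::vector<std::uint32_t>& values) {
    OneLevelCover c;
    for (std::uint32_t v : values)
        c.bucket[cover_class(v, c.mask_a, c.mask_b)].push_back(v);
    return c;
}

std::vector<std::uint32_t> query_cover(const OneLevelCover& c, std::uint32_t target) {
    const int t = cover_class(target, c.mask_a, c.mask_b);
    std::vector<std::uint32_t> out;
    for (int k = 0; k < 4; ++k) {
        if ((k & t) != t) continue;  // bucket k is empty in a part the target needs: skip it whole
        for (std::uint32_t v : c.bucket[k])
            if ((v & target) == target) out.push_back(v);
    }
    return out;
}
```

The recursive version described above would split the "only A" and "only B" buckets again instead of scanning them linearly.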
Update: I got a 34% speed-up with a Haskell variant:
benchmarking burte-force list search
mean: 14.67350 ms, lb 14.65103 ms, ub 14.71614 ms, ci 0.950
std dev: 153.6920 us, lb 95.70642 us, ub 246.6497 us, ci 0.950
benchmarking tree-lookup search
mean: 9.592271 ms, lb 9.564509 ms, ub 9.667668 ms, ci 0.950
std dev: 216.6084 us, lb 100.3315 us, ub 455.2730 us, ci 0.950
Source code:
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE UndecidableInstances #-}

import Control.Arrow (first)
import Control.DeepSeq
import Data.Word
import Data.Bits
import Data.List
import Criterion.Main
import Criterion.Config
import System.Random

class BitmapsCollection a where
    type BitmapOf a
    bitmapsCollection :: [BitmapOf a] -> a
    findMaskedPattern :: a -> BitmapOf a -> [BitmapOf a]

-- Plain lookup through array
newtype BitmapsList p = BitmapsList [p]
    deriving (Show, NFData)

instance Bits p => BitmapsCollection (BitmapsList p) where
    type BitmapOf (BitmapsList p) = p
    bitmapsCollection = BitmapsList
    findMaskedPattern (BitmapsList xs) m = filter (\e -> e .&. m == m) xs

-- Tree of bitmap partitions
data Bits p => BitmapsCoverTree p = EmptyBitmapsCoverNode
                                  | BitmapsCoverNode (p,p) (BitmapsCoverTree p) (BitmapsCoverTree p) [p] [p]
                                  | LeafBitmapsCoverNode [p]
    deriving Show

instance (Bits p, NFData p) => NFData (BitmapsCoverTree p) where
    rnf EmptyBitmapsCoverNode = ()
    rnf (LeafBitmapsCoverNode xs) = rnf xs
    rnf (BitmapsCoverNode mask node1 node2 category3 category4) = mask `deepseq` node1 `deepseq` node2 `deepseq` category3 `deepseq` rnf category4

data BitmapCoverCategory = CoverA | CoverB | CoverAB | CoverZero

coverCategory :: Bits a => (a, a) -> a -> BitmapCoverCategory
coverCategory (maskA, maskB) x = case (x .&. maskA, x .&. maskB) of
    (0, 0) -> CoverZero
    (0, _) -> CoverB
    (_, 0) -> CoverA
    _ -> CoverAB

coverCategorize :: Bits a => (a, a) -> [a] -> ([a], [a], [a], [a])
coverCategorize mask = walk (id, id, id, id) where
    category = coverCategory mask
    walk (a, b, ab, z) [] = (a [], b [], ab [], z [])
    walk (a, b, ab, z) (x:xs) = case (category x) of
        CoverA -> walk (a . (x:), b, ab, z) xs
        CoverB -> walk (a, b . (x:), ab, z) xs
        CoverAB -> walk (a, b, ab . (x:), z) xs
        CoverZero -> walk (a, b, ab, z . (x:)) xs

suffixMask, prefixMask :: Bits a => Int -> a
suffixMask n = complement 0 `shiftL` n
prefixMask n = complement (suffixMask n)

rangeMask :: Bits a => (Int, Int) -> a
rangeMask (n, m) = suffixMask n .&. prefixMask m

instance Bits p => BitmapsCollection (BitmapsCoverTree p) where
    type BitmapOf (BitmapsCoverTree p) = p
    bitmapsCollection bms = buildCoverNode (0, bitSize (head bms)) bms where
        splitBoundary = 4
        buildCoverNode :: Bits a => (Int, Int) -> [a] -> BitmapsCoverTree a
        buildCoverNode _ [] = EmptyBitmapsCoverNode
        buildCoverNode (n, m) xs | (m - n) < splitBoundary = LeafBitmapsCoverNode xs -- too small
        buildCoverNode (n, m) xs = BitmapsCoverNode mask node1 node2 category3 category4 where
            mm = (n+m) `div` 2
            mask = (rangeMask (n, mm), rangeMask (mm, m))
            (category1, category2, category3, category4) = coverCategorize mask xs
            node1 = buildCoverNode (n, mm) category1
            node2 = buildCoverNode (mm, m) category2
    findMaskedPattern EmptyBitmapsCoverNode _ = []
    findMaskedPattern (LeafBitmapsCoverNode ps) m = filter (\e -> e .&. m == m) ps
    findMaskedPattern (BitmapsCoverNode _ node1 node2 category3 category4) 0 = flatten where
        flatten = findMaskedPattern node1 0 ++ findMaskedPattern node2 0 ++ category3 ++ category4
    findMaskedPattern (BitmapsCoverNode mask node1 node2 category3 category4) m = result where
        targetCategory = coverCategory mask m
        filterTarget = filter (\p -> p .&. m == m)
        result = case targetCategory of
            CoverA -> findMaskedPattern node1 m ++ filterTarget category3
            CoverB -> findMaskedPattern node2 m ++ filterTarget category3
            CoverAB -> filterTarget category3
            CoverZero -> category1 ++ category2 ++ category3 ++ category4
        category1 = findMaskedPattern node1 0
        category2 = findMaskedPattern node2 0

main = do
    gen <- getStdGen
    let size = 1000000
        bitmaps :: [Word32]
        (bitmap, genm) = first fromIntegral (random gen :: (Int, StdGen))
        bitmaps = map fromIntegral (take size (randoms genm) :: [Int])
        bitmapsList = bitmapsCollection bitmaps :: BitmapsList Word32
        bitmapsTree = bitmapsCollection bitmaps :: BitmapsCoverTree Word32
    bitmapsList `deepseq` bitmapsTree `deepseq` return ()
    defaultMainWith defaultConfig (return ()) [
        bench "burte-force list search" $ nf (findMaskedPattern bitmapsList) bitmap,
        bench "tree-lookup search" $ nf (findMaskedPattern bitmapsTree) bitmap
        ]
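Update: a C++11 version of sorts. It gives 10.9444 ms for brute force and 8.69286 ms for this algorithm. I cheated by making the distribution of set bits in the generated data sparser.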
#include <iostream>
#include <vector>
#include <list>
#include <random>
#include <functional>
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <sys/time.h>
#include <sys/resource.h>
// benchmark boiler plate code
double cputime()
{
struct rusage usage;
int check = getrusage( RUSAGE_SELF, &usage );
assert(check == 0);
return (usage.ru_utime.tv_sec + usage.ru_utime.tv_usec*1.0e-6);
//return (((double)clock())/((double)CLOCKS_PER_SEC));
}
double measure(std::function<void()> func, size_t iterations)
{
double t1, t2;
size_t i;
t1 = cputime();
for(i = 0; i < iterations; ++i) func();
t2 = cputime();
return (t2 - t1);
}
std::pair<std::string, double> human(double value)
{
static const std::vector<std::pair<std::string, double>> prefixes = {
{ "pico", 1e-12 },
{ "nano", 1e-9 },
{ "micro", 1e-6 },
{ "milli", 1e-3 },
{ "", 1 },
{ "kilo", 1e3 },
{ "mega", 1e6 },
{ "giga", 1e9 },
{ "tera", 1e12 }
};
for(auto it = prefixes.begin(); it != prefixes.end(); ++it)
{
if (it->second > value)
{
auto prev = *(--it);
return std::pair<std::string, double>(prev.first, value/prev.second);
}
}
auto last = *prefixes.rbegin();
return std::pair<std::string, double>(last.first, value/last.second);
}
void bench(std::string name, std::function<void()> func, double bench_seconds = 10)
{
const double accurate_seconds = 0.1;
std::cout << "benchmarking " << name << std::endl
<< "estimating iterations" << std::endl;
size_t base_iterations = 1;
double base_seconds = measure(func, base_iterations);
while(base_seconds < accurate_seconds)
{
base_iterations *= 2;
base_seconds = measure(func, base_iterations);
}
const size_t iterations = bench_seconds * base_iterations / base_seconds;
const double estimated_seconds = iterations * base_seconds / base_iterations;
std::cout << "estimated time " << estimated_seconds << " seconds (" << iterations << " iterations)" << std::endl;
const double seconds = measure(func, iterations);
const auto ips = human(iterations / seconds);
const auto spi = human(seconds / iterations);
std::cout << "benchmark took " << seconds << " seconds" << std::endl
<< "average speed " << ips.second << ' ' << ips.first << " iterations per second" << std::endl
<< "average time " << spi.second << ' ' << spi.first << " seconds per iteration" << std::endl;
}
// plain brute-force lookup
template<class iterator>
std::list<typename iterator::value_type> brute_lookup(const typename iterator::value_type pattern, iterator begin, const iterator &end)
{
typedef typename iterator::value_type value_type;
std::list<value_type> result;
for(;begin != end; ++begin)
{
if ((*begin & pattern) == pattern) result.push_back(*begin);
}
return result;
}
// tree-traversing lookup
template<class _value_type>
struct cover_node
{
typedef _value_type value_type;
value_type mask_a, mask_b;
std::auto_ptr<cover_node<value_type>> node_a, node_b;
std::vector<value_type> category_ab, category_zero;
};
template<class _value_type>
std::ostream &pprint(std::ostream &s, const std::auto_ptr<cover_node<_value_type>> &node, const std::string indent = "")
{
if (!node.get())
{
s << indent << "cover_node: (null)" << std::endl;
return s;
}
s << indent
<< "cover_node: mask = " << std::hex << node->mask_a << "/" << node->mask_b
<< ", leafs = " << std::dec << node->category_ab.size() << "/" << node->category_zero.size() << std::endl;
const std::string sub = indent + " ";
pprint(s, node->node_a, sub);
return pprint(s, node->node_b, sub);
}
enum class cover_category { a, b, ab, zero };
template<class vt>
cover_category
identify_cover(const vt mask_a, const vt mask_b, const vt x)
{
const auto a = (x & mask_a) != 0;
const auto b = (x & mask_b) != 0;
if (!a)
{
if (!b) return cover_category::zero;
else return cover_category::b;
}
else
{
if (!b) return cover_category::a;
else return cover_category::ab;
}
}
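template<class vt>
vt bitmask(const size_t n, const size_t m)
{
    // Bits [n, m) set. Computed in the unsigned type vt and guarded, because
    // shifting by the full width of the type (m == 32 here) is undefined.
    const vt ones = ~vt(0);
    const size_t width = sizeof(vt) * 8;
    const vt low = (n >= width) ? vt(0) : vt(ones << n);
    const vt high = (m >= width) ? ones : vt(~(ones << m));
    return low & high;
}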
template<class iterator>
std::auto_ptr<cover_node<typename iterator::value_type>>
build_cover_node(size_t n, size_t m, iterator begin, const iterator &end)
{
const size_t split_boundary = 4;
typedef typename iterator::value_type value_type;
std::auto_ptr<cover_node<value_type>> node(new cover_node<value_type>);
if ((m - n) < split_boundary) // too small group
{
// overlapped mask for simplification of sub-tree into list
node->mask_a = ~0;
node->mask_b = ~0;
node->category_ab.insert(node->category_ab.end(), begin, end);
return node;
}
std::list<value_type> category_a, category_b;
const size_t h = (n + m) / 2;
node->mask_a = bitmask<value_type>(n, h);
node->mask_b = bitmask<value_type>(h, m);
auto &category_ab = node->category_ab;
auto &category_zero = node->category_zero;
// categorize
for(;begin != end; ++begin)
{
switch(identify_cover(node->mask_a, node->mask_b, *begin))
{
case cover_category::a:
category_a.push_back(*begin);
break;
case cover_category::b:
category_b.push_back(*begin);
break;
case cover_category::ab:
category_ab.push_back(*begin);
break;
case cover_category::zero:
category_zero.push_back(*begin);
break;
}
}
// build sub-nodes
if (!category_a.empty()) node->node_a = build_cover_node(n, h, category_a.begin(), category_a.end());
if (!category_b.empty()) node->node_b = build_cover_node(h, m, category_b.begin(), category_b.end());
return node;
}
template<class _value_type>
struct cover_walker
{
typedef _value_type value_type;
typedef cover_node<value_type> node_type;
cover_walker(value_type target_pattern, const node_type &root_node) :
target(target_pattern)
{
walk(root_node);
}
const std::list<value_type> &get_result() const
{
return result;
}
private:
value_type target;
std::list<value_type> result;
template<class Container>
void filtered_add(const Container &xs)
{
for(auto it = xs.begin(); it != xs.end(); ++it)
{
const auto &x = *it;
if ((x & target) == target) result.push_back(x);
}
}
template<class Container>
void add(const Container &xs)
{
result.insert(result.end(), xs.begin(), xs.end());
}
void flatout(const node_type &node)
{
if (node.node_a.get()) flatout(*node.node_a);
if (node.node_b.get()) flatout(*node.node_b);
add(node.category_ab);
add(node.category_zero);
}
void walk(const node_type &node)
{
const auto &mask_a = node.mask_a;
const auto &mask_b = node.mask_b;
if (mask_a == mask_b)
{
filtered_add(node.category_ab);
return;
}
switch(identify_cover(mask_a, mask_b, target))
{
case cover_category::a:
if (node.node_a.get()) walk(*node.node_a);
filtered_add(node.category_ab);
break;
case cover_category::b:
if (node.node_b.get()) walk(*node.node_b);
filtered_add(node.category_ab);
break;
case cover_category::ab:
filtered_add(node.category_ab);
break;
case cover_category::zero:
flatout(node);
break;
}
}
};
int main()
{
std::mt19937 rng;
std::uniform_int_distribution<uint32_t> uint_dist;
const auto bitmap = uint_dist(rng);
//const uint32_t bitmap = 0;
std::vector<uint32_t> bitmaps;
bitmaps.resize(10000000);
//for(auto it = bitmaps.begin(); it < bitmaps.end(); ++it) *it = uint_dist(rng);
for(auto it = bitmaps.begin(); it < bitmaps.end(); ++it) *it = uint_dist(rng) & uint_dist(rng) & uint_dist(rng); // sparse
const auto brute = [&bitmaps, bitmap](){
brute_lookup(bitmap, bitmaps.begin(), bitmaps.end());
};
std::auto_ptr<cover_node<uint32_t>> cover_tree = build_cover_node<std::vector<uint32_t>::const_iterator>(0, 32, bitmaps.begin(), bitmaps.end());
pprint(std::cout, cover_tree);
const auto traversal = [&cover_tree, bitmap]() {
cover_walker<uint32_t>(bitmap, *cover_tree).get_result();
};
bench("brute-force array search", brute);
bench("tree-traversal search", traversal);
return 0;
}
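Answer 2 (score: 1)
This solution takes memory proportional to the number of '1' bits in M, but should run reasonably fast. I am assuming that the set M is static, with many Target match requests.
Preprocessing:
Given the set M, sort it in ascending order. Next, construct an array containing one slot per bit. You are using 32-bit numbers, so you need an array of 32 slots. Call this array MBit[0..31]. Each slot contains a pointer to a linked list (call it MPtr). The linked list contains the numbers from M that have the corresponding bit set. For example, all numbers from M that have bit 3 set can be found in the linked list MBit[3].MPtr.
The basic algorithm is to process each MBit list for which the corresponding bit of the target number is '1'. Only numbers common to all processed lists are selected. Since each MPtr list contains sorted numbers, we can scan forward until the number we are looking for is found (match), a larger number is found (no match), or the list is exhausted (no more matches possible).
The major drawback of this approach is that the same number from M appears in as many linked lists as it has '1' bits. That is a bit of a memory hit, but you have to give somewhere!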
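The sorted-list intersection described above can be sketched in C++ as follows. This sketch swaps the linked lists for sorted `std::vector`s with index cursors, which is an assumption made for brevity and not part of the description. The names `BitLists`, `build_bit_lists` and `match_target` are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

using BitLists = std::array<std::vector<std::uint32_t>, 32>;

// lists[i] holds, in ascending order, every number from m with bit i set.
BitLists build_bit_lists(std::vector<std::uint32_t> m) {
    std::sort(m.begin(), m.end());
    BitLists lists;
    for (std::uint32_t v : m)
        for (int i = 0; i < 32; ++i)
            if (v & (std::uint32_t(1) << i)) lists[i].push_back(v);
    return lists;
}

// Returns the numbers present in every list selected by a '1' bit of target.
// A target of 0 selects no list; this sketch returns an empty result for it,
// so the caller has to treat that case separately.
std::vector<std::uint32_t> match_target(const BitLists& lists, std::uint32_t target) {
    std::vector<const std::vector<std::uint32_t>*> active;
    for (int i = 0; i < 32; ++i)
        if (target & (std::uint32_t(1) << i)) active.push_back(&lists[i]);
    std::vector<std::uint32_t> out;
    if (active.empty()) return out;
    std::vector<std::size_t> pos(active.size(), 0);  // one cursor per active list
    for (;;) {
        if (pos[0] == active[0]->size()) return out;  // first list exhausted
        const std::uint32_t candidate = (*active[0])[pos[0]++];
        bool accept = true;
        for (std::size_t k = 1; k < active.size() && accept; ++k) {
            const std::vector<std::uint32_t>& list = *active[k];
            while (pos[k] < list.size() && list[pos[k]] < candidate) ++pos[k];
            if (pos[k] == list.size()) return out;    // no further match possible
            accept = (list[pos[k]] == candidate);
        }
        if (accept) out.push_back(candidate);
    }
}
```

Results come out in ascending order because every list is scanned forward exactly once per query.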
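Outline:
Build the MBit array as outlined above.
Build another array data structure for the target number. This array contains one slot per bit of the target (call it TBit[0..31]). Each slot contains a linked-list pointer (call it MPtr) and a boolean (call it BitSet). BitSet indicates whether the corresponding bit of the target is set.
Given a new target: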
if Target == 0 then return all of M            /* Target has no '1' bits, so every number in M matches */
/* Initialize each slot of TBit to the head of the corresponding MBit linked list */
for (i = 0; i < 32; i++) {                     /* Bit 0 is LSB, Bit 31 is MSB */
    TBit[i].MPtr = MBit[i].MPtr                /* List of numbers with bit i set */
    TBit[i].BitSet = (Target & 1)              /* Target bit i set? */
    Target = Target >> 1                       /* Shift 1 bit right */
}
/* Iterate until one of the linked lists in TBit is exhausted */
for (;;) {
    First1Bit = False                          /* Found first '1' bit in Target for this iteration */
    AcceptCandidate = True                     /* Assume Candidate number matches all '1' bits in Target */
    for (i = 0; i < 32 && AcceptCandidate; i++) {    /* For each bit in TBit array... */
        if !TBit[i].BitSet then iterate        /* Target bit is zero, nothing to add */
        if !First1Bit then {                   /* First Target '1' bit, initialize for iteration */
            if TBit[i].MPtr == Nil then goto Done    /* List exhausted, no more matches possible */
            Candidate = value(TBit[i].MPtr)    /* Candidate number from linked list */
            TBit[i].MPtr = next(TBit[i].MPtr)  /* Set up for next cycle */
            First1Bit = True                   /* First '1' bit for this cycle completed */
        } else {
            /* Scan list until Candidate or a larger number is found... */
            while (TBit[i].MPtr != Nil && value(TBit[i].MPtr) < Candidate) {
                TBit[i].MPtr = next(TBit[i].MPtr)
            }
            if TBit[i].MPtr == Nil then goto Done    /* List exhausted, no more matches possible */
            AcceptCandidate = (value(TBit[i].MPtr) == Candidate)
        }
    }
    if AcceptCandidate then {
        /* Candidate contains a '1' bit in every position where Target contains a '1' bit */
        /* Do what you need to do with Candidate */
    }
}
Done: /* No further matches on Target are possible */
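I can see a number of optimizations to the outline above, but figured this would be a good start.
Answer 3 (score: 0)
This looks like something an SQL database would be good at. If you place a composite index on (MSB, BitsSet, Value), your results should come back very quickly.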
IntegerList:
Value INT
BitsSet INT
MSB INT
INSERT INTO IntegerList(Value, BitsSet, MSB) VALUES(@Value, GetBitsSet(@Value), GetMSB(@Value))
SELECT Value FROM IntegerList WHERE MSB >= GetMSB(@Target) AND BitsSet >= GetBitsSet(@Target) AND (Value & @Target) = @Target
---GetMSB
DECLARE @b BIGINT
DECLARE @c INT
SELECT @b = 0x80000000
SELECT @c = 32
WHILE (@b <> 0)
BEGIN
IF (@b & @value) = @b
BEGIN
RETURN @c
END
SELECT @b = @b / 2
SELECT @c = @c - 1
END
---GetBitsSet
DECLARE @b BIGINT
DECLARE @c INT
SELECT @b = 0x80000000
SELECT @c = 0
WHILE (@b <> 0)
BEGIN
IF (@b & @value) = @b
BEGIN
SELECT @c = @c + 1
END
SELECT @b = @b / 2
END
RETURN @c
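If you have to use straight C++, I would suggest trying to emulate the SQL approach:
Create a struct or class with int Value, BitsSet, MSB.
Create 2 arrays of nodes, one sorted by MSB and the other by BitsSet.
Use binary search on the MSB array (MSB >= the target's MSB, since a matching value may have higher bits set as well) and on the BitsSet array (BitsSet >= the target's BitsSet).
Take the intersection of these two results, then perform your Target & Value == Target check.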
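A minimal sketch of this emulation in C++, under two assumptions that deviate from the text: it keeps a single array sorted by MSB instead of two arrays, and it selects MSB >= the target's MSB, because a value that contains all target bits may also have higher bits set. The names `Entry`, `count_bits`, `msb_index`, `build_index` and `query_index` are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Entry {
    std::uint32_t value;
    int bits_set;  // number of '1' bits in value
    int msb;       // 1-based position of the highest set bit, 0 when value == 0
};

int count_bits(std::uint32_t x) {
    int c = 0;
    while (x) { x &= x - 1; ++c; }  // clears the lowest set bit each round
    return c;
}

int msb_index(std::uint32_t x) {
    int i = 0;
    while (x) { x >>= 1; ++i; }
    return i;
}

std::vector<Entry> build_index(const std::vector<std::uint32_t>& values) {
    std::vector<Entry> idx;
    for (std::uint32_t v : values) idx.push_back(Entry{v, count_bits(v), msb_index(v)});
    std::sort(idx.begin(), idx.end(),
              [](const Entry& a, const Entry& b) { return a.msb < b.msb; });
    return idx;
}

std::vector<std::uint32_t> query_index(const std::vector<Entry>& idx, std::uint32_t target) {
    const int t_msb = msb_index(target);
    const int t_bits = count_bits(target);
    // Binary search for the first entry whose MSB is high enough.
    auto it = std::lower_bound(idx.begin(), idx.end(), t_msb,
                               [](const Entry& e, int m) { return e.msb < m; });
    std::vector<std::uint32_t> out;
    for (; it != idx.end(); ++it)
        if (it->bits_set >= t_bits && (it->value & target) == target)
            out.push_back(it->value);
    return out;
}
```

The MSB and bit-count conditions only prune candidates; the final `value & target` test is what guarantees exact results.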
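Answer 4 (score: 0)
General approach.
Build a tree over the bits: the first-level node is the first bit, the second-level node is the second bit, and so on.
When you get the mask, you just negate it and you know which parts of the tree you have to exclude, so you can quickly traverse only the relevant nodes.
N_bits space solution
Just sort the integers in place and use binary search to traverse this tree.
Complexity: O(N_results * N_bits).
Compared with brute force, O(N), it appears to run about 3 times faster. But this is my first code in C++, so I may have missed something. Any comments on the code would be welcome.
How does the code work?
The only data structure it uses is the sorted array of inputs.
At each step it uses binary search with std::lower_bound()
to split the array into two parts at the bound.
If mask[depth] is 1, there is no need to go into the left part of that subtree.
It has to go right in any case.
If you set the mask to 0xFFFFFFFF, it always goes right and runs in log(n) time. If you pass the mask 0x00000000, every value is a solution, so it goes both left and right at every step and performs worse than the naive loop. Once the sub-array has fewer than 10 elements (this can be changed), it uses the naive method to collect the solutions into the output vector.
On a random input vector of length 100k with the mask 0x11111111 (8 bits), it runs twice as fast as the naive loop.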
#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <iostream>
#include <vector>
using namespace std;

typedef vector<uint32_t>::const_iterator iter;

// Appends to output every x in the sorted range [begin, end) with (x & mask) == mask.
// bound holds the bits above position depth that all values in the range share.
// Unsigned values are used so that the sort order agrees with the bit order.
void find_masks(const uint32_t mask, const uint32_t bound, const int depth, const iter begin, const iter end, vector<uint32_t> &output)
{
    if ((distance(begin, end) < 10) || (depth == 0)) // fewer than 10 values: brute-force it; this is also the stopping condition
    {
        for (iter i = begin; i != end; ++i)
        {
            if (mask == (mask & *i))
            {
                output.push_back(*i);
            }
        }
        return;
    }
    const uint32_t bitmask = uint32_t(1) << depth;
    const iter split = lower_bound(begin, end, bound | bitmask);
    if (!(mask & bitmask)) // go left only if the mask bit at this depth is 0
    {
        find_masks(mask, bound, depth - 1, begin, split, output);
    }
    find_masks(mask, bound | bitmask, depth - 1, split, end, output);
}

int main()
{
    vector<uint32_t> result, v, bruteforce;
    // 100k random vector
    for (int i = 0; i < 100000; i++)
    {
        uint32_t r = 0;
        for (int j = 0; j < 4; j++)
        {
            r = r << 8;
            r = r ^ uint32_t(rand());
        }
        v.push_back(r);
    }
    sort(v.begin(), v.end());
    const uint32_t mask = 0xF0F;
    // use sorted vector and binary search for traversing tree
    find_masks(mask, 0, 31, v.begin(), v.end(), result);
    // use naive loop
    for (iter i = v.begin(); i != v.end(); ++i)
    {
        if (mask == (mask & *i))
        {
            bruteforce.push_back(*i);
        }
    }
    cout << "n solutions binary search " << result.size() << endl;
    cout << "n solutions loop " << bruteforce.size() << endl;
    // compare sizes first: std::equal with three iterators assumes the second range is long enough
    cout << "are solutions same => " << (result.size() == bruteforce.size() && equal(result.begin(), result.end(), bruteforce.begin())) << endl;
    return 0;
}