
Time: 2015-07-28 18:46:56

Tags: c algorithm boolean


Each input consists of a small set (< 100) of boolean vectors of various sizes (< 20000 elements), each containing a few 1s and many 0s:

A = [ 0 0 0 1 0 0 0 0 0 0 0 ... ]
B = [ 0 0 0 0 1 0 ... ]

I also have many (> 20000) boolean AND expressions. These expressions are constant across all queries.

S[1] = A[10] AND B[52] AND F[15] AND U[2]
S[2] = I[8] AND Z[4]



4 answers:

Answer 0 (score: 5)



(A[10] && B[52] && F[15] && U[2])


Answer 1 (score: 4)

You seem to be working with a lot of data. It's a guess, but I'd say you'll get the best behavior by preprocessing your expressions into cache-optimal passes. Consider the two expressions given:

S[1] = A[10] AND B[52] AND F[15] AND U[2]
S[2] = I[8] AND Z[4]

rewrite these as:

S[1] = 1;
S[1] &= A[10];
S[1] &= B[52];
S[1] &= F[15];
S[1] &= U[2];

S[2] = 1;
S[2] &= I[8];
S[2] &= Z[4];

Then sort all of the expressions together to create one long list of operations:

S[1] = 1;
S[2] = 1;

S[1] &= A[10];
S[1] &= B[52];
S[1] &= F[15];
S[2] &= I[8];
S[1] &= U[2];
S[2] &= Z[4];

Consider the size of the machine cache on hand. We want all of the input vectors in cache. That probably can't happen so we know we will be pulling the input vectors and the result vectors in and out of memory multiple times. We want to partition the available machine cache into three parts: input vector chunk, result vector chunk, and some working space (where our current list of operations will be pulled from).

Now walk the list of operations, pulling out those that fall into the A-I and S[1]-S[400] range. Then walk it again, pulling out the J-T operations (or whatever fits in cache). Once you get to the end of the operations list, repeat for S[401]-S[800]. This is the final order of execution for the operations. Note that this can be parallelized without contention across the S bands.

The downside is that you do not get the early-out behavior. The upside is that you only incur cache misses as you transition between blocks of computation. For such a large data set, I suspect this (and the elimination of all branching) will outweigh the early-out advantage.
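As a rough illustration of the banded, branch-free pass described above, here is a minimal C sketch. The `Op` layout, the band sizes `S_BAND` and `V_BAND`, and the one-byte-per-boolean storage are all assumptions made for the example, not something the answer specifies; indices are 0-based.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One flattened operation: S[dst] &= inputs[vec][idx].
   Hypothetical layout, chosen only for this sketch. */
typedef struct {
    uint32_t dst;  /* index of the result S[dst] */
    uint16_t vec;  /* which input vector (A = 0, B = 1, ...) */
    uint16_t idx;  /* element within that vector (< 20000 fits in 16 bits) */
} Op;

/* Assumed band sizes; in practice they would be tuned to the cache. */
enum { S_BAND = 400, V_BAND = 9 };

/* Order operations by result band first, then by input-vector band,
   so each (S band, input band) block is processed contiguously. */
static int cmp_ops(const void *pa, const void *pb)
{
    const Op *a = pa, *b = pb;
    uint32_t sa = a->dst / S_BAND, sb = b->dst / S_BAND;
    if (sa != sb) return sa < sb ? -1 : 1;
    uint32_t va = a->vec / V_BAND, vb = b->vec / V_BAND;
    if (va != vb) return va < vb ? -1 : 1;
    if (a->vec != b->vec) return a->vec < b->vec ? -1 : 1;
    return (a->idx > b->idx) - (a->idx < b->idx);
}

/* Execute the sorted list. The loop body has no data-dependent branch:
   every operation is performed, so there is no early out. */
static void run_ops(const Op *ops, size_t n_ops,
                    const uint8_t *const *inputs,
                    uint8_t *S, size_t n_results)
{
    for (size_t i = 0; i < n_results; i++)
        S[i] = 1;                                   /* S[x] = 1 */
    for (size_t i = 0; i < n_ops; i++)
        S[ops[i].dst] &= inputs[ops[i].vec][ops[i].idx];
}
```

Because the expressions are constant across queries, the `qsort(ops, n_ops, sizeof ops[0], cmp_ops)` call would happen once during preprocessing, and only `run_ops` runs per input.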

If you still want to use the early-out optimization, you can; it is just harder to implement. Consider: once you have your cache bracket A-I & S[1]-S[400], and you have created a list of operations across that bracket:

S[1] &= A[10];
S[1] &= B[52];
S[1] &= F[15];
S[2] &= I[8];

You can then reorder the operations to group them by S[x] (which this example already is). Now if you find A[10] is false, you can "early out" to the S[2] block. How do you implement this? Your operations now need to know how many entries to skip forward from the current operation:

Operation[x  ] => (S[1] &= A[10], on false, x+=3)
Operation[x+1] => (S[1] &= B[52], on false, x+=2)
Operation[x+2] => (S[1] &= F[15], on false, x+=1)
Operation[x+3] => (S[2] &= I[8]...

Again, I suspect that simply adding the branching will be slower than just performing all of the other work. This is also not a full early-out process: when you move to the next input block chunk, you have to reinspect each S[x] value accessed to determine whether it has already failed and should be skipped.
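The skip-forward idea could look roughly like the C sketch below. The `SkipOp` struct and both function names are hypothetical; the sketch assumes operations are already grouped by `dst` and that booleans are stored one byte each. It also checks `S[dst]` before each term, which is one way to handle a result that already failed in an earlier input chunk.

```c
#include <stddef.h>
#include <stdint.h>

/* An operation that also knows how far to jump when its term is false.
   Hypothetical layout for illustration only. */
typedef struct {
    uint32_t dst;   /* index of the result S[dst] */
    uint16_t vec;   /* which input vector */
    uint16_t idx;   /* element within that vector */
    uint32_t skip;  /* distance from this op to the first op of the next S block */
} SkipOp;

/* Fill in skip distances for a list already grouped by dst.
   The last op of each group gets 1, the one before it 2, and so on. */
static void fill_skips(SkipOp *ops, size_t n_ops)
{
    for (size_t i = n_ops; i-- > 0; ) {
        int last = (i + 1 == n_ops) || (ops[i + 1].dst != ops[i].dst);
        ops[i].skip = last ? 1 : ops[i + 1].skip + 1;
    }
}

/* Run one chunk of operations with early out.
   S must hold 1 for every result that has not failed yet
   (all 1s before the first chunk). */
static void run_ops_early_out(const SkipOp *ops, size_t n_ops,
                              const uint8_t *const *inputs, uint8_t *S)
{
    size_t x = 0;
    while (x < n_ops) {
        const SkipOp *op = &ops[x];
        if (S[op->dst] && inputs[op->vec][op->idx]) {
            x += 1;            /* term holds: move to the next term */
        } else {
            S[op->dst] = 0;    /* expression failed (now or in an earlier chunk) */
            x += op->skip;     /* jump to the next S block */
        }
    }
}
```

The extra branch per operation is exactly the cost the answer is wary of, so this would need to be measured against the branch-free version on real data.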

Answer 2 (score: 2)

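  1. Convert the input to packed form (a list of the indices of the nonzero elements). For the whole approach to be faster than evaluating each expression in order, you need to process several elements at once using compiler intrinsics for bit twiddling (assuming each input boolean takes only one byte, or better still, one bit).
  2. Preprocess the AND expressions into an array that maps each index in the packed input array to the expression it belongs to. (Some special handling is needed if a variable appears in more than one expression.)
  3. Initialize a counter for each expression to 0.
  4. Read the packed input array and increment the counter of the corresponding expression.
  5. Expressions whose counter equals their number of terms are "true"; the others are "false".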
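The steps above can be sketched in C as follows. This is only one possible reading: it assumes all input vectors are concatenated into a single bit array of 64-bit words, and that each variable appears in at most one expression (the case the answer says needs special handling is not covered). The names `pack_indices`, `evaluate`, `expr_of`, and `NO_EXPR` are made up for the example, and `__builtin_ctzll` is a GCC/Clang builtin, with a plain loop as a fallback elsewhere.

```c
#include <stddef.h>
#include <stdint.h>

/* Index of the lowest set bit of a nonzero word. */
static unsigned lowest_bit(uint64_t w)
{
#if defined(__GNUC__) || defined(__clang__)
    return (unsigned)__builtin_ctzll(w);
#else
    unsigned n = 0;
    while (!(w & 1)) { w >>= 1; n++; }
    return n;
#endif
}

/* Step 1: turn a bit-packed input into a list of set-bit indices.
   Whole zero words are skipped in one test. Returns the number of
   indices written; out must have room for every set bit. */
static size_t pack_indices(const uint64_t *bits, size_t n_words, uint32_t *out)
{
    size_t n = 0;
    for (size_t w = 0; w < n_words; w++) {
        uint64_t word = bits[w];
        while (word) {
            out[n++] = (uint32_t)(w * 64 + lowest_bit(word));
            word &= word - 1;          /* clear the lowest set bit */
        }
    }
    return n;
}

#define NO_EXPR UINT32_MAX  /* marks a variable used by no expression */

/* Steps 3-5: count satisfied terms per expression, then compare each
   count with the expression's number of terms.
   expr_of is the step-2 map from variable index to expression id. */
static void evaluate(const uint32_t *set_idx, size_t n_set,
                     const uint32_t *expr_of, const uint32_t *n_terms,
                     uint32_t *count, uint8_t *S, size_t n_exprs)
{
    for (size_t e = 0; e < n_exprs; e++)
        count[e] = 0;
    for (size_t i = 0; i < n_set; i++) {
        uint32_t e = expr_of[set_idx[i]];
        if (e != NO_EXPR)
            count[e]++;
    }
    for (size_t e = 0; e < n_exprs; e++)
        S[e] = (uint8_t)(count[e] == n_terms[e]);
}
```

With sparse inputs, the work in `evaluate` is proportional to the number of set bits plus the number of expressions, rather than to the total number of terms.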

Answer 3 (score: 1)


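  1. For each variable, a list of the expressions that contain that variable (i.e. the list for A10 is [S1, any other expressions with A10])
  2. For each expression, a count of the number of variables in that expression (i.e. the count for S1 is 4)
  3. Then, for each input:

    1. Initialize each expression's count to the total number of variables in that expression
    2. Loop over all the sparse set bits in the input and, for each one, decrement the count of every expression that contains that bit
    3. Return all expressions whose count has reached 0.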
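A minimal C sketch of this scheme is shown below. The per-variable lists are stored in a compressed (offset plus flat array) layout, which is an assumption of the example rather than part of the answer, as are the names `Index` and `query`. It also assumes variables are numbered globally, each set variable is listed once per input, and no expression repeats a variable.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Preprocessed, query-independent data (hypothetical layout).
   The expressions containing variable v are
   users[start[v]] .. users[start[v + 1] - 1]. */
typedef struct {
    const uint32_t *start;    /* n_vars + 1 offsets into users */
    const uint32_t *users;    /* expression ids, grouped by variable */
    const uint32_t *n_terms;  /* number of variables in each expression */
    size_t n_exprs;
} Index;

/* Evaluate one input given as a list of its set variables.
   remaining is scratch space of n_exprs counters; out_true receives
   the ids of expressions whose count reached 0. Returns how many. */
static size_t query(const Index *ix, const uint32_t *set_vars, size_t n_set,
                    uint32_t *remaining, uint32_t *out_true)
{
    size_t n_true = 0;

    /* Initialize each count to the expression's number of variables. */
    memcpy(remaining, ix->n_terms, ix->n_exprs * sizeof *remaining);

    /* For each set bit, decrement every expression that contains it. */
    for (size_t i = 0; i < n_set; i++) {
        uint32_t v = set_vars[i];
        for (uint32_t k = ix->start[v]; k < ix->start[v + 1]; k++) {
            uint32_t e = ix->users[k];
            if (--remaining[e] == 0)
                out_true[n_true++] = e;   /* all terms of e are set */
        }
    }
    return n_true;
}
```

Unlike a single variable-to-expression map, the per-variable lists handle a variable that is shared by several expressions. An expression with zero terms is never reported by this sketch.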