鉴于存在3元组,其中:
执行通配符(*)搜索的最有效算法/数据结构是什么,例如:
(1, *, 6) (3601, *, *) (*, 1935, *)
目标是在应用程序级别上拥有像元组空间一样的Linda
答案 0 :(得分:2)
嗯,只有8种可能的通配符排列方式,因此您可以轻松构建6个多图和一组作为索引:一个用于查询中每个通配符的排列。您不需要第8个索引,因为查询(*,*,*)
通常会返回所有元组。该集用于没有通配符的元组;在这种情况下,只需要进行成员资格测试。
multimap将密钥带到集合中。在您的示例中,例如,查询(1,*,6)
会查询多重映射以查找(X,*,Y)
形式的查询,该查询将<X,Y>
键与X
中所有元组的集合相关联Y
第一个位置,第三个位置X
。在这种情况下,Y
= 1且<a,b,c>
= 6。
使用任何合理的基于散列的多图实现,查找应该非常快。几百秒应该是容易的,并且每秒几千个可行(例如当代x86 CPU)。
插入和删除需要更新地图和设置。同样,这应该相当快,但当然不如查找快。每秒几百次应该是可行的。
只有大约10 ^ 5个元组,这种方法对于记忆也应该没问题。您可以通过技巧节省一些空间,例如:将每个元组的单个副本保存在数组中并将索引存储在map / set中以表示键和值。使用空闲列表管理数组插槽。
为了使这个具体,这里是伪代码。我将使用尖括号# Definitions
For a query Q <k2,k1,k0> where each of k_i is either * or an integer,
Let I(Q) be a 3-digit binary number b2|b1|b0 where
b_i=0 if k_i is * and 1 if k_i is an integer.
Let N(i) be the number of 1's in the binary representation of i
Let M(i) be a multimap taking a tuple with N(i) elements to a set
of tuples with 3 elements.
Let t be a 3 element tuple. Then T(t,i) returns a new tuple with
only the elements of t in positions where i has a 1. For example
T(<1,2,3>,0) = <> and T(<1,2,3>,6) = <2,3>
Note that function T works fine on query tuples with wildcards.
# Algorithm to insert tuple T into the database:
fun insert(t)
for i = 0 to 7
add the entry T(t,i)->t to M(i)
# Algorithm to delete tuple T from the database:
fun delete(t)
for i = 0 to 7
delete the entry T(t,i)->t from M(i)
# Query algorithm
fun query(Q)
let i = I(Q)
return M(i).lookup(T(Q, i)) # lookup failure returns empty set
来表示元组,以避免太多的问题:
M(0)
请注意,为简单起见,我没有显示&#34;优化&#34;适用于M(7)
和M(0)
。对于i=0
,上面的算法将创建一个多图,将空元组取为数据库中所有3元组的集合。您可以通过将M(7)
视为特例来避免这种情况。类似地,fun insert(t)
for i = 1 to 6
add the entry T(t,i)->t to M(i)
add t to set S
fun delete(t)
for i = 1 to 6
delete the entry T(t,i)->t from M(i)
remove t from set S
fun query(Q)
let i = I(Q)
if i = 0, return S
elsif i = 7 return if Q\in S { Q } else {}
else return M(i).lookup(T(Q, i))
会将每个元组带到仅包含自身的集合中。
&#34;优化&#34;版本:
package hacking;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Random;
import java.util.Scanner;
import java.util.Set;
public class Hacking {
public static void main(String [] args) {
TupleDatabase db = new TupleDatabase();
int n = 200000;
long start = System.nanoTime();
for (int i = 0; i < n; ++i) {
db.insert(db.randomTriple());
}
long stop = System.nanoTime();
double elapsedSec = (stop - start) * 1e-9;
System.out.println("Inserted " + n + " tuples in " + elapsedSec
+ " seconds (" + (elapsedSec / n * 1000.0) + "ms per insert).");
Scanner in = new Scanner(System.in);
for (;;) {
System.out.print("Query: ");
int a = in.nextInt();
int b = in.nextInt();
int c = in.nextInt();
System.out.println(db.query(new Tuple(a, b, c)));
}
}
}
class Tuple {
static final int [] N_ONES = new int[] { 0, 1, 1, 2, 1, 2, 2, 3 };
static final int STAR = -1;
final int [] vals;
Tuple(int a, int b, int c) {
vals = new int[] { a, b, c };
}
Tuple(Tuple t, int code) {
vals = new int[N_ONES[code]];
int m = 0;
for (int k = 0; k < 3; ++k) {
if (((1 << k) & code) > 0) {
vals[m++] = t.vals[k];
}
}
}
@Override
public boolean equals(Object other) {
if (other instanceof Tuple) {
Tuple triple = (Tuple) other;
return Arrays.equals(this.vals, triple.vals);
}
return false;
}
@Override
public int hashCode() {
return Arrays.hashCode(this.vals);
}
@Override
public String toString() {
return Arrays.toString(vals);
}
int code() {
int c = 0;
for (int k = 0; k < 3; k++) {
if (vals[k] != STAR) {
c |= (1 << k);
}
}
return c;
}
Set<Tuple> setOf() {
Set<Tuple> s = new HashSet<>();
s.add(this);
return s;
}
}
class Multimap extends HashMap<Tuple, Set<Tuple>> {
@Override
public Set<Tuple> get(Object key) {
Set<Tuple> r = super.get(key);
return r == null ? Collections.<Tuple>emptySet() : r;
}
void put(Tuple key, Tuple value) {
if (containsKey(key)) {
super.get(key).add(value);
} else {
super.put(key, value.setOf());
}
}
void remove(Tuple key, Tuple value) {
Set<Tuple> set = super.get(key);
set.remove(value);
if (set.isEmpty()) {
super.remove(key);
}
}
}
class TupleDatabase {
final Set<Tuple> set;
final Multimap [] maps;
TupleDatabase() {
set = new HashSet<>();
maps = new Multimap[7];
for (int i = 1; i < 7; i++) {
maps[i] = new Multimap();
}
}
void insert(Tuple t) {
set.add(t);
for (int i = 1; i < 7; i++) {
maps[i].put(new Tuple(t, i), t);
}
}
void delete(Tuple t) {
set.remove(t);
for (int i = 1; i < 7; i++) {
maps[i].remove(new Tuple(t, i), t);
}
}
Set<Tuple> query(Tuple q) {
int c = q.code();
switch (c) {
case 0: return set;
case 7: return set.contains(q) ? q.setOf() : Collections.<Tuple>emptySet();
default: return maps[c].get(new Tuple(q, c));
}
}
Random gen = new Random();
int randPositive() {
return gen.nextInt(1000);
}
Tuple randomTriple() {
return new Tuple(randPositive(), randPositive(), randPositive());
}
}
<强>加成强>
为了好玩,Java实现:
Inserted 200000 tuples in 2.981607358 seconds (0.014908036790000002ms per insert).
Query: -1 -1 -1
[[504, 296, 987], [500, 446, 184], [499, 482, 16], [488, 823, 40], ...
Query: 500 446 -1
[[500, 446, 184], [500, 446, 762]]
Query: -1 -1 500
[[297, 56, 500], [848, 185, 500], [556, 351, 500], [779, 986, 500], [935, 279, 500], ...
一些输出:
{{1}}
答案 1 :(得分:0)
如果您将元组视为IP地址,那么基数树(trie)类型结构可能会起作用。基数树用于IP发现。
另一种方法可能是计算使用位操作并为元组计算位哈希,并在搜索中执行位(或,和)以便快速发现。