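How do I properly search numbers with Sphinx?

Time: 2018-07-03 08:38:40

Tags: sphinx

I need to search a billion records in MySQL, which is a very slow process (it does work right now). Can Sphinx help me? How do I properly configure Sphinx for searching numbers? Should I use an integer attribute for the search (rather than a string field)?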

I only need to get the row whose timestamp is nearest to, or equal to, the one I search for:

CREATE TABLE test ( date TIMESTAMP(6) UNIQUE, num INT(32) );
| 2018-07-02 05:50:33.084011 |  282 |
| 2018-07-02 05:50:33.084028 |  475 |
...

(There are 40 million rows like this. All timestamps are unique, so this column is a unique index, and I assume I don't need to create another index.)

sphinx.conf:

source src1
{
type = mysql
...        
sql_query = SELECT * FROM test
}

Indexer...

Sphinx 3.0.3
...
indexing index 'test'...
collected 40000000 docs, 0.0 MB

In my test, I query for the nearest timestamp:

// Query MySQL directly
$start = microtime(true);
$search = '2018-07-02 05:50:33.084011';
$connMySQL = new PDO('mysql:host=localhost;dbname=test','','');
$sql = "SELECT * FROM test WHERE date <= '$search' ORDER BY date DESC LIMIT 1";
$que  = $connMySQL->query($sql);
$result = $que->fetchAll(PDO::FETCH_ASSOC);
$query  = $connMySQL->query('reset query cache');
$connMySQL = null;
print_r ($result);
echo 'Time MySQL:'.(microtime(true) - $start).' sec.';

// Run the same query through SphinxQL on port 9306
$start = microtime(true);
$search = '2018-07-02 05:50:33.084029';
$connSphinxQL = new PDO('mysql:host=localhost;port=9306;dbname=test','root','');
$sql = "SELECT * FROM test WHERE date <= '$search' ORDER BY date DESC LIMIT 1";
$que  = $connSphinxQL->query($sql);
$result = $que->fetchAll(PDO::FETCH_ASSOC);
$query  = $connSphinxQL->query('reset query cache');
$connSphinxQL = null;
print_r ($result);
echo 'Time Sphinx:'.(microtime(true) - $start).' sec.';

I expected the two results to differ, but I got the same result even before building the index, so I think a configuration error is making the Sphinx query go straight to MySQL. The only related question I found here is: no text search

1 answer:

Answer 0: (score: 0)

  

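Should I use an integer attribute for the search (rather than a string field)?

Yes. A further complication is that an index needs at least one field (Sphinx is not really designed as a general-purpose database; it is intended for text queries!)

You can synthesize a fake one.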

sql_query = SELECT unix_timestamp(`date`) AS id, 'a' AS field, num FROM test
sql_attr_uint = num
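
To show where those two lines sit, here is a minimal sphinx.conf sketch. The connection settings stay elided as in the question, and the `path` value is a hypothetical placeholder, not something taken from the question.

```
source src1
{
    type          = mysql
    # connection settings (sql_host, sql_user, sql_pass, sql_db) go here
    # first column = document id, 'a' = fake full-text field, num = attribute
    sql_query     = SELECT UNIX_TIMESTAMP(`date`) AS id, 'a' AS field, num FROM test
    sql_attr_uint = num
}

index test
{
    source = src1
    # hypothetical location for the index files
    path   = /var/lib/sphinx/test
}
```

After changing the source definition, the index has to be rebuilt with the indexer before queries see the new `id` and `num` columns.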

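It also looks like you need a unique integer as the first column, to serve as the document_id. Your timestamp appears to be unique, so you can use that. UNIX_TIMESTAMP is a good way to represent a timestamp as a plain integer.

You can then filter on id in the query as well, so the search value has to be converted to a Unix timestamp in the same way.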

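$query = '2018-07-02 05:50:33.084011';
// strtotime() returns whole seconds since the Unix epoch
$id = strtotime($query);
// id is an integer, so compare it without quotes
$sql = "SELECT * FROM test WHERE id <= $id ORDER BY id DESC LIMIT 1";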
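
One possible caveat: the sample rows in the question differ only in their microseconds, while `strtotime()` returns whole seconds, and `UNIX_TIMESTAMP()` on a `TIMESTAMP(6)` column returns a fractional value. Rows within the same second could therefore end up with the same id. A minimal sketch of one way around this is to use microseconds since the epoch as the id. The helper name `timestampToMicros` is hypothetical, the code assumes 64-bit PHP, and it assumes the timestamps are in UTC. The indexing query would have to produce the same value, for example with `CAST(UNIX_TIMESTAMP(`date`) * 1000000 AS UNSIGNED) AS id`, which is an untested assumption here and also depends on the MySQL session time zone being UTC.

```php
<?php
// Hypothetical helper: convert 'Y-m-d H:i:s.u' (assumed UTC) into
// integer microseconds since the Unix epoch. Requires 64-bit PHP.
function timestampToMicros(string $timestamp): int
{
    $dt = DateTimeImmutable::createFromFormat(
        'Y-m-d H:i:s.u',
        $timestamp,
        new DateTimeZone('UTC')
    );
    if ($dt === false) {
        throw new InvalidArgumentException("Unparseable timestamp: $timestamp");
    }
    // 'U' = whole seconds since the epoch, 'u' = microseconds part
    return (int) $dt->format('U') * 1000000 + (int) $dt->format('u');
}

$id  = timestampToMicros('2018-07-02 05:50:33.084029');
$sql = "SELECT * FROM test WHERE id <= $id ORDER BY id DESC LIMIT 1";
echo $sql, PHP_EOL;
```

With ids built this way, the two sample rows from the question get distinct ids, and the `id <= $id ORDER BY id DESC LIMIT 1` filter still picks the nearest earlier-or-equal row.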