I have the following problem.
Data structure in SOLR:
<field name="id" type="string" required="true"/>
<field name="session_id" type="string" required="true"/>
<field name="action_type" required="true"/>
<field name="error_msg" required="false"/>
(All fields have: indexed="true" stored="true" multiValued="false".) Only the 'error_msg' field is not required (it can be empty).
There is an equivalent table in Oracle:
TABLE SOLR_TEST
(
ID NUMBER NOT NULL ,
SESSION_ID VARCHAR2(20 BYTE) NOT NULL ,
ACTION_TYPE VARCHAR2(20 BYTE) NOT NULL ,
ERROR_MSG VARCHAR2(20 BYTE)
);
Sample data (the same in SOLR and Oracle):
ID SESSION_ID ACTION_TYPE ERROR_MSG
-- -------------------- -------------------- --------------------
1 00001 SELECTED_ACTION
2 00001 SELECTED_ACTION
3 00001 OTHER
4 00002 A2 ERROR_001
5 00002 OTHER
6 00003 SELECTED_ACTION ERROR_002
7 00004 A1 ERROR_001
8 00005 A2
9 00005 SELECTED_ACTION
10 00005 SELECTED_ACTION ERROR_003
11 00006 SELECTED_ACTION
12 00006 OTHER ERROR_004
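Question:
How do I write a SOLR query that returns:
all session_id values
in which the specified action_type occurred,
but in which the specified action_type never occurred
with a non-empty error_msg?
In other words, the equivalent of this Oracle query: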
select distinct session_id
from SOLR_TEST
where action_type='SELECTED_ACTION'
and not session_id in
( select session_id
from SOLR_TEST
where action_type='SELECTED_ACTION'
and error_msg is not null
);
The result of this query is:
SESSION_ID
--------------------
00001
00006
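To make the intended semantics concrete, here is a minimal Python sketch of the same logic over the sample rows. It is only an illustration of the expected result, not a SOLR query; the function name sessions_without_failed_action is made up for this example.

```python
# Sample rows copied from the table above: (id, session_id, action_type, error_msg).
# An empty error_msg is represented as None.
ROWS = [
    (1, "00001", "SELECTED_ACTION", None),
    (2, "00001", "SELECTED_ACTION", None),
    (3, "00001", "OTHER", None),
    (4, "00002", "A2", "ERROR_001"),
    (5, "00002", "OTHER", None),
    (6, "00003", "SELECTED_ACTION", "ERROR_002"),
    (7, "00004", "A1", "ERROR_001"),
    (8, "00005", "A2", None),
    (9, "00005", "SELECTED_ACTION", None),
    (10, "00005", "SELECTED_ACTION", "ERROR_003"),
    (11, "00006", "SELECTED_ACTION", None),
    (12, "00006", "OTHER", "ERROR_004"),
]


def sessions_without_failed_action(rows, action="SELECTED_ACTION"):
    """Sessions where `action` occurred, but never with a non-empty error_msg."""
    # Outer query: every session that contains the action at least once.
    with_action = {sid for _, sid, act, _ in rows if act == action}
    # Subquery: sessions where the action occurred together with an error.
    with_error = {sid for _, sid, act, err in rows if act == action and err}
    # "NOT IN" corresponds to a set difference.
    return sorted(with_action - with_error)


print(sessions_without_failed_action(ROWS))  # ['00001', '00006']
```

The key point is that the condition is evaluated per session, across several rows, rather than per row.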
For example, a SOLR query like this does not work:
http://solrhost/solr/collection/select?rows=1&q=-(error_msg:[*+TO+*]+AND+action_type:SELECTED_ACTION)&wt=xml&indent=true&facet=true&facet.field=session_id&facet.zeros=false&fq=action_type:SELECTED_ACTION
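For readability, the same request can be assembled from plain parameters and URL-encoded in code. This is only a sketch using Python's standard library; the helper name build_select_url is made up, and the host and collection are the placeholders from the URL above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_select_url(base, params):
    """Append URL-encoded query parameters to a Solr select endpoint."""
    return base + "?" + urlencode(params)


# The same parameters as in the URL above, written without manual escaping.
PARAMS = {
    "rows": "1",
    "q": "-(error_msg:[* TO *] AND action_type:SELECTED_ACTION)",
    "fq": "action_type:SELECTED_ACTION",
    "wt": "xml",
    "indent": "true",
    "facet": "true",
    "facet.field": "session_id",
    "facet.zeros": "false",
}

url = build_select_url("http://solrhost/solr/collection/select", PARAMS)
print(url)

# Decoding the query string again gives back the original parameter values.
decoded = parse_qs(urlsplit(url).query)
print(decoded["q"][0])
```

Keeping q and fq as separate readable strings makes it easier to see which part is the negated clause and which part is the filter.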
//EDIT/////////////////////////////////////
The real schema looks like this:
<schema name="elogging" version="1.5">
<fields>
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="action_type" type="string" indexed="true" stored="true" required="false" multiValued="false"/>
<field name="session_id" type="string" indexed="true" stored="true" required="false" multiValued="false"/>
<field name="error_msg" type="string" indexed="true" stored="true" required="false" multiValued="false"/>
<field name="_version_" type="long" indexed="true" stored="true"/>
</fields>
<uniqueKey>id</uniqueKey>
<types>
<fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="uuid" class="solr.UUIDField" indexed="true"/>
</types>
<updateRequestProcessorChain name="uniq-fields">
<processor class="org.apache.solr.update.processor.UniqFieldsUpdateProcessorFactory">
<lst name="fields">
<str>id</str>
</lst>
</processor>
<processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
</schema>
//EDIT 2 //////////////////////
The SOLR query does not work correctly - it returns something like the result of:
select distinct session_id
from SOLR_TEST
where action_type='SELECTED_ACTION'
and error_msg is null;
SESSION_ID
--------------------
00001
00005
00006
The value '00005' is wrong, because there is this row:
10 00005 SELECTED_ACTION ERROR_003
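My understanding of why '00005' shows up: the query and the filter are evaluated against each document separately, so document 9 matches even though document 10 in the same session has an error. A small Python sketch that emulates this per-document matching on the sample data (the function name per_document_match is made up for this illustration):

```python
from collections import Counter

# Sample documents: (id, session_id, action_type, error_msg); empty error_msg is None.
DOCS = [
    (1, "00001", "SELECTED_ACTION", None),
    (2, "00001", "SELECTED_ACTION", None),
    (3, "00001", "OTHER", None),
    (4, "00002", "A2", "ERROR_001"),
    (5, "00002", "OTHER", None),
    (6, "00003", "SELECTED_ACTION", "ERROR_002"),
    (7, "00004", "A1", "ERROR_001"),
    (8, "00005", "A2", None),
    (9, "00005", "SELECTED_ACTION", None),
    (10, "00005", "SELECTED_ACTION", "ERROR_003"),
    (11, "00006", "SELECTED_ACTION", None),
    (12, "00006", "OTHER", "ERROR_004"),
]


def per_document_match(docs, action="SELECTED_ACTION"):
    """Emulate fq=action_type:X plus q=-(error_msg:[* TO *] AND action_type:X).

    Each document is tested on its own; nothing here looks at the other
    documents that share the same session_id.
    """
    return [d for d in docs if d[2] == action and not d[3]]


hits = per_document_match(DOCS)
counts = Counter(sid for _, sid, _, _ in hits)

print(len(hits))     # 4 matching documents
print(dict(counts))  # {'00001': 2, '00005': 1, '00006': 1}
```

Faceting on session_id then simply counts the matching documents per session, which is why a session with one clean row and one failed row is still listed.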
//EDIT 3 ////////////
This SOLR query does not work either (same problem as before):
http://solrhost/solr/collection/select?rows=1&q=action_type:SELECTED_ACTION+AND+-{!join+from=session_id+to=session_id}error_msg:*+AND+action_type:SELECTED_ACTION&wt=xml&indent=true&facet=true&facet.field=session_id&facet.zeros=false
//EDIT 4 ///////
*Schema fixed - 'error_msg' is now indexed*
//EDIT 5 /////
Here is the sample data for SOLR:
id,session_id,action_type,error_msg
1,00001,SELECTED_ACTION,
2,00001,SELECTED_ACTION,
3,00001,OTHER,
4,00002,A2,ERROR_001
5,00002,OTHER,
6,00003,SELECTED_ACTION,ERROR_002
7,00004,A1,ERROR_001
8,00005,A2,
9,00005,SELECTED_ACTION,
10,00005,SELECTED_ACTION,ERROR_003
11,00006,SELECTED_ACTION,
12,00006,OTHER,ERROR_004
and here is the SOLR result for this data and the query
http://localhost:8983/solr/collection3/select?rows=1&q=-(error_msg:[*+TO+*]+AND+action_type:SELECTED_ACTION)&wt=xml&indent=true&facet=true&facet.field=session_id&facet.zeros=false&fq=action_type:SELECTED_ACTION
:
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">30</int>
<lst name="params">
<str name="facet.zeros">false</str>
<str name="facet">true</str>
<str name="indent">true</str>
<str name="q">
-(error_msg:[* TO *] AND action_type:SELECTED_ACTION)
</str>
<str name="facet.field">session_id</str>
<str name="wt">xml</str>
<str name="fq">action_type:SELECTED_ACTION</str>
<str name="rows">1</str>
</lst>
</lst>
<result name="response" numFound="4" start="0">
<doc>
<str name="id">1</str>
<str name="session_id">00001</str>
<str name="action_type">SELECTED_ACTION</str>
<long name="_version_">1449881246216749056</long>
</doc>
</result>
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="session_id">
<int name="00001">2</int>
<int name="00005">1</int>
<int name="00006">1</int>
</lst>
</lst>
<lst name="facet_dates"/>
<lst name="facet_ranges"/>
</lst>
</response>
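Answer 0 (score: 0)
This is a bit tricky because, as far as I know (and I would be happy if someone could prove me wrong), it is not possible to reuse part of one query's result inside another query (for example in a filter query or a nested query).
So, this is as far as I could get:
Query: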
http://localhost:8983/solr/stack19588325/select?q=action_type%3A%22SELECTED_ACTION%22&fq=%7B!tag%3Ddt%7Daction_type%3ASELECTED_ACTION+AND+error_msg%3A%5B*+TO+*%5D+AND+_query_%3A%7B!join+from%3Dsession_id+to%3Dsession_id+v%3D%24qq%7D&rows=0&wt=xml&indent=true&facet=true&facet.mincount=1&facet.field={!ex=dt%20key=nonfilter_session_id}session_id&facet.field=session_id&qq=-error_msg:[*%20TO%20*]
Result:
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
<lst name="params">
<str name="qq">-error_msg:[* TO *]</str>
<str name="q">action_type:"SELECTED_ACTION"</str>
<arr name="facet.field">
<str>{!ex=dt key=nonfilter_session_id}session_id</str>
<str>session_id</str>
</arr>
<str name="indent">true</str>
<str name="fq">{!tag=dt}action_type:SELECTED_ACTION AND error_msg:[* TO *] AND _query_:{!join from=session_id to=session_id v=$qq}</str>
<str name="facet.mincount">1</str>
<str name="rows">0</str>
<str name="wt">xml</str>
<str name="facet">true</str>
<str name="_">1382878844535</str>
</lst>
</lst>
<result name="response" numFound="1" start="0">
</result>
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="nonfilter_session_id">
<int name="00001">2</int>
<int name="00005">2</int>
<int name="00003">1</int>
<int name="00006">1</int>
</lst>
<lst name="session_id">
<int name="00005">1</int>
</lst>
</lst>
<lst name="facet_dates"/>
<lst name="facet_ranges"/>
</lst>
</response>
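So, as you can see, we have two different facet results: nonfilter_session_id and session_id.
So, if there is no better option, you can build the intersection of these two sets, and only the desired session_id values will remain.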
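For illustration, here is one way such a client-side combination could look in Python. Note that this sketch is a variation and not the exact query above: it assumes two simpler facet requests and takes a set difference rather than an intersection. The function name and the second facet's counts are assumptions derived from the sample data, not copied from a real SOLR response.

```python
def sessions_without_error(facet_all, facet_err):
    """Session ids present in facet_all but absent from facet_err."""
    return sorted(set(facet_all) - set(facet_err))


# Facet on session_id for q=action_type:SELECTED_ACTION.
# These counts match the nonfilter_session_id facet in the response above.
FACET_ALL = {"00001": 2, "00005": 2, "00003": 1, "00006": 1}

# Facet on session_id for
# q=action_type:SELECTED_ACTION AND error_msg:[* TO *] (with facet.mincount=1).
# Assumed counts, worked out by hand from the sample data.
FACET_ERR = {"00003": 1, "00005": 1}

print(sessions_without_error(FACET_ALL, FACET_ERR))  # ['00001', '00006']
```

The obvious drawback is that both facet lists have to be fetched completely, so this is probably only practical while the number of distinct session_id values stays small.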