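Solr: How to limit the search content in a Solr query

Time: 2016-06-01 10:40:28

Tags: solr

I want to use a Solr query to search for words only up to a particular line, and not beyond it. I tried proximity matching, but it did not work. My data looks like this: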

  

“Date: Thu, 24 Jul 2014 09:36:44 GMT\nCache-Control: private\nContent-Type: application/json; charset=utf-8\nContent-Encoding: gzip\nVary: Accept-Encoding\nP3P: CP=%20CURo TAIo IVAo IVDo ONL UNI COM NAV INT DEM STA OUR%20\nX-Powered-By: ASP.NET\nContent-Length: 570\nKeep-Alive: timeout=120\nConnection: Keep-Alive\n\n[{%20rows%20:[],%20index%20:[],%20folders%20:[[%20Inbox%20,%20Inbox%20,%20%20,1,1,0,0,0,%20Inbox%20,0,0,%20none%20,0],[%20Drafts%20,%20Drafts%20,%20%20,1,1,0,0,0,%20Drafts%20,0,0,%20none%20,0],[%20Sent%20,%20Sent%20,%20%20,1,1,0,0,11,%20Sent%20,1,0,%20none%20,0],[%20Spam%20,%20Spam%20,%20%20,1,1,0,0,0,%20Spam%20,1,0,%20none%20,0],[%20Deleted%20,%20Trash%20,%20%20,1,1,0,7,9,%20Deleted%20,1,0,%20none%20,0],[%20Saved%20,%20Saved Mail%20,%20%20,1,1,0,0,0,%20Saved%20,1,0,%20none%20,0],[%20SavedIMs%20,%20Saved Chats%20,%20Saved%20,2,1,0,0,0,%20SavedIMs%20,1,0,%20none%20,0],%20fcsupport%20:true,%20hasNewMsg%20:false,%20totalItems%20:0,%20isSuccess%20:true,%20FoldersCanMoveTo%20:[%20Sent%20,%20Spam%20,%20Deleted%20,%20Saved%20,%20SavedIMs%20],%20indexStart%20:0}] POST /38664-816/aol-6/en-us/common/rpc/RPC.aspx?user=hl1lkgReIh&transport=xmlhttp&r=0.019667088333411797&a=GetMessageList&l=31211 HTTP/1.1\nHost: mail.aol.com\nUser-Agent: Mozilla/5.0 (Windows NT 5.1; rv:31.0) Gecko/20100101 Firefox/31.0\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\nAccept-Language: en-US,en;q=0.5\nAccept-Encoding: gzip, deflate\nContent-Type: application/x-www-form-urlencoded; charset=UTF-8\nX-Requested-With: XMLHttpRequest\nReferer: http://mail.aol.com/38664-816/aol-6/en-us/Suite.aspx\nContent-Length: 452\nCookie: mbox=PC#1405514778803-136292.22_06#1407395182|session#1406185366924-436868#1406187442|check#true#1406185642; s_pers=%20s_fid%3D55C638B5F089E6FB-19ACDEED1644FD86%7C1469344726539%3B%20s_getnr%3D1406186326569-Repeat%7C1469258326569%3B%20s_nrgvo%3DRepeat%7C1469258326571%3B; 
s_vi = [CS] V1 | 29E33A0D051D366F-60000105200097FF [CE]; UNAUTHID = 1.5efb4a11934a40b8b5272557263dadfe.88c5; RSP_COOKIE =类型= 30&安培;名称= YWxzaGFraWIyMDE0&安培; SN = MzRb%2FjjHIe8odpr%2FfxZR2g%3D%3D&安培; S类型= 0&安培; AGRP = M; LTState =版本:5安培; LAV:22安培;取消:* UQo5AwAnAytffwJSYg%3D%3D&放大器; SN:* UQo5AwAnAytffwJSYg%3D%3D&放大器; UV:AOL和放大器; LC:EN-US&放大器; UD:aol.com和放大器; EA:* UQo5AwAnAytffwJSCAsnWWoJASZL&安培; PRMC :825345&安培; MT:6和; AMS:1和; CMAI:365安培; SNT:0&安培; vnop:假&安培; MH:core-mia002b.r1000.mail.aol.com&安培峰; br:100安培; WM:mail.aol.com&安培; CKD :.mail.aol.com&安培; CKP:%2F&安培;公顷:1NGRuUTRRxGFF2s5A4JwkuCT43Q%3D&安培;; aolweatherlocation = 10003;数据层=缺点%3D6.107%26coms%3D629; grvinsights = 69f3a2bb86ed3cd31aa1d14a1ce9e845; CUNAUTHID = 1.5efb4a11934a40b8b5272557263dadfe.88c5; s_sess =%20s_cc%3Dtrue%3B%20s_sq%3Daolcmp%253D%252526pid%25253Dcmp%2525253A%25252520Help%25252520%2525257C%25252520View%25252520Article%2525253A%25252520Clear%25252520cookies%2525252C%25252520cache%2525252C%25252520history%25252520and%25252520footprints%252526pidt %25253D1%252526oid%25253Dhttp%2525253A%2525252F%2525252Fwebmail.aol.com%2525252F%2525253F_AOLLOCAL%2525253Dmail%252526ot%25253DA%2526aolsnssignin%253D%252526pid%25253Dsso%25252520%2525253A%25252520login%252526pidt%25253D1%252526oid%25253DSign%25252520In %252526oidt%25253D3%252526ot%25253DSUBMIT%3B; L7Id = 31211;上下文=版本:3及SID:923f783b-bc6e-4edf-87c9-e52f19b3ce67&安培; RT:STANDARD&安培; I:F&安培; CKD:.mail.aol.com&安培; CKP:%2F&安培;公顷:X80Ku4ffRKsOVSwgmEVPCfpfxeU%3D&安培;; IDP_A = S-1- V0c3QiuO6BzQ5S6_u3s0brfUqMCktezAz7sWlVfHD90omIijDXRrMJkSM -9- xcnUcSTnXbcZ1aUCgvfuToVeJihcftKY5KtsC_nB7Y9qf6P0xUnNfCIAmWVtRf4ctSQ9JwRIzHa40dhFuULwYLu3NUPTxckeFUFAzcSS4hrmb4grhEtyOGp0qV5rIKtjs4u8; MC_CMP_ESK =无义; SNS_AA = ASRC = 2及SST = 1406185424&安培;类型= 0; _utd = GD#MzRb%2FjjHIe8odpr%2FfxZR2g%3D%3D | PR#一个| ST#sns.webmail.aol.com | UID#; AUTH =版本:22安培; UAS:* UQo5AwAnAytffwJSZAskRiwLBSIDWVpVXxVTVwJCLFxdSnpHUWBbeV1jcikERgl6CEYLJUweGUhdFQQLW1h%2bBAZRcllWfVl8VH4DUmRaZARoPhw%2bBFBA&安培; IDL:0&安培; 
UN:* UQo5AwAnAytffwJSYg%3D%3D&安培;在:SNS&安培; SN:* UQo5AwAnAytffwJSYg%3D%3D&安培; WIM:%252FwQCAAAAAAAEk2ihy%252BE4MMebm4R1jvxY07zNZhFOHSz2EFBnsNdOAUsl8QyZceo54kWYZ4vwVayLFF7w&安培;麦粒肿: 0安培; UD:aol.com&安培; UID:hl1lkgReIh&安培; SS:635417678271359104&安培; SVS:SNS_AA%7c1406185424&安培; LA:635417687268954835&安培; AAT:A和行为:M&安培峰; br:100安培; CBR:AOL&安培; MT:&安培;工资:0&安培; mbt:G& uv:AOL& lc:en-us& bid:1& acd:1403348988& pix:3829& prmc:825345& relm:aol& mah:%2 \ nConnection:keep-alive \ n“

and I want to search the data for Content-Type: application/json, without searching beyond that line. I tried

  

http://192.168.0.164:8983/solr/collection_with_all_details/select?q=Content%3A Content-Type JSON *&wt=json&indent=true

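but it searches the entire content. I need to limit the content that is searched.

2 answers:

Answer 0: (score: 0)

I don't think this is possible in this case. You could look at the highlighter to highlight the first 200 characters in the response.

You may need to consider writing a custom response writer to help with this.

Another option, which would be more efficient, is to create an additional field with indexed="false" stored="true".

Make the original field indexed="true" stored="false", which will reduce your index size, and make the new copy field indexed="false" stored="true". Note that a field with indexed="false" cannot be queried, so this limits what is returned rather than what is searched; the copy would need indexed="true" if you want to search only the first 200 characters.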

<copyField source="text" dest="textShort" maxChars="200"/>

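Check whether this works for you.

Answer 1: (score: 0)

You should really, really preprocess your data and index only the parts you are going to use. Doing this after the fact is not a good solution, because you would already have most of the content in the index, and you are looking for a delimiter that does not sit at one fixed byte position (a fixed position is the only thing maxChars can handle).

Depending on how you index, you can do this either in the indexing step (a RegexTransformer, your own code using SolrJ, etc.) or in the analysis step, using a PatternReplaceFilter. That way you can remove everything after the header you are looking for.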
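As a rough illustration of the "remove everything after the header" idea, here is a minimal Python sketch of the kind of regular expression involved. It assumes the raw text uses real newline characters (if your data contains a literal backslash followed by `n`, convert those first), and the names `TRIM_AFTER_HEADER` and `trim_after_content_type` are made up for this example. A similar pattern could probably be adapted for a Solr pattern-replace filter, but Java regex syntax and whether the filter sees whole-field text or individual tokens would need to be checked against your own analysis chain.

```python
import re

# Capture everything up to the end of the first "Content-Type:" line.
# (?s) lets "." match newlines; the lazy ".*?" stops at the first occurrence,
# and the trailing ".*" swallows the rest of the text so it can be dropped.
TRIM_AFTER_HEADER = re.compile(r"(?s)\A(.*?Content-Type:[^\n]*).*\Z")


def trim_after_content_type(raw: str) -> str:
    """Return `raw` cut off after the first Content-Type line.

    The text is returned unchanged when no Content-Type line is present.
    """
    return TRIM_AFTER_HEADER.sub(r"\1", raw, count=1)


# Shortened stand-in for the captured traffic shown in the question.
sample = (
    "Date: Thu, 24 Jul 2014 09:36:44 GMT\n"
    "Content-Type: application/json; charset=utf-8\n"
    "Content-Encoding: gzip\n"
    "\n"
    "[{}]"
)
trimmed = trim_after_content_type(sample)
print(trimmed)
```

Because the match is lazy, only the first Content-Type line counts, which matters for data like the sample in the question, where a second Content-Type header appears later in the request part.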

That way you can index the content as a header field and a body field, depending on what you need.
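The header/body split could be done in your own indexing code before the document is sent to Solr. The sketch below is one possible way to do it in Python, not a definitive implementation: the field names `header` and `body`, the helper names `split_at_header` and `to_solr_doc`, and the choice to cut right after the Content-Type line are all assumptions for illustration. It also assumes the raw text contains real newlines.

```python
from typing import Dict, Tuple


def split_at_header(raw: str, header_name: str = "Content-Type") -> Tuple[str, str]:
    """Split `raw` right after the first line starting with `header_name:`.

    Returns (header_part, body_part). If the header line is not found,
    the header part is empty and everything is treated as body.
    """
    lines = raw.splitlines(keepends=True)
    prefix = header_name.lower() + ":"
    for i, line in enumerate(lines):
        if line.lower().startswith(prefix):
            return "".join(lines[: i + 1]), "".join(lines[i + 1:])
    return "", raw


def to_solr_doc(doc_id: str, raw: str) -> Dict[str, str]:
    """Build a document with separate (hypothetical) header and body fields."""
    header, body = split_at_header(raw)
    return {"id": doc_id, "header": header.strip(), "body": body.strip()}


# Shortened stand-in for the captured traffic shown in the question.
sample = (
    "Date: Thu, 24 Jul 2014 09:36:44 GMT\n"
    "Cache-Control: private\n"
    "Content-Type: application/json; charset=utf-8\n"
    "Content-Encoding: gzip\n"
    "\n"
    '[{"rows": []}]'
)
doc = to_solr_doc("1", sample)
print(doc["header"])
```

With documents shaped like this, a query restricted to the `header` field would never match text that comes after the Content-Type line, since that text only exists in `body`.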