Searching inside a text document on MarkLogic Server and returning only the fragments that match a search pattern

Date: 2019-01-09 11:17:04

Tags: xquery marklogic-8

I have uploaded a text file to MarkLogic Server and placed it in the collection "calling-returning". The text document looks like this:

    [INFO] [workflowContentListner-1] 2019-01-03 00:00:59,884 com.innodata.bsi.interceptors.MethodLoggingAspect logTimeMethod - Thread Id-25 : -7703835814759006134 - Returning from WorkflowContentDao.deleteCompletedOrFailedContentList(..) Execution time: 16 ms
    [INFO] [workflowContentListner-1] 2019-01-03 00:00:59,900 com.innodata.bsi.interceptors.MethodLoggingAspect logTimeMethod - Thread Id-25 : -2561765194895194936 - Calling WorkflowContentDao.getWaitingForContentListToProcess(..) with parameters FTP
    [INFO] [workflowContentListner-1] 2019-01-03 00:00:59,900 com.innodata.bsi.interceptors.MethodLoggingAspect logTimeMethod - Thread Id-25 : -2561765194895194936 - Returning from WorkflowContentDao.getWaitingForContentListToProcess(..) Execution time: 0 ms
    [INFO] [workflowContentListner-1] 2019-01-03 00:00:59,915 com.innodata.bsi.interceptors.MethodLoggingAspect logTimeMethod - Thread Id-25 : -2041334620910360341 - Calling WorkflowContentDao.getFTPWaitProcessType(..) with parameters ftp://10.103.100.43:21/VARIANTGENERATION/INPUT/30357186.pdf
    [INFO] [workflowContentListner-1] 2019-01-03 00:00:59,915 com.innodata.bsi.interceptors.MethodLoggingAspect logTimeMethod - Thread Id-25 : -2041334620910360341 - Returning from WorkflowContentDao.getFTPWaitProcessType(..) Execution time: 0 ms
    [INFO] [workflowContentListner-1] 2019-01-03 00:00:59,915 com.innodata.bsi.consumer.WorkflowContentConsumer processWorkflowContent - processWorkflowContent workflow content task: DPC-CENELEC-PUBLISH 01-7915592210 VARIANT_GENERATION
    [INFO] [workflowContentListner-1] 2019-01-03 00:00:59,915 com.innodata.bsi.schedule.task.ProcessWorkflowContent failWorkflowContentTask - Failing workflow content task using scheduler because its exceeded 30 min since created  DPC-CENELEC-PUBLISH 01-7915592210 VARIANT_GENERATION
    [INFO] [workflowContentListner-1] 2019-01-03 00:00:59,931 com.innodata.bsi.interceptors.MethodLoggingAspect logTimeMethod - Thread Id-25 : 8235148762900748472 - Calling WorkflowContentDao.setPickedBy(..) with parameters com.innodata.bsi.domain.WorkflowContentInfo@5f7839bd
    [INFO] [workflowContentListner-1] 2019-01-03 00:00:59,931 com.innodata.bsi.interceptors.MethodLoggingAspect logTimeMethod - Thread Id-25 : 8235148762900748472 - Returning from WorkflowContentDao.setPickedBy(..) Execution time: 0 ms

I am searching this document for "2561765194895194936 - Calling", where the number can be any number. So I wrote the following query:

    let $search := cts:search(collection("calling-returning"), cts:word-query(" - Calling"))
    return $search

But it returns the complete document. I only want results of the following form:

  2561765194895194936 - Calling
  256176519489514568 - Calling
  568651948951566 - Calling

1 Answer:

Answer 0 (score: 1)

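The unit of search and retrieval in MarkLogic is a document. If you want to search lines separately, they would need to be separate documents. Once you have the matching document, if you want to pull the matching lines out of it, you need to tokenize the document into lines and then run the match against each line, e.g. tokenize($doc,"\n")[cts:contains(text {.}, $query)]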
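As a minimal sketch of the tokenize-and-match approach, the query below finds the matching documents, splits each one into lines, keeps the lines that match, and then trims each line down to the fragment the question asks for. The trimming step is not part of the answer above: it assumes the wanted fragment always has the shape "digits, then - Calling", and it uses the standard fn:matches and fn:replace functions for it. The variable name $pattern is made up for illustration.

```xquery
xquery version "1.0-ml";

(: The query from the question; adjust as needed. :)
let $query := cts:word-query(" - Calling")
(: Assumed shape of the wanted fragment: digits, then " - Calling". :)
let $pattern := "\d+ - Calling"
for $doc in cts:search(fn:collection("calling-returning"), $query)
(: Split on LF or CRLF so a trailing carriage return never sticks to a line. :)
for $line in fn:tokenize(fn:string($doc), "\r?\n")
where cts:contains(text { $line }, $query) and fn:matches($line, $pattern)
(: Keep only the captured "digits - Calling" part of the line. :)
return fn:replace($line, fn:concat("^.*?(", $pattern, ").*$"), "$1")
```

Because the pattern starts at the first digit, the minus sign that precedes some of the numbers in the log is left out, which is how the desired output in the question is written. Every line of every matching document is still scanned, so the cost grows with document size.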

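That won't be very efficient. You might be better off pre-processing your text documents to add some markup (say, a root element and a line element for each line). Then at least you don't have to tokenize the whole thing at query time, although you still have to walk through the whole document and match against each line after the fact: $doc//line[cts:contains(., $query)]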
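One possible way to do that pre-processing, as a rough sketch: read each text document in the collection, wrap every non-empty line in an element, and store the result as a new XML document. The element names log and line, the ".xml" URI suffix, and the target collection "calling-returning-lines" are all hypothetical choices made for this example.

```xquery
xquery version "1.0-ml";

(: Hypothetical names: <log>, <line>, the ".xml" suffix, and the target collection. :)
for $doc in fn:collection("calling-returning")
let $uri := xdmp:node-uri($doc)
(: Only handle documents whose root node is a text node. :)
where fn:exists($doc/text())
return
  xdmp:document-insert(
    fn:concat($uri, ".xml"),
    <log>{
      (: One <line> element per non-empty line of the original text. :)
      for $line in fn:tokenize(fn:string($doc), "\r?\n")[. ne ""]
      return <line>{ $line }</line>
    }</log>,
    xdmp:default-permissions(),
    "calling-returning-lines"
  )
```

After this has run in its own transaction, the line-level match from the answer can be applied to the new documents, for example fn:collection("calling-returning-lines")//line[cts:contains(., $query)]. Storing each line as its own document instead would let the index do the line-level filtering, at the price of many more documents.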