Lucene.Net FastVectorHighlighter does not work with a fine-grained Chinese word segmenter

Asked: 2018-02-05 06:05:23

Tags: highlight lucene.net fast-vector-highlighter


The error is:

System.ArgumentOutOfRangeException: Index and length must refer to a location within the string.
Parameter name: length
   at System.String.Substring(Int32 startIndex, Int32 length)
   at Lucene.Net.Search.VectorHighlight.BaseFragmentsBuilder.MakeFragment(StringBuilder buffer, Int32[] index, Field[] values, WeightedFragInfo fragInfo, String[] preTags, String[] postTags, IEncoder encoder) in C:\BuildAgent\work\b1b63ca15b99dddb\src\Lucene.Net.Highlighter\VectorHighlight\BaseFragmentsBuilder.cs:line 195
   at Lucene.Net.Search.VectorHighlight.BaseFragmentsBuilder.CreateFragments(IndexReader reader, Int32 docId, String fieldName, FieldFragList fieldFragList, Int32 maxNumFragments, String[] preTags, String[] postTags, IEncoder encoder) in C:\BuildAgent\work\b1b63ca15b99dddb\src\Lucene.Net.Highlighter\VectorHighlight\BaseFragmentsBuilder.cs:line 146
   at Lucene.Net.Search.VectorHighlight.BaseFragmentsBuilder.CreateFragments(IndexReader reader, Int32 docId, String fieldName, FieldFragList fieldFragList, Int32 maxNumFragments) in C:\BuildAgent\work\b1b63ca15b99dddb\src\Lucene.Net.Highlighter\VectorHighlight\BaseFragmentsBuilder.cs:line 99

The exception is thrown from this source code:

    protected virtual string MakeFragment(StringBuilder buffer, int[] index, Field[] values, WeightedFragInfo fragInfo,
        string[] preTags, string[] postTags, IEncoder encoder)
    {
        StringBuilder fragment = new StringBuilder();
        int s = fragInfo.StartOffset;
        int[] modifiedStartOffset = { s };
        string src = GetFragmentSourceMSO(buffer, index, values, s, fragInfo.EndOffset, modifiedStartOffset);
        int srcIndex = 0;
        foreach (SubInfo subInfo in fragInfo.SubInfos)
        {
            foreach (Toffs to in subInfo.TermsOffsets)
            {
                fragment
                    .Append(encoder.EncodeText(src.Substring(srcIndex, (to.StartOffset - modifiedStartOffset[0]) - srcIndex)))
                    .Append(GetPreTag(preTags, subInfo.Seqnum))
                    .Append(encoder.EncodeText(src.Substring(to.StartOffset - modifiedStartOffset[0], (to.EndOffset - modifiedStartOffset[0]) - (to.StartOffset - modifiedStartOffset[0]))))
                    .Append(GetPostTag(postTags, subInfo.Seqnum));
                srcIndex = to.EndOffset - modifiedStartOffset[0];
            }
        }
        fragment.Append(encoder.EncodeText(src.Substring(srcIndex)));
        return fragment.ToString();
    }

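Highlighting fails in this code when the text is tokenized with a fine-grained segmenter. The method seems to require that the tokens be contiguous, and the tokens produced by fine-grained segmentation are not (they can overlap). How can FastVectorHighlighter be made to highlight text that was tokenized with fine-grained segmentation?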
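To make the offset problem concrete, here is a minimal sketch of fragment building that tolerates overlapping offsets. It is not Lucene.Net code: the class `OverlapTolerantFragment`, its `Build` method, and the `int[] { start, end }` offset representation are hypothetical names chosen for illustration. It assumes the failure comes from term offsets that overlap or run past the end of the fragment source, which has not been confirmed for any particular analyzer. The idea is to clamp each offset to the source string and merge overlapping ranges, so the `srcIndex` cursor only ever moves forward.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper, NOT part of Lucene.Net. It mirrors the cursor logic of
// MakeFragment, but normalizes the term offsets before slicing the source.
public static class OverlapTolerantFragment
{
    // offsets: { start, end } pairs relative to src, with end exclusive.
    public static string Build(string src, IEnumerable<int[]> offsets, string preTag, string postTag)
    {
        // Clamp every offset into [0, src.Length], drop empty ranges, sort by start.
        var clamped = offsets
            .Select(o => new[]
            {
                Math.Max(0, Math.Min(o[0], src.Length)),
                Math.Max(0, Math.Min(o[1], src.Length))
            })
            .Where(o => o[1] > o[0])
            .OrderBy(o => o[0])
            .ThenBy(o => o[1])
            .ToList();

        // Merge ranges that overlap the previous one; touching ranges stay separate.
        var merged = new List<int[]>();
        foreach (var o in clamped)
        {
            if (merged.Count > 0 && o[0] < merged[merged.Count - 1][1])
            {
                var last = merged[merged.Count - 1];
                last[1] = Math.Max(last[1], o[1]);
            }
            else
            {
                merged.Add(new[] { o[0], o[1] });
            }
        }

        // Same shape as MakeFragment: plain text, pre tag, term, post tag.
        // Encoding (encoder.EncodeText in the original) is omitted here.
        var fragment = new StringBuilder();
        int srcIndex = 0;
        foreach (var m in merged)
        {
            fragment.Append(src, srcIndex, m[0] - srcIndex)
                    .Append(preTag)
                    .Append(src, m[0], m[1] - m[0])
                    .Append(postTag);
            srcIndex = m[1];
        }
        fragment.Append(src, srcIndex, src.Length - srcIndex);
        return fragment.ToString();
    }
}
```

For example, calling `OverlapTolerantFragment.Build("abcdefghi", new[] { new[] { 0, 2 }, new[] { 1, 3 }, new[] { 4, 7 } }, "<b>", "</b>")` merges the first two ranges into one highlighted span instead of asking `Substring` for a negative length. Since `MakeFragment` is `protected virtual` in the source above, a subclass of a fragments builder could in principle override it and apply similar normalization to `to.StartOffset` and `to.EndOffset`, but that is an untested assumption.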

0 answers:

No answers yet