I want to find the number of occurrences of a word in a text. I have a class like this:
    public class Page
    {
        public string Id { get; set; }
        public string BookId { get; set; }
        public string Content { get; set; }
        public int PageNumber { get; set; }
    }
My index looks like this:
    class Pages_SearchOccurrence : AbstractIndexCreationTask<Page, Pages_SearchOccurrence.ReduceResult>
    {
        public class ReduceResult
        {
            public string PageId { get; set; }
            public int Count { get; set; }
            public string Word { get; set; }
            public string Content { get; set; }
        }

        public Pages_SearchOccurrence()
        {
            Map = pages => from page in pages
                           let words = page.Content
                               .ToLower()
                               .Split(new string[] { " ", "\n", ",", ";" }, StringSplitOptions.RemoveEmptyEntries)
                           from w in words
                           select new
                           {
                               page.Content,
                               PageId = page.Id,
                               Count = 1,
                               Word = w
                           };
            Reduce = results => from result in results
                                group result by new { PageId = result.PageId, result.Word } into g
                                select new
                                {
                                    Content = g.First().Content,
                                    PageId = g.Key.PageId,
                                    Word = g.Key.Word,
                                    // Sum the counts: reduce can run again over already-reduced
                                    // results, so counting the items in the group would be wrong.
                                    Count = g.Sum(x => x.Count)
                                };
            Index(x => x.Content, Raven.Abstractions.Indexing.FieldIndexing.Analyzed);
        }
    }
Finally, my query looks like this:
    using (var session = documentStore.OpenSession())
    {
        RavenQueryStatistics stats;
        var occurence = session.Query<Pages_SearchOccurrence.ReduceResult, Pages_SearchOccurrence>()
            .Statistics(out stats)
            .Where(x => x.Word == "works")
            .ToList();
    }
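But I noticed that RavenDB is slow (or my query is bad): stats.IsStale is true, and Raven Studio takes too long and returns only a few results. I have 1000 "Pages" documents, and each page has about 1000 words of content. What is wrong with my query, and how can I find the occurrences of a word in a page? Thanks for your help!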
Answer 0: (score: 0)
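You are doing it wrong. You should set the Content field to Analyzed and use RavenDB's Search() operator. The slowness is most likely caused by the amount of unoptimized work the indexing code is doing.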
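To illustrate the "unoptimized work" point, here is a minimal sketch in plain LINQ to Objects. It is not RavenDB code and does not measure RavenDB itself; it only mirrors the shape of the Map step from the question. The names MapFanOutSketch, EmitEntries and ContentCharsEmitted are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MapFanOutSketch
{
    // Mirrors the question's Map step in memory: one entry per word,
    // and every entry carries the full page content along with it.
    public static List<(string PageId, string Word, string Content)> EmitEntries(
        string pageId, string content)
    {
        var separators = new[] { " ", "\n", ",", ";" };
        return content.ToLower()
            .Split(separators, StringSplitOptions.RemoveEmptyEntries)
            .Select(w => (PageId: pageId, Word: w, Content: content))
            .ToList();
    }

    // Total number of Content characters referenced across all emitted entries.
    public static long ContentCharsEmitted(string pageId, string content)
    {
        return EmitEntries(pageId, content).Sum(e => (long)e.Content.Length);
    }
}
```

With the numbers from the question (1000 pages of about 1000 words each), a Map of this shape produces roughly one million entries, each projecting the whole Content field. Mapping Content once per page and letting the analyzer tokenize it avoids that fan-out.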
Answer 1: (score: 0)
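I found a partial solution.
Maybe I was not clear: my goal is to find the occurrences of a word in a page. I am looking for the number of hits of a word in each page, and I want to sort by that count.
I changed my index: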
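To make the goal concrete, here is a minimal in-memory sketch of the result I am after, written with plain LINQ to Objects rather than a RavenDB index. WordOccurrenceSketch and TopPages are hypothetical names used only for illustration, and the separators are the same ones used in the index.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordOccurrenceSketch
{
    private static readonly string[] Separators = { " ", "\n", ",", ";" };

    // Counts how often `word` appears in each page and returns the pages
    // with at least one hit, ordered by hit count (highest first).
    public static List<KeyValuePair<string, int>> TopPages(
        IDictionary<string, string> contentByPageId, string word, int take)
    {
        var needle = word.ToLower();
        return contentByPageId
            .Select(p => new KeyValuePair<string, int>(
                p.Key,
                p.Value.ToLower()
                    .Split(Separators, StringSplitOptions.RemoveEmptyEntries)
                    .Count(w => w == needle)))
            .Where(p => p.Value > 0)
            .OrderByDescending(p => p.Value)
            .Take(take)
            .ToList();
    }
}
```

Calling TopPages(pages, "works", 5) on a dictionary of page id to content returns up to five (page id, hit count) pairs. This is only a reference for the expected output; on the real data the counting should happen in the index, not in memory.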
    class Pages_SearchOccurrence : AbstractIndexCreationTask<Page, Pages_SearchOccurrence.ReduceResult>
    {
        public class ReduceResult
        {
            public string Content { get; set; }
            public string PageId { get; set; }
            public int Count { get; set; } // int, to match the Count = 1 emitted by Map
            public string Word { get; set; }
        }

        public Pages_SearchOccurrence()
        {
            Map = pages => from page in pages
                           let words = page.Content.ToLower().Split(new string[] { " ", "\n", ",", ";" }, StringSplitOptions.RemoveEmptyEntries)
                           from w in words
                           select new
                           {
                               page.Content,
                               PageId = page.Id,
                               Count = 1,
                               Word = w
                           };
            Index(x => x.Content, Raven.Abstractions.Indexing.FieldIndexing.Analyzed);
            Index(x => x.PageId, Raven.Abstractions.Indexing.FieldIndexing.NotAnalyzed);
        }
    }
Finally, my new query looks like this:
    using (var session = documentStore.OpenSession())
    {
        var query = session.Query<Pages_SearchOccurrence.ReduceResult, Pages_SearchOccurrence>()
            .Search((x) => x.Word, "works")
            .AggregateBy(x => x.PageId)
            .CountOn(x => x.Count)
            .ToList()
            .Results
            .FirstOrDefault();
        var listFacetValues = query.Value.Values;
        var finalResult = listFacetValues.GroupBy(x => x.Hits).OrderByDescending(x => x.Key).Take(5).ToList();
    }
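finalResult gives me a collection of FacetValue objects, each with a Hits property (for my FacetValue objects, the Hits and Count properties are the same here).
The Hits property gives me the result I want, but this code does not look right to me, and RavenDB Studio does not like it either.
Do you have a better solution?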