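I am trying to run a query against my index and get all reviews whose reviewer does not have a Gravatar image. To do that, I implemented a PatternAnalyzerDefinition with a host pattern: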
"^https?\\:\\/\\/([^\\/?#]+)(?:[\\/?#]|$)"
It should match and extract the host of a URL, so that:
https://www.gravatar.com/avatar/blablalbla?s=200&r=pg&d=mm
becomes:
www.gravatar.com
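To check the regular expression on its own, outside Elasticsearch, here is a minimal sketch using only the Scala standard library. The object and method names (`HostRegexDemo`, `extractHost`) are made up for illustration; only the pattern comes from the question.

```scala
object HostRegexDemo {
  // Same pattern as in the question; capture group 1 is the host.
  private val HostPattern = "^https?\\:\\/\\/([^\\/?#]+)(?:[\\/?#]|$)".r

  /** Returns the host of an http(s) URL, or None if the pattern does not match. */
  def extractHost(url: String): Option[String] =
    HostPattern.findFirstMatchIn(url).map(_.group(1))

  def main(args: Array[String]): Unit = {
    val url = "https://www.gravatar.com/avatar/blablalbla?s=200&r=pg&d=mm"
    println(extractHost(url)) // prints Some(www.gravatar.com)
  }
}
```

This only shows that the regex captures the host as intended. It says nothing about how an Elasticsearch analyzer uses the pattern.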
Mapping:
clientProvider.getClient.execute {
create.index(_index).analysis(
phraseAnalyzer,
PatternAnalyzerDefinition("host_pattern", regex = "^https?\\:\\/\\/([^\\/?#]+)(?:[\\/?#]|$)")
).mappings(
"reviews" as (
.... Cool mappings
"review" inner (
"grade" typed LongType,
"text" typed StringType index "not_analyzed",
"reviewer" inner (
"screenName" typed StringType index "not_analyzed",
"profilePicture" typed StringType analyzer "host_pattern",
"thumbPicture" typed StringType index "not_analyzed",
"points" typed LongType index "not_analyzed"
),
.... Other cool mappings
)
) all(false)
} map { response =>
Logger.info("Create index response: {}", response)
} recover {
case t: Throwable => play.Logger.error("Error creating index: ", t)
}
Query:
val reviewQuery = (search in path)
.query(
bool(
must(
not(
termQuery("review.reviewer.profilePicture", "www.gravatar.com")
)
)
)
)
.postFilter(
bool(
must(
rangeFilter("review.grade") from 3
)
)
)
.size(size)
.sort(by field "review.created" order SortOrder.DESC)
clientProvider.getClient.execute {
reviewQuery
}.map(_.getHits.jsonToList[ReviewData])
Inspecting the mapping in the index shows:
reviewer: {
properties: {
id: {
type: "long"
},
points: {
type: "long"
},
profilePicture: {
type: "string",
analyzer: "host_pattern"
},
screenName: {
type: "string",
index: "not_analyzed"
},
state: {
type: "string"
},
thumbPicture: {
type: "string",
index: "not_analyzed"
}
}
}
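When I run the query, the pattern matching does not seem to work: I still get reviews from reviewers who have a Gravatar image. What am I doing wrong? Maybe I have misunderstood the PatternAnalyzer?
I am using "com.sksamuel.elastic4s" %% "elastic4s" % "1.5.9".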
Answer 0 (score: 0)
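I guess this is another case of RTFM.
The docs state:
IMPORTANT: The regular expression should match the token separators, not the tokens themselves.
In my case that means the text the pattern matches, including www.gravatar.com, is treated as a separator and dropped, so it is not among the tokens produced when the field is analyzed.
Use the Pattern Capture Token Filter instead.
First declare a new CustomAnalyzerDefinition: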
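The separator semantics can be approximated outside Elasticsearch with a plain regex split. This is only a rough stand-in for the pattern analyzer, not its actual implementation: the name `SeparatorSemanticsDemo` is hypothetical, and the lowercasing step assumes the analyzer's default lowercase setting.

```scala
import java.util.regex.Pattern

object SeparatorSemanticsDemo {
  // Same pattern as in the question.
  private val HostPattern =
    Pattern.compile("^https?\\:\\/\\/([^\\/?#]+)(?:[\\/?#]|$)")

  /** Treats every regex match as a separator and keeps what is left over.
    * Empty pieces are dropped and tokens are lowercased (assumed default). */
  def tokensAfterSplit(input: String): List[String] =
    HostPattern.split(input).toList.filter(_.nonEmpty).map(_.toLowerCase)

  def main(args: Array[String]): Unit = {
    val url = "https://www.gravatar.com/avatar/blablalbla?s=200&r=pg&d=mm"
    // The scheme and host are consumed as the "separator";
    // only the rest of the URL survives as a token.
    println(tokensAfterSplit(url)) // prints List(avatar/blablalbla?s=200&r=pg&d=mm)
  }
}
```

Under this approximation, a term query for www.gravatar.com finds nothing, because that text is never indexed as a term.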
val hostAnalyzer = CustomAnalyzerDefinition(
"host_analyzer",
StandardTokenizer,
PatternCaptureTokenFilter(
name = "hostFilter",
patterns = List[String]("^https?\\:\\/\\/([^\\/?#]+)(?:[\\/?#]|$)"),
preserveOriginal = false
)
)
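For comparison, here is a small simulation of what a pattern capture filter does to a single token: it emits the capture groups instead of discarding the match. It is a sketch under assumptions rather than the Lucene implementation. `CaptureFilterDemo` and `captureTokens` are hypothetical names, the fallback to the original token on no match reflects my understanding of the filter with preserveOriginal = false, and the input is assumed to reach the filter as one whole-URL token. A tokenizer that splits the URL first (the standard tokenizer may well do so) would hand the filter smaller tokens.

```scala
import scala.util.matching.Regex

object CaptureFilterDemo {
  // Same pattern as in the answer; capture group 1 is the host.
  val HostPattern: Regex = "^https?\\:\\/\\/([^\\/?#]+)(?:[\\/?#]|$)".r

  /** Emits every capture group of every match in the token.
    * If nothing matches, the original token is passed through (assumption). */
  def captureTokens(token: String, patterns: List[Regex]): List[String] = {
    val captured = for {
      p <- patterns
      m <- p.findAllMatchIn(token).toList
      i <- (1 to m.groupCount).toList
      g <- Option(m.group(i)).toList // skip groups that did not participate
    } yield g
    if (captured.isEmpty) List(token) else captured
  }

  def main(args: Array[String]): Unit = {
    val url = "https://www.gravatar.com/avatar/blablalbla?s=200&r=pg&d=mm"
    println(captureTokens(url, List(HostPattern))) // prints List(www.gravatar.com)
  }
}
```

Checking the terms that were really indexed, rather than relying on a simulation like this, is the safer way to confirm what the analyzer produces.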
Then assign the analyzer to the field:
"review" inner (
"reviewer" inner (
"screenName" typed StringType index "not_analyzed",
"profilePicture" typed StringType analyzer "host_analyzer",
"thumbPicture" typed StringType index "not_analyzed",
"points" typed LongType index "not_analyzed"
)
)
and register the analyzer when creating the index:
create.index(_index).analysis(
someAnalyzer,
phraseAnalyzer,
hostAnalyzer
).mappings(
Voilà, it works. A very handy way to inspect the tokens stored in the index is to call:
/[index]/[collection]/[id]/_termvector?fields=review.reviewer.profilePicture&pretty=true