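I'm using the libpostal library to find full addresses (street, city, state, and postal code) in news articles. Given the input text:
There was an accident at 5 Main Street Boulder CO 10566 which is at the corner of Wilson.
libpostal returns a vector: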
[{:label "house", :value "there was an accident at 5"}
 {:label "road", :value "main street"}
 {:label "city", :value "boulder"}
 {:label "state", :value "co"}
 {:label "postcode", :value "10566"}
 {:label "road", :value "which is at the corner of wilson."}]
Is there a clever way in Clojure to extract subsequences in which the :label values occur in this order:
[road unit? level? po_box? city state postcode? country?]
where ? marks an optional value in the match?
Answer 0 (score: 6)
You can do this with clojure.spec. First, define some specs that match your maps' :label values:
(require '[clojure.spec.alpha :as s])

;; True when map m carries the given libpostal label.
(defn has-label? [m label] (= label (:label m)))
(s/def ::city #(has-label? % "city"))
(s/def ::postcode #(has-label? % "postcode"))
(s/def ::state #(has-label? % "state"))
(s/def ::house #(has-label? % "house"))
(s/def ::road #(has-label? % "road"))
Then define a regex spec, for example with s/cat and s/?:
(s/def ::valid-seq
  (s/cat :road ::road
         :city (s/? ::city) ;; ? = zero or once
         :state ::state
         :zip (s/? ::postcode)))
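The same approach should extend to the full pattern from the question. The following is a sketch only: it assumes libpostal emits the labels "unit", "level", "po_box", and "country" for those components, and the spec name ::full-address-seq is a hypothetical name chosen for illustration.

```clojure
(require '[clojure.spec.alpha :as s])

(defn has-label? [m label] (= label (:label m)))

;; One predicate spec per label in the question's pattern.
;; The label strings are assumed to match libpostal's output.
(s/def ::road     #(has-label? % "road"))
(s/def ::unit     #(has-label? % "unit"))
(s/def ::level    #(has-label? % "level"))
(s/def ::po-box   #(has-label? % "po_box"))
(s/def ::city     #(has-label? % "city"))
(s/def ::state    #(has-label? % "state"))
(s/def ::postcode #(has-label? % "postcode"))
(s/def ::country  #(has-label? % "country"))

;; [road unit? level? po_box? city state postcode? country?]
;; ::full-address-seq is a hypothetical name, not part of the original answer.
(s/def ::full-address-seq
  (s/cat :road     ::road
         :unit     (s/? ::unit)
         :level    (s/? ::level)
         :po-box   (s/? ::po-box)
         :city     ::city
         :state    ::state
         :postcode (s/? ::postcode)
         :country  (s/? ::country)))

;; Optional parts may be absent; required parts (road, city, state) may not.
(s/valid? ::full-address-seq
          [{:label "road" :value "main street"}
           {:label "city" :value "boulder"}
           {:label "state" :value "co"}
           {:label "postcode" :value "10566"}])
```

Unlike ::valid-seq, this variant treats the city as required, as in the question's pattern.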
Now you can conform or valid?-ate your sequences:
(s/conform ::valid-seq [{:label "road" :value "Damen"}
                        {:label "city" :value "Chicago"}
                        {:label "state" :value "IL"}])
;; =>
;; {:road {:label "road", :value "Damen"},
;;  :city {:label "city", :value "Chicago"},
;;  :state {:label "state", :value "IL"}}

;; this is also valid, missing an optional value in the middle
(s/conform ::valid-seq [{:label "road" :value "Damen"}
                        {:label "state" :value "IL"}
                        {:label "postcode" :value "60622"}])
;; =>
;; {:road {:label "road", :value "Damen"},
;;  :state {:label "state", :value "IL"},
;;  :zip {:label "postcode", :value "60622"}}
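A regex spec matches a whole sequence, whereas the libpostal output in the question surrounds the address with other elements. One way to bridge that gap is to try the spec against contiguous runs of the parsed vector. The sketch below does this with a hypothetical helper, address-matches (not part of clojure.spec or libpostal), which prefers the longest match at each start position. It repeats the spec definitions so that it runs on its own.

```clojure
(require '[clojure.spec.alpha :as s])

(defn has-label? [m label] (= label (:label m)))
(s/def ::road     #(has-label? % "road"))
(s/def ::city     #(has-label? % "city"))
(s/def ::state    #(has-label? % "state"))
(s/def ::postcode #(has-label? % "postcode"))

(s/def ::valid-seq
  (s/cat :road  ::road
         :city  (s/? ::city)
         :state ::state
         :zip   (s/? ::postcode)))

;; Hypothetical helper: returns a lazy seq with the conformed value of each
;; non-overlapping contiguous run of `parsed` that matches `spec`.
;; At every start index it tries the longest run first; after a match it
;; continues from the end of that match, otherwise from the next index.
(defn address-matches [spec parsed]
  (let [v (vec parsed)
        n (count v)]
    ((fn step [start]
       (lazy-seq
         (when (< start n)
           (if-let [end (first (filter #(s/valid? spec (subvec v start %))
                                       (range n start -1)))]
             (cons (s/conform spec (subvec v start end)) (step end))
             (step (inc start))))))
     0)))

(def parsed
  [{:label "house", :value "there was an accident at 5"}
   {:label "road", :value "main street"}
   {:label "city", :value "boulder"}
   {:label "state", :value "co"}
   {:label "postcode", :value "10566"}
   {:label "road", :value "which is at the corner of wilson."}])

(address-matches ::valid-seq parsed)
;; =>
;; ({:road {:label "road", :value "main street"},
;;   :city {:label "city", :value "boulder"},
;;   :state {:label "state", :value "co"},
;;   :zip {:label "postcode", :value "10566"}})
```

This checks every candidate run, so the work grows roughly quadratically with the number of parsed elements. That is likely acceptable for the short vectors produced from a single sentence, but it is a design trade-off rather than a tuned solution.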