Question

(def grammar
        "
        <root> = line*
        <line> = START REST
        <START> = #'[0-9]{4} '
        <REST> = NA | NZ | DATETIME
        <NA> = 'Nicht angemeldet '
        <NZ> = 'Nicht zugelassen '
        <DATETIME> = TAG ZEIT 
        <TAG> = 'Montag ' | 'Dienstag '
        <ZEIT> = #'[0-9]{1}.*Uhr '
        ")

(defn changed-x [tree]
  (postwalk
    (fn [node]
      (if (and (vector? node) (= (first node) :line))
        [:line (-> node rest 10)]
        node))
    tree))

(defn -main
  [& args]
;; Create the tree and save it to "tree" with a function (not included here), but it works
      (def tree (test-title-parser title-grammar-1 In))
;; Change the tree so every line just becomes "10" hardcoded in the changed-x function
      (changed-x tree)
;; print the tree
      (println tree)
    )

This is a (small) test string to parse: 1017 Montag 13-14:30Uhr 1026 Nicht zugelassen

I want this to happen:
1017 Montag 13-14:30 Uhr 
1026 Nicht zugelassen

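Each line should end with just a single CR/Enter, or be printed to the console so that I can redirect the output to a file. I want a tab between the number and the text, so that I can paste the result into Excel and get two separate fields per row.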

My tree looks like this:
(1017  Montag  13-14:30Uhr 1026 Nicht zugelassen .................. )

I put <> around everything I don't need.

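Now please solve the last bit of this damn puzzle for me (I'm begging), because I have spent hours understanding instaparse and how it works, only to find that it does split my string correctly but gets me 0% closer to what I actually want. After a crazy week I just need some kind of success. Seriously... I could do this in a few minutes in 4 different languages; a damn loop and a variable holding a string is all I need.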

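I'm trying to understand your function: how do I read it? What exactly is node? Is it everything I put on the left-hand side of the grammar? What does -> do here? I have never seen it used this way. Why do we have the [] brackets? And what does the last node do?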

  (if (and (vector? node) (= (first node) :line))
    [:line (-> node rest 10)]
    node))

Answer 1

Without the full grammar it is hard to give an explanation. Let's say it is defined as follows:

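(require '[instaparse.core :as insta])

;; insta/parser builds a parser from a grammar string;
;; insta/parse is the function that applies a parser to input.
(def xyz
  (insta/parser
    "S = A+
     A = X Y Z
     X = 'x'+
     Y = 'y'+
     Z = 'z'+"))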

It basically matches the regular expression #"(x+y+z+)+". Now let's try to create a parse tree from an input:

(def t (xyz "xyyzzzxxxyyz"))    
t ; => [:S [:A [:X "x"] [:Y "y" "y"] [:Z "z" "z" "z"]] [:A [:X "x" "x" "x"] [:Y "y" "y"] [:Z "z"]]]

The question has two parts: how to modify this tree, and how to use it in Enlive.

Enlive part of the answer

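Enlive is a selector-based templating library for Clojure. To use this tree for templating, you need to rename the keys :S, :A, :X, :Y, :Z to some tags. Let's replace them with :div, :span, :h1, :h2 and :h3 respectively. For this kind of key renaming there is the postwalk-replace function: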

(use 'clojure.walk)

(def tags (postwalk-replace {:S :div :A :span :X :h1 :Y :h2 :Z :h3} t))
tags ; => [:div [:span [:h1 "x"] [:h2 "y" "y"] [:h3 "z" "z" "z"]] [:span [:h1 "x" "x" "x"] [:h2 "y" "y"] [:h3 "z"]]]

The tags vector is ready to be used in Enlive:

(use 'net.cgrand.enlive-html)

(html tags) ; => ({:tag :div, :attrs {}, :content ({:tag :span, :attrs {}, :content ({:tag :h1, :attrs {}, :content ("x")} {:tag :h2, :attrs {}, :content ("y" "y")} {:tag :h3, :attrs {}, :content ("z" "z" "z")})} {:tag :span, :attrs {}, :content ({:tag :h1, :attrs {}, :content ("x" "x" "x")} {:tag :h2, :attrs {}, :content ("y" "y")} {:tag :h3, :attrs {}, :content ("z")})})})

Tree modification part of the answer

To modify the values of the :X nodes in the tree, you can use the postwalk function:

(defn changed-x [tree f]
  (postwalk
    (fn [node]
      (if (and (vector? node) (= (first node) :X))
        [:X (-> node rest f)]
        node))
    tree))

(changed-x t count) ; => [:S [:A [:X 1] [:Y "y" "y"] [:Z "z" "z" "z"]] [:A [:X 3] [:Y "y" "y"] [:Z "z"]]]

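In the example above, the children of every :X node (several "x" strings) are replaced by a single number, their count. Finally, if you want to discard all nodes other than :X, you can use the tree-seq function: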

(defn filter-by-key [tree node-key]
  (->> tree
       (tree-seq vector? identity)
       (filter #(and
                  (vector? %)
                  (= (first %) node-key)))))

(filter-by-key t :X) ; => ([:X "x"] [:X "x" "x" "x"])
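
As a small follow-up sketch: once the nodes are filtered, their contents can be pulled out by dropping the key with rest. The tree t is written out literally here so the snippet runs without instaparse, and the name x-strings is made up for this illustration.

```clojure
;; filter-by-key as defined above, repeated so the snippet runs on its own.
(defn filter-by-key [tree node-key]
  (->> tree
       (tree-seq vector? identity)
       (filter #(and (vector? %)
                     (= (first %) node-key)))))

;; The parse tree from the example above, as a literal.
(def t [:S [:A [:X "x"] [:Y "y" "y"] [:Z "z" "z" "z"]]
           [:A [:X "x" "x" "x"] [:Y "y" "y"] [:Z "z"]]])

;; Hypothetical name: one joined string per :X node.
(def x-strings
  (map #(apply str (rest %)) (filter-by-key t :X)))

x-strings ; => ("x" "xxx")
```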

Answer 2

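Actually, the solution is very close. The main problem is in the grammar. You have wrapped every non-terminal symbol in <>. Because of that, all the structure is lost, and the tree ("1017 " "Montag " "13-14:30Uhr " "1026 " "Nicht zugelassen ") contains only terminal nodes.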

Consider the following grammar:

(def grammar
    "
    <root> = line*
    line = START REST
    <START> = #'[0-9]{4} '
    <REST> = NA | NZ | DATETIME
    <NA> = 'Nicht angemeldet '
    <NZ> = 'Nicht zugelassen '
    <DATETIME> = TAG ZEIT 
    <TAG> = 'Montag ' | 'Dienstag '
    <ZEIT> = #'[0-9]{1}.*Uhr '
    ") ; Note "line" non-terminal - it's not wrapped now

As a result, the tree looks like ([:line "1017 " "Montag " "13-14:30Uhr "] [:line "1026 " "Nicht zugelassen "]). Now, a slightly more generic version of changed-x:

(defn tree-apply [tree tree-key f]
  (postwalk
    (fn [node]
      (if (and (vector? node) (= (first node) tree-key))
        [tree-key (-> node rest f)]
        node))
    tree))

Basically, the hardcoded :X keyword has been turned into a parameter, and the function has been renamed to something more meaningful.

Finally, this usage prints the terms separated by tabs (t here is the tree produced by the grammar above):

(require '[clojure.string :as s])

(tree-apply t :line #(->> % (s/join "\t") println))
1017    Montag  13-14:30Uhr 
1026    Nicht zugelassen

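Some explanation. tree-apply is a function that returns an updated version of the tree in which the nodes selected by tree-key have been changed. It relies on the format where a node is a vector whose first element is the node key and whose remaining elements are leaves or child nodes: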

[:a [:b "c"] "d" [:e]]

Here :a, :b, :e are node keys; :b and :e are children of :a; "c" and "d" are leaves. For the next tree

[:root [:a] [:a "a"] [:b] [:a 1] [:c ...]]

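(tree-apply t :a f) will process the nodes [:a], [:a "a"] and [:a 1], leaving :b and :c untouched. The function f receives the "inside" of each node as its argument: () (an empty sequence) for [:a], ("a") for [:a "a"], and (1) for [:a 1]. The result of f is placed into the newly built tree, so the resulting tree will look like this: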

[:root [:a (f '())] [:a (f '("a"))] [:b] [:a (f '(1))] [:c ...]]
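
To make this concrete, here is a minimal sketch that runs tree-apply on such a tree with count as f. The sample tree is an assumption for illustration only: [:c] stands in for the elided [:c ...] node, and the names sample and counted are made up.

```clojure
(require '[clojure.walk :refer [postwalk]])

;; tree-apply as defined above, repeated so the snippet runs on its own.
(defn tree-apply [tree tree-key f]
  (postwalk
    (fn [node]
      (if (and (vector? node) (= (first node) tree-key))
        [tree-key (-> node rest f)]
        node))
    tree))

;; Hypothetical sample tree; [:c] replaces the elided [:c ...] node.
(def sample [:root [:a] [:a "a"] [:b] [:a 1] [:c]])

;; f = count, so each :a node ends up holding the number of its children.
(def counted (tree-apply sample :a count))

counted ; => [:root [:a 0] [:a 1] [:b] [:a 1] [:c]]
```

Only the :a nodes change; :root, :b and :c come back exactly as they went in.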

The function passed in can have side effects, as in the example above. The function

#(->> % (s/join "\t") println)

is a shortcut for

(fn [coll] (println (s/join "\t" coll)))

Basically, it takes a sequence, joins it into a single tab-separated string, and prints the resulting string on a new line.
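
Since the question also asks what -> does, here is a short sketch of the two threading macros, followed by a variant that builds the lines as strings instead of printing inside the tree walk (println returns nil, so the printing version leaves [:line nil] nodes in the returned tree). The helper line->tsv is hypothetical and not part of the answer above; it assumes each :line node holds the number first and the text pieces after it, and it emits exactly two tab-separated fields, as the question asks for.

```clojure
(require '[clojure.string :as s])

(def node [:line "1017 " "Montag " "13-14:30Uhr "])

;; -> threads the value in as the FIRST argument of each following form.
(-> node rest count)           ; same as (count (rest node)) => 3

;; ->> threads the value in as the LAST argument of each following form.
(->> node rest (s/join "\t"))  ; same as (s/join "\t" (rest node))

;; Hypothetical helper: number, a tab, then the remaining text as one field.
;; The destructuring skips the :line key and splits off the first token.
(defn line->tsv [[_ number & more]]
  (str (s/trim number) "\t" (s/trim (apply str more))))

;; The tree from the corrected grammar, written out as a literal.
(def tree '([:line "1017 " "Montag " "13-14:30Uhr "]
            [:line "1026 " "Nicht zugelassen "]))

;; One println per line, so the output can be redirected to a file.
(doseq [line (map line->tsv tree)]
  (println line))
```

Keeping the string building separate from the printing means the same lines could also be written to a file with spit rather than redirected from the console.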

Iterating through a Clojure Instaparse tree
