How to add sentence numbers in Clojure?

Time: 2015-03-02 08:21:26

Tags: clojure

I want to add sentence numbers to a text file, putting [1], [2], [3], ... in front of each sentence:

[1] Sentence one. [2] Sentence two. ...

A sentence ends with one of ., ! or ?.

I'm not sure how to do this in Clojure. Here is my attempt:

(def text "Martin Luther King, Jr.

I Have a Dream

delivered 28 August 1963, at the Lincoln Memorial, Washington D.C.


I am happy to join with you today in what will go down in history as the greatest demonstration for freedom in the history of our nation.

Five score years ago, a great American, in whose symbolic shadow we stand today, signed the Emancipation Proclamation. This momentous decree came as a great beacon light of hope to millions of Negro slaves who had been seared in the flames of withering injustice. It came as a joyous daybreak to end the long night of their captivity.

But one hundred years later, the Negro still is not free. One hundred years later, the life of the Negro is still sadly crippled by the manacles of segregation and the chains of discrimination. One hundred years later, the Negro lives on a lonely island of poverty in the midst of a vast ocean of material prosperity. One hundred years later, the Negro is still languished in the corners of American society and finds himself an exile in his own land. And so we've come here today to dramatize a shameful condition.")

Define what ends a sentence:

(def sentence-ending #"[.!?]")

Then use the replace function:

(require '[clojure.string :as str])
(str/replace text sentence-ending "[number]")   

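I know this is logically wrong! It replaces every ., ! and ? with the string "[number]" instead of numbering the sentences. Maybe string replacement isn't the right approach. How can I solve this?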
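As it happens, clojure.string/replace also accepts a function as the replacement when the pattern is a regex, so the replace idea can work if the regex matches whole sentences instead of only the terminators. Below is a minimal sketch, assuming a simple atom counter is acceptable; number-sentences-replace is a hypothetical helper name, and the sentence rule is the same naive "ends in ., ! or ?" rule from the question.

```clojure
(require '[clojure.string :as str])

(defn number-sentences-replace
  "Prefixes each run of text ending in . ! or ? with [n]."
  [text]
  (let [counter (atom 0)]
    ;; The regex matches a whole sentence: it starts at a non-whitespace,
    ;; non-terminator character and runs up to and including a terminator.
    ;; str/replace calls the function once per match, in order, so the
    ;; counter yields 1, 2, 3, ...
    (str/replace text
                 #"[^.!?\s][^.!?]*[.!?]"
                 (fn [sentence] (str "[" (swap! counter inc) "] " sentence)))))

(number-sentences-replace "Sentence one. Sentence two!")
;; => "[1] Sentence one. [2] Sentence two!"
```

The whitespace between sentences is never part of a match, so it is left in place unchanged.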

2 Answers:

Answer 0 (score: 3)

You can split text into a sequence of sentences, map over them to prepend [number] to each one, and then join the sentences back into a single string.

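(->> (clojure.string/split text #"(?<=[.?!])\s*")  ; split after each terminator, keeping it in the sentence
     (map-indexed #(str "[" (inc %1) "] " %2))     ; prepend number (starting at 1)
     (clojure.string/join " "))                     ; join to one string, separated by spaces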

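However, the condition used to split the text into sentences is naive. As you can see, some words contain a . (for example "Jr." and "D.C."), and those are not sentence endings. You should refine the sentence-termination condition.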
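One possible refinement, offered as a sketch rather than a robust sentence splitter: only split after ., ! or ? when whitespace and then an uppercase letter follow. This keeps "D.C." intact, but it is still a heuristic; an abbreviation such as "Mr. Smith" would still be split. split-sentences is a hypothetical helper name.

```clojure
(require '[clojure.string :as str])

(defn split-sentences
  "Splits after . ! or ? only when followed by whitespace and an uppercase
  letter, so \"D.C.\" stays intact. Still a heuristic: \"Mr. Smith\" would
  be split."
  [text]
  ;; (?<=[.!?]) : the terminator stays with the preceding sentence
  ;; \s+        : the whitespace between sentences is consumed
  ;; (?=[A-Z])  : the next sentence must start with an uppercase letter
  (str/split text #"(?<=[.!?])\s+(?=[A-Z])"))

(split-sentences "Washington D.C. is here. Is it? Yes!")
;; => ["Washington D.C. is here." "Is it?" "Yes!"]
```

The result can be fed into the same map-indexed / join pipeline in place of the plain split.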

Answer 1 (score: 0)

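One way to get complete sentences (including their punctuation) is to run a regex over the whole text with a matcher. I don't know whether this is the best way, but it works.

After that, I think interleave is a good fit for this kind of problem.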

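(let [matcher   (re-matcher #"[^.!?]*[.!?]" text)                ; each match: text up to and including a terminator
      sentences (take-while seq (repeatedly #(re-find matcher))) ; next match on each call, nil when exhausted
      numbers   (map #(str "[" % "] ") (iterate inc 1))]         ; "[1] ", "[2] ", ... (numbering starts at 1)
  (apply str (interleave numbers sentences)))                   ; alternate number, sentence, number, ...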
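For comparison, re-seq wraps the same re-matcher / re-find loop and returns a lazy sequence of matches, so the extraction step can be written more compactly. The sketch below uses the same naive sentence regex, trims the whitespace and newlines around each sentence, and joins the numbered sentences with single spaces; number-sentences is a hypothetical helper name. Any trailing text that does not end in a terminator is dropped, as in the matcher version.

```clojure
(require '[clojure.string :as str])

(defn number-sentences
  "Returns text with [n] before each sentence, where a sentence is a run of
  characters ending in . ! or ? (the same naive rule as above)."
  [text]
  (->> (re-seq #"[^.!?]*[.!?]" text)                ; lazy seq of sentences, terminators kept
       (map str/trim)                                ; drop surrounding whitespace/newlines
       (map #(str "[" %1 "] " %2) (iterate inc 1))   ; pair each sentence with 1, 2, 3, ...
       (str/join " ")))                              ; one string, sentences separated by spaces

(number-sentences "Sentence one. Sentence two! Three?")
;; => "[1] Sentence one. [2] Sentence two! [3] Three?"
```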