Multiple string replacements using indices

Date: 2014-06-13 23:12:12

Tags: string clojure reduce

I am using the following cookbook recipe to replace non-unique substrings in a text:

 (defn string-splice
   "cookbook recipe: http://gettingclojure.wikidot.com/cookbook:strings
    Given three arguments, string-splice will replace a portion of the old string at the
    given offset equal to the length of the replacement. The resulting string will be the
    same length as the original. The optional fourth argument
    specifies the length of text to be replaced. If this argument length is greater than the
    length of the new string, then the result will be shorter than the original string."
   ([target new offset] (string-splice target new offset (count new)))
   ([target new offset length]
    (str (subs target 0 offset) new (subs target (+ offset length)))))
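
To make the two arities concrete, here is a small sketch of how the recipe behaves. The recipe is repeated without its docstring so the snippet runs on its own, and the sample strings are made up for illustration:

```clojure
(defn string-splice
  ([target new offset] (string-splice target new offset (count new)))
  ([target new offset length]
   (str (subs target 0 offset) new (subs target (+ offset length)))))

;; Three arguments: overwrite (count new) characters, so the length is preserved.
(string-splice "hello world" "J" 0)        ; "Jello world"

;; Four arguments: replace exactly `length` characters with `new`.
(string-splice "hello world" "there" 6 5)  ; "hello there"
(string-splice "hello world" "all" 6 5)    ; "hello all" (shorter than the original)
```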

Now suppose I have the following misspelled string:

 (def bad-st "mary had a littl lam whose fleec was whiteas snw.")

and the following list of corrections, each paired with the index at which the misspelled word occurs in bad-st:

 (def corrections '("Mary" 0 "Little" 11 "fleck" 27 "white as" 37 "Snow" 45))

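If I want to apply each of these corrections to the string cumulatively, shifting the characters in the string to accommodate corrections that are longer or shorter than the misspelled substrings, I can use a version of the reduce code given for a related problem: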

 (reduce (fn [st [x y]]
           (string-splice st x y (count x)))
         bad-st
         (partition 2 corrections))

However, this does not shift the characters of the original text correctly. The output is

 "Mary had a Littlelam whose fleck was white asSnow"

Can anyone tell me what I am doing wrong here and suggest a fix?

1 Answer:

Answer 0: (score: 0)

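The basic problem with your use of string-splice is that you are passing the wrong fourth argument. It needs to be the length of the substring being replaced, whereas you are passing the length of its replacement. So you need to find the length of the bad word at the correction position.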

(defn wsize-at 
  "size of word (non-white sequence) at position n in string s"
  [n s]
  (let [[head tail] (split-at n s)]
    (count (take-while #(not (Character/isWhitespace %)) tail))))
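
As a quick check, this is what wsize-at reports for three of the offsets in the question, next to the replacement lengths that were being passed instead. bad-st and wsize-at are repeated so the snippet runs on its own:

```clojure
(def bad-st "mary had a littl lam whose fleec was whiteas snw.")

(defn wsize-at
  "size of word (non-white sequence) at position n in string s"
  [n s]
  (let [[head tail] (split-at n s)]
    (count (take-while #(not (Character/isWhitespace %)) tail))))

(wsize-at 11 bad-st) ; 5 for "littl",   whereas (count "Little") is 6
(wsize-at 37 bad-st) ; 7 for "whiteas", whereas (count "white as") is 8
(wsize-at 45 bad-st) ; 4 for "snw.",    the full stop counts as part of the word
```

The one-character differences at 11 and 37 are exactly the spaces that went missing in "Littlelam" and "white asSnow".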

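A further problem with using reduce is that, when a replacement and the string it replaces differ in length, the indices of all later corrections are thrown off. You can get round this by working backwards from the end of the string: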

(reduce (fn [st [s n]] (string-splice st s n (wsize-at n st)))
  bad-st 
  (reverse (partition 2 corrections)))
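
Put together as a self-contained sketch, the backwards pass looks like this. It assumes the corrections are written as strings, and letters-at is a hypothetical variant of wsize-at that is not part of the answer above:

```clojure
(defn string-splice
  ([target new offset] (string-splice target new offset (count new)))
  ([target new offset length]
   (str (subs target 0 offset) new (subs target (+ offset length)))))

(defn wsize-at [n s]
  (let [[head tail] (split-at n s)]
    (count (take-while #(not (Character/isWhitespace %)) tail))))

(def bad-st "mary had a littl lam whose fleec was whiteas snw.")
;; Assumption: corrections are strings, so "white as" is a single item.
(def corrections '("Mary" 0 "Little" 11 "fleck" 27 "white as" 37 "Snow" 45))

;; Working from the last correction to the first keeps earlier offsets valid.
(reduce (fn [st [s n]] (string-splice st s n (wsize-at n st)))
        bad-st
        (reverse (partition 2 corrections)))
;; "Mary had a Little lam whose fleck was white as Snow"

;; Hypothetical variant: count letters only, so trailing punctuation survives.
(defn letters-at [n s]
  (count (take-while #(Character/isLetter %) (drop n s))))

(reduce (fn [st [s n]] (string-splice st s n (letters-at n st)))
        bad-st
        (reverse (partition 2 corrections)))
;; "Mary had a Little lam whose fleck was white as Snow."
```

Note that with wsize-at the final full stop is swallowed, because "snw." is one run of non-whitespace characters. Whether that matters depends on how the offsets and word boundaries are defined in your data.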

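I'm not sure that string-splice is the right tool for this task. The correction offsets refer to the original string; an alternative is to use those offsets to extract the unchanged segments of the original string, for example with a function good-parts, so that (good-parts bad-st [0 11 27 37 45]) returns the untouched pieces between the misspelled words, such as " had a " and " was ". wsize-at would be part of its implementation. You would then interleave these with ["Mary" "Little" "fleck" "white as" "Snow"] and apply str to the result to produce the desired string.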
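
One possible shape for good-parts, as a rough sketch rather than a finished implementation. The name comes from the paragraph above, but the body is my own guess. This version also returns the (possibly empty) pieces before the first and after the last bad word, so it yields one more segment than there are corrections:

```clojure
(defn wsize-at [n s]
  (let [[head tail] (split-at n s)]
    (count (take-while #(not (Character/isWhitespace %)) tail))))

(defn good-parts
  "Hypothetical helper: the unchanged segments of s around the bad words
   that start at the given offsets (offsets must be ascending)."
  [s offsets]
  (let [ends   (map #(+ % (wsize-at % s)) offsets) ; index just past each bad word
        starts (concat offsets [(count s)])]       ; where each good segment stops
    (map #(subs s %1 %2) (cons 0 ends) starts)))

(def bad-st "mary had a littl lam whose fleec was whiteas snw.")

(good-parts bad-st [0 11 27 37 45])
;; ("" " had a " " lam whose " " was " " " "")

;; Pad the replacements with "" so interleave does not drop the last segment.
(apply str
       (interleave (good-parts bad-st [0 11 27 37 45])
                   (concat ["Mary" "Little" "fleck" "white as" "Snow"] [""])))
;; "Mary had a Little lam whose fleck was white as Snow"
```

Because every offset is looked up in the original string, no index ever shifts and the order of processing no longer matters.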