所有子串都是使用函数式编程的字符序列

时间:2012-12-01 13:03:25

标签: functional-programming sequences

作为查找字符串中相同字符运行的my earlier question的后续内容,我还想找到一个函数算法来查找长度大于2的所有子字符串,这些子字符串是字母的升序或降序序列或字符串([Char])中的数字(例如,“defgh”,“34567”,“XYZ”,“fedcba”,“NMLK”,9876“等)。我唯一的序列我正在考虑的是A..Za..z0..9及其降序对应的子串。返回值应该是(从零开始的偏移,长度)对的列表。我正在翻译从JavaScript(包含命令式代码)到Scala的“zxcvbn”密码强度算法。我希望保持我的代码尽可能纯粹的功能,因为在函数式编程风格中编写的所有常见原因。

我的代码是用Scala编写的,但我可以在Clojure,F#,Haskell或伪代码中翻译算法。

示例:对于字符串qweABCD13987,将返回[(3,4),(9,3)]

我写了一个相当怪异的功能,当我再次访问我的工作计算机时,我会发布,但我确信存在更优雅的解决方案。

再一次,谢谢。

3 个答案:

答案 0 :(得分:0)

我想这个问题的一个很好的解决方案实际上比起初看起来更复杂。 我不是Scala Pro,所以我的解决方案肯定不是最优和最好的,但也许它会给你一些想法。

基本思想是计算两个连续字符之间的差异,之后不幸的是它变得有点混乱。问我一些代码是否不清楚!

object Sequences {

  val s = "qweABCD13987"                         

  val pairs = (s zip s.tail) toList   // if s might be empty, add a check here            
  // = List((q,w), (w,e), (e,A), (A,B), (B,C), (C,D), (D,1), (1,3), (3,9), (9,8), (8,7))

  // assuming all characters are either letters or digits
  val diff = pairs map {case (t1, t2) =>
    if (t1.isLetter ^ t2.isLetter) 0 else t1 - t2}   // xor could also be replaced by !=
  // = List(-6, 18, 36, -1, -1, -1, 19, -2, -6, 1, 1)

  /**
   *
   * @param xs A list indicating the differences between consecutive characters
   * @param current triple: (start index of the current sequence;
   *                         number of current elements in the sequence;
   *                         number indicating the direction i.e. -1 = downwards, 1 = upwards, 0 = doesn't matter)
   * @return A list of triples similar to the argument
   */
  def sequences(xs: Seq[Int], current: (Int, Int, Int) = (0, 1, 0)): List[(Int, Int, Int)] = xs match {
    case Nil => current :: Nil
    case (1 :: ys) =>
      if (current._3 != -1)
        sequences(ys, (current._1, current._2 + 1, 1))
      else
        current :: sequences(ys, (current._1 + current._2 - 1, 2, 1))  // "recompute" the current index
    case (-1 :: ys) =>
      if (current._3 != 1)
        sequences(ys, (current._1, current._2 + 1, -1))
      else
        current :: sequences(ys, (current._1 + current._2 - 1, 2, -1))
    case (_ :: ys) =>
      current :: sequences(ys, (current._1 + current._2, 1, 0))
  }                                               

  sequences(diff) filter (_._2 > 1) map (t => (t._1, t._2))
}

答案 1 :(得分:0)

将问题分成几个较小的子问题总是最好的。我在Haskell中编写了一个解决方案,对我来说更容易。它使用惰性列表,但我想你可以使用流或者通过使主函数tail递归并将中间结果作为参数传递来将其转换为Scala。

-- Mark all subsequences whose adjacent elements satisfy
-- the given predicate. Includes subsequences of length 1.
sequences :: (Eq a) => (a -> a -> Bool) -> [a] -> [(Int,Int)]
sequences p [] = []
sequences p (x:xs) = seq x xs 0 0
  where
    -- arguments: previous char, current tail sequence, 
    -- last asc. start offset of a valid subsequence, current offset
    seq _ [] lastOffs curOffs = [(lastOffs, curOffs - lastOffs)]
    seq x (x':xs) lastOffs curOffs
        | p x x'    -- predicate matches - we're extending current subsequence
            = seq x' xs lastOffs curOffs'
        | otherwise -- output the currently marked subsequence and start a new one
            = (lastOffs, curOffs - lastOffs) : seq x' xs curOffs curOffs'
      where
        curOffs' = curOffs + 1

-- Marks ascending subsequences.
asc :: (Enum a, Eq a) => [a] -> [(Int,Int)]
asc = sequences (\x y -> succ x == y)

-- Marks descending subsequences.
desc :: (Enum a, Eq a) => [a] -> [(Int,Int)]
desc = sequences (\x y -> pred x == y)

-- Returns True for subsequences of length at least 2.
validRange :: (Int, Int) -> Bool
validRange (offs, len) = len >= 2

-- Find all both ascending and descending subsequences of the
-- proper length.
combined :: (Enum a, Eq a) => [a] -> [(Int,Int)]
combined xs = filter validRange (asc xs) ++ filter validRange (desc xs)

-- test:
main = print $ combined "qweABCD13987"

答案 2 :(得分:0)

这是我在Clojure中的近似值:

我们可以转换输入字符串,以便我们可以应用您的previous algorithm来找到解决方案。算法不是最高效的,但我认为你会有更抽象和可读的代码。

可以通过以下方式转换示例字符串:

user => (find-serials "qweABCD13987")
(0 1 2 # # # # 7 8 # # #)

重用previous function "find-runs"

user => (find-runs (find-serials "qweABCD13987"))
([3 4] [9 3])

最终代码如下所示:

(defn find-runs [s]
  (let [ls (map count (partition-by identity s))]
    (filter #(>= (% 1) 3) 
            (map vector (reductions + 0 ls) ls))))

(def pad "#")

(defn inc-or-dec? [a b] 
  (= (Math/abs (- (int a) (int b))) 1 ))

(defn serial? [a b c] 
  (or (inc-or-dec? a b) (inc-or-dec? b c)))

(defn find-serials [s] 
  (map-indexed (fn [x [a b c]] (if (serial? a b c) pad x)) 
       (partition 3 1 (concat pad s pad))))

find-serials创建一个3个单元格的滑动窗口并应用serial?来检测作为序列开始/中间/结尾的单元格。方便地填充字符串,因此窗口始终以原始字符为中心。