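As a follow-up to my earlier question about finding runs of the same character in a string, I would also like to find a functional algorithm that finds all substrings of length greater than 2 that are ascending or descending sequences of letters or digits in a string ([Char]), for example "defgh", "34567", "XYZ", "fedcba", "NMLK", "9876". The only sequences I am considering are substrings of A..Z, a..z, 0..9 and their descending counterparts. The return value should be a list of (zero-based offset, length) pairs. I am translating the "zxcvbn" password strength algorithm from JavaScript (which contains imperative code) to Scala. I would like to keep my code as purely functional as possible, for all the usual reasons given for writing in a functional style.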
My code is written in Scala, but I can translate an algorithm given in Clojure, F#, Haskell or pseudocode.
Example: for the string qweABCD13987, the result should be [(3,4),(9,3)].
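I have written a rather clumsy function, which I will post when I am back at my work computer, but I am sure a more elegant solution exists.
Once again, thank you.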
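Answer 0 (score: 0)
I think a good solution to this problem is actually more complicated than it looks at first. I am not a Scala pro, so my solution is certainly not the best possible one, but maybe it will give you some ideas.
The basic idea is to compute the difference between each pair of consecutive characters; after that, unfortunately, it gets a bit messy. Ask me if any of the code is unclear!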
object Sequences {
  val s = "qweABCD13987"

  // drop(1) instead of tail, so that an empty string yields an empty list
  val pairs = s.zip(s.drop(1)).toList
  // = List((q,w), (w,e), (e,A), (A,B), (B,C), (C,D), (D,1), (1,3), (3,9), (9,8), (8,7))

  // assuming all characters are either letters or digits
  val diff = pairs map { case (t1, t2) =>
    if (t1.isLetter ^ t2.isLetter) 0 else t1 - t2 // xor could also be replaced by !=
  }
  // = List(-6, 18, 36, -1, -1, -1, 0, -2, -6, 1, 1)

  /**
   * @param xs      A list of the differences between consecutive characters
   * @param current triple: (start index of the current sequence;
   *                number of elements currently in the sequence;
   *                sign of t1 - t2 for the sequence, i.e. -1 = ascending characters,
   *                1 = descending characters, 0 = no direction yet)
   * @return A list of triples of the same shape as `current`
   */
  def sequences(xs: List[Int], current: (Int, Int, Int) = (0, 1, 0)): List[(Int, Int, Int)] = xs match {
    case Nil => current :: Nil
    case 1 :: ys =>
      if (current._3 != -1)
        sequences(ys, (current._1, current._2 + 1, 1))
      else // direction flips: the new sequence starts at the last character of the old one
        current :: sequences(ys, (current._1 + current._2 - 1, 2, 1))
    case -1 :: ys =>
      if (current._3 != 1)
        sequences(ys, (current._1, current._2 + 1, -1))
      else
        current :: sequences(ys, (current._1 + current._2 - 1, 2, -1))
    case _ :: ys =>
      current :: sequences(ys, (current._1 + current._2, 1, 0))
  }

  // keep only sequences longer than 2 and drop the direction
  val result = sequences(diff) filter (_._2 > 2) map (t => (t._1, t._2))
  // = List((3,4), (9,3))
}
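The same difference idea can also be expressed as a single left fold, which avoids the non-tail recursion. The following is only a sketch under a few assumptions: the names `DiffRuns`, `cls`, `step`, `State` and `runs` are made up for illustration, the difference is taken as `b - a` (so 1 means ascending), and two characters only count as neighbours when both lie in the same range A..Z, a..z or 0..9, as the question requires.

```scala
object DiffRuns {
  // Character class: 1 = A..Z, 2 = a..z, 3 = 0..9, 0 = anything else.
  private def cls(c: Char): Int =
    if (c >= 'A' && c <= 'Z') 1
    else if (c >= 'a' && c <= 'z') 2
    else if (c >= '0' && c <= '9') 3
    else 0

  // 1 if b directly follows a, -1 if b directly precedes a
  // (both within the same class), 0 otherwise.
  def step(a: Char, b: Char): Int = {
    val d = b - a
    if (cls(a) != 0 && cls(a) == cls(b) && (d == 1 || d == -1)) d else 0
  }

  // Finished runs (in reverse order) plus the run that is still open.
  private final case class State(done: List[(Int, Int)], start: Int, len: Int, dir: Int) {
    // Finished runs including the open one, if it is long enough.
    def closed: List[(Int, Int)] = if (len > 2) (start, len) :: done else done
  }

  def runs(s: String): List[(Int, Int)] = {
    val steps = s.zip(s.drop(1)).map { case (a, b) => step(a, b) }
    val end = steps.zipWithIndex.foldLeft(State(Nil, 0, 1, 0)) {
      case (st, (d, i)) =>
        if (d != 0 && d == st.dir) st.copy(len = st.len + 1) // extend the open run
        else if (d != 0) State(st.closed, i, 2, d)           // new run starting at char i
        else State(st.closed, i + 1, 1, 0)                   // no run across this step
    }
    end.closed.reverse
  }
}
```

With these definitions, `DiffRuns.runs("qweABCD13987")` evaluates to `List((3,4), (9,3))`, and a string such as "abcba" yields two runs that share the middle character.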
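Answer 1 (score: 0)
It is always best to split a problem into several smaller sub-problems. I wrote a solution in Haskell, which is easier for me. It uses lazy lists, but I suppose you could convert it to Scala either by using streams or by making the main function tail-recursive and passing the intermediate result along as an argument.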
-- Mark all maximal subsequences whose adjacent elements satisfy
-- the given predicate. Includes subsequences of length 1.
sequences :: (Eq a) => (a -> a -> Bool) -> [a] -> [(Int,Int)]
sequences _ []     = []
sequences p (x:xs) = go x xs 0 1
  where
    -- arguments: previous char, remaining input,
    -- start offset of the current subsequence, offset of the next char
    go _ [] startOffs nextOffs = [(startOffs, nextOffs - startOffs)]
    go y (y':ys) startOffs nextOffs
      | p y y'     -- predicate matches - we're extending the current subsequence
          = go y' ys startOffs nextOffs'
      | otherwise  -- output the current subsequence and start a new one at y'
          = (startOffs, nextOffs - startOffs) : go y' ys nextOffs nextOffs'
      where
        nextOffs' = nextOffs + 1

-- Marks ascending subsequences.
asc :: (Enum a, Eq a) => [a] -> [(Int,Int)]
asc = sequences (\x y -> succ x == y)

-- Marks descending subsequences.
desc :: (Enum a, Eq a) => [a] -> [(Int,Int)]
desc = sequences (\x y -> pred x == y)

-- Returns True for subsequences of length greater than 2.
validRange :: (Int, Int) -> Bool
validRange (_, len) = len > 2

-- Find all ascending and all descending subsequences of the
-- proper length.
combined :: (Enum a, Eq a) => [a] -> [(Int,Int)]
combined xs = filter validRange (asc xs) ++ filter validRange (desc xs)

-- test: prints [(3,4),(9,3)]
main :: IO ()
main = print $ combined "qweABCD13987"
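The Scala translation suggested in this answer (a tail-recursive main function with the intermediate result passed as an argument) might look roughly like the sketch below. The names `PredicateRuns` and `go` are assumptions made for illustration. Like the Haskell version, it compares raw character codes and does not restrict matches to A..Z, a..z and 0..9.

```scala
import scala.annotation.tailrec

object PredicateRuns {
  // All maximal runs whose neighbouring elements satisfy p, as (offset, length).
  // Runs of length 1 are included, as in the Haskell version.
  def sequences[A](p: (A, A) => Boolean)(xs: List[A]): List[(Int, Int)] = {
    // prev: previous element, rest: remaining input,
    // start: offset of the current run, next: offset of the head of rest,
    // acc: finished runs in reverse order
    @tailrec
    def go(prev: A, rest: List[A], start: Int, next: Int,
           acc: List[(Int, Int)]): List[(Int, Int)] =
      rest match {
        case Nil => ((start, next - start) :: acc).reverse
        case x :: tail if p(prev, x) => go(x, tail, start, next + 1, acc)
        case x :: tail => go(x, tail, next, next + 1, (start, next - start) :: acc)
      }

    xs match {
      case Nil       => Nil
      case x :: tail => go(x, tail, 0, 1, Nil)
    }
  }

  def asc(s: String): List[(Int, Int)] =
    sequences[Char]((a, b) => b - a == 1)(s.toList)

  def desc(s: String): List[(Int, Int)] =
    sequences[Char]((a, b) => a - b == 1)(s.toList)

  // Ascending and descending runs longer than 2, ordered by offset.
  def combined(s: String): List[(Int, Int)] =
    (asc(s) ++ desc(s)).filter(_._2 > 2).sortBy(_._1)
}
```

Sorting by offset is an addition of this sketch; the Haskell `combined` simply lists the ascending runs before the descending ones.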
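Answer 2 (score: 0)
Here is my approach in Clojure:
We can transform the input string so that your previous algorithm can be applied to find the solution. The algorithm is not the most efficient, but I think it gives you more abstract and readable code.
The example string is transformed as follows: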
user=> (find-serials "qweABCD13987")
(0 1 2 "#" "#" "#" "#" 7 8 "#" "#" "#")
Reusing the previous function "find-runs":
user=> (find-runs (find-serials "qweABCD13987"))
([3 4] [9 3])
The final code looks like this:
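;; Runs of equal elements with length >= 3, as [offset length] vectors.
(defn find-runs [s]
  (let [ls (map count (partition-by identity s))]
    (filter #(>= (% 1) 3)
            (map vector (reductions + 0 ls) ls))))

(def pad "#")

;; True if the character codes of a and b differ by exactly 1.
(defn inc-or-dec? [a b]
  (= (Math/abs (- (int a) (int b))) 1))

;; True if b is adjacent to its left or right neighbour.
(defn serial? [a b c]
  (or (inc-or-dec? a b) (inc-or-dec? b c)))

;; Replaces every character that belongs to a serial by pad
;; and every other character by its index.
(defn find-serials [s]
  (map-indexed (fn [x [a b c]] (if (serial? a b c) pad x))
               (partition 3 1 (concat pad s pad))))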
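find-serials creates a sliding window of 3 cells and applies serial? to detect the cells that are the start, middle or end of a sequence. The string is padded on both sides so that the window is always centered on an original character.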
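For Scala, the window idea could be sketched with `sliding(3)` as shown here. This is a variant rather than a direct translation: instead of marking single cells, it records the direction of every three-character window and merges overlapping windows that point the same way, so that adjacent ascending and descending runs stay separate and matches are limited to A..Z, a..z and 0..9. The names `WindowRuns`, `cls`, `direction` and `runs` are assumptions made for illustration.

```scala
object WindowRuns {
  // Character class: 1 = A..Z, 2 = a..z, 3 = 0..9, 0 = anything else.
  private def cls(c: Char): Int =
    if (c >= 'A' && c <= 'Z') 1
    else if (c >= 'a' && c <= 'z') 2
    else if (c >= '0' && c <= '9') 3
    else 0

  // 1 for an ascending triple, -1 for a descending triple, 0 otherwise.
  // Expects a window of exactly three characters.
  def direction(w: String): Int = {
    val d1 = w(1) - w(0)
    val d2 = w(2) - w(1)
    val sameClass = cls(w(0)) != 0 && cls(w(0)) == cls(w(2))
    if (sameClass && d1 == d2 && (d1 == 1 || d1 == -1)) d1 else 0
  }

  def runs(s: String): List[(Int, Int)] = {
    // (window start, direction) for every window that forms a sequence
    val hits = s.sliding(3).zipWithIndex
      .map { case (w, i) => (i, if (w.length == 3) direction(w) else 0) }
      .filter(_._2 != 0)
      .toList

    // Merge each window into the previous run when it overlaps that run
    // by two characters and has the same direction.
    hits.foldLeft(List.empty[(Int, Int, Int)]) {
      case ((start, len, dir) :: done, (i, d)) if d == dir && i == start + len - 2 =>
        (start, len + 1, dir) :: done
      case (done, (i, d)) =>
        (i, 3, d) :: done
    }.reverse.map { case (start, len, _) => (start, len) }
  }
}
```

Because every hit already covers three characters, no separate length filter is needed in this variant.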