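I found an interesting project online for F#. The idea behind it is to count the substrings of a given string that meet a certain condition.
Here is the prompt: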
Description:
You are given a DNA sequence:
a string that contains only characters 'A', 'C', 'G', and 'T'.
Your task is to calculate the number of substrings of sequence,
in which each of the symbols appears the same number of times.
Example 1:
For sequence = "ACGTACGT", the output should be 6
All substrings of length 4 contain each symbol exactly once (+5),
and the whole sequence contains each symbol twice (+1).
Example 2:
For sequence = "AAACCGGTTT", the output should be 1
Only substring "AACCGGTT" satisfies the criterion above: it contains each symbol twice.
Input: String, a sequence that consists only of symbols 'A', 'C', 'G', and 'T'.
Length constraint: 0 < sequence.length < 100000.
Output: Integer, the number of substrings where each symbol appears equally many times.
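I'm not sure where to go with this, or more specifically, how to do it. I've looked around the internet for guidance, and all I found was the following code (I added the input variable and the var variable, and changed the displayed "things" to the input and then the substrings to search for; I hope that makes sense):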
open System

let countSubstring (where : string) (what : string) =
    match what with
    | "" -> 0
    | _ -> (where.Length - where.Replace(what, @"").Length) / what.Length

[<EntryPoint>]
let main argv =
    let input = System.Console.ReadLine()
    let var = input.Length
    Console.WriteLine(var)
    let show where what =
        printfn @"countSubstring(""%s"", ""%s"") = %d" where what (countSubstring where what)
    show input "ACGT"
    show input "CGTA"
    show input "GTAC"
    show input "TACG"
    0
Anyway, if anyone could help me with this, it would be greatly appreciated.
Thanks in advance.
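Answer 0 (score: 2)
Here is a solution that generates every substring whose length is divisible by 4 and then counts how many of them contain each symbol equally often. Note that a substring whose length is not divisible by 4 cannot contain the four symbols equally often: if each symbol appears k times, the length is exactly 4k.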
let hasEqualAmountOfSymbols (substring : string) =
    let symbolAppearances =
        ['A'; 'C'; 'G'; 'T']
        |> List.map (fun symbol ->
            substring
            |> Seq.filter ((=) symbol)
            |> Seq.length)
    symbolAppearances
    |> List.pairwise
    |> List.forall (fun (x, y) -> x = y)

let countSubstrings input =
    let potentialSubstrings =
        let lastIndex = String.length input - 1
        [ for i in 0 .. lastIndex do
            for j in i + 3 .. 4 .. lastIndex do
                yield input.Substring(i, j - i + 1) ]
    potentialSubstrings
    |> List.filter hasEqualAmountOfSymbols
    |> List.length

countSubstrings "ACGTACGT" // -> 6
countSubstrings "AAACCGGTTT" // -> 1
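Answer 1 (score: 2)
First, declare a function numberACGT that returns 1 if the string contains as many A's as C's, G's, and T's, and 0 otherwise. To do this, declare an array N of 4 integers initialized to 0 and run through the string, incrementing the corresponding counter for each character. Then compare the array elements with one another.
Then, for each substring whose length is a multiple of 4, call numberACGT and add the result to a counter count (initialized to 0 at the start).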
let numberACGT (aString:string) =
    let N = Array.create 4 (0:int)
    let last = aString.Length - 1
    for i = 0 to last do
        match aString.[i] with
        | 'A' -> N.[0] <- N.[0] + 1
        | 'C' -> N.[1] <- N.[1] + 1
        | 'G' -> N.[2] <- N.[2] + 1
        | _ -> N.[3] <- N.[3] + 1
    if (N.[0] = N.[1]) && (N.[1] = N.[2]) && (N.[2] = N.[3]) then 1 else 0

let numberSubStrings (aString:string) =
    let mutable count = 0
    let len = aString.Length
    for k = 1 to len / 4 do // only lengths that are multiples of 4
        for pos = 0 to len - 4*k do
            count <- count + numberACGT (aString.[pos..pos+4*k-1])
    count
I hope it is fast enough.
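[<EntryPoint>]
let main argv =
    let stopWatch = System.Diagnostics.Stopwatch.StartNew()
    let input = System.Console.ReadLine()
    printf "%i " (numberSubStrings input)
    stopWatch.Stop()
    // Note: the result below also shows an elapsed time, but the call
    // that printed it does not appear in the code as posted.
    let g = System.Console.ReadLine()
    0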
Result:
62 4.542700
New version in O(n²):
let numberSubStringsBis (aString:string) =
    let mutable count = 0
    let len = aString.Length
    for pos = 0 to len - 1 do
        // running symbol counts for substrings starting at pos
        let mutable a = 0
        let mutable c = 0
        let mutable g = 0
        let mutable t = 0
        let mutable k = pos
        while k + 3 <= len - 1 do
            // extend the current substring by 4 characters
            for i = k to k + 3 do
                match aString.[i] with
                | 'A' -> a <- a + 1
                | 'C' -> c <- c + 1
                | 'G' -> g <- g + 1
                | _ -> t <- t + 1
            k <- k + 4
            if a=c && c=g && g=t then count <- count + 1
    count
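With sequences of up to 100000 characters, even the O(n²) version may be slow. A possible further refinement, which is not part of either answer, is a single pass over prefix counts: a substring is balanced exactly when the differences (A - C, A - G, A - T) of the running counts are the same just before it starts and just after it ends. The sketch below counts pairs of prefixes with equal differences. The name countBalancedLinear is hypothetical, and the result is an int64 on the assumption that the count can exceed the range of a 32-bit integer for long inputs.

```fsharp
open System.Collections.Generic

// Hypothetical helper, not from the answers above: counts balanced
// substrings in one pass using differences of running symbol counts.
let countBalancedLinear (sequence : string) : int64 =
    // seen.[key] = number of prefixes seen so far with that key
    let seen = Dictionary<(int * int * int), int64>()
    seen.[(0, 0, 0)] <- 1L // the empty prefix
    let mutable a = 0
    let mutable c = 0
    let mutable g = 0
    let mutable t = 0
    let mutable count = 0L
    for ch in sequence do
        match ch with
        | 'A' -> a <- a + 1
        | 'C' -> c <- c + 1
        | 'G' -> g <- g + 1
        | _ -> t <- t + 1
        let key = (a - c, a - g, a - t)
        // every earlier prefix with the same key starts a balanced
        // substring that ends at the current character
        let earlier = if seen.ContainsKey key then seen.[key] else 0L
        count <- count + earlier
        seen.[key] <- earlier + 1L
    count

printfn "%d" (countBalancedLinear "ACGTACGT")   // 6
printfn "%d" (countBalancedLinear "AAACCGGTTT") // 1
```

Each character costs two dictionary lookups and one update, so the running time is expected to grow roughly linearly with the length of the sequence.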