Question

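I have a string of length n made up of A, G, C and T. The string is steady if it contains equal numbers of A, G, C and T (each occurring n/4 times). I need to find the minimum length of a substring that, when replaced, makes the string steady. https://www.hackerrank.com/challenges/bear-and-steady-gene

Suppose s1 = 'AAGAAGAA'. Since n = 8, ideally it should have 2 A's, 2 T's, 2 G's and 2 C's. It has 4 excess A's, so we need a substring that contains at least 4 A's. I start by taking 4-character substrings from the left; if none qualifies, I increase mnum (a variable) by 1 (i.e. look at 5-character substrings, and so on). We get AAGAA as the answer. But it's too slow.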

 from collections import Counter
 import sys

 n = int(input())            # length of the string
 s1 = input()
 s = Counter(s1)
 le = n // 4                 # ideal count of each character
 comp = {'A': le, 'G': le, 'C': le, 'T': le}
 s.subtract(comp)            # surplus (> 0) or deficit (< 0) of each character
 # excess characters and how many copies of each must be replaced
 b = [ch for ch in s if s[ch] > 0]
 a = [s[ch] for ch in b]
 mnum = sum(a)               # smallest substring length that could work
 if mnum == 0:
     print(0)
     sys.exit()
 while mnum <= n:            # try length mnum, then mnum + 1, and so on
     for i in range(n - mnum + 1):
         window = s1[i:i + mnum]
         # the window must hold at least the surplus of every excess character
         if all(window.count(b[j]) >= a[j] for j in range(len(a))):
             print(mnum)
             sys.exit()
     mnum += 1

Answer 1

The minimal substring can be found in O(N) time and O(N) space.

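First, count the frequency fr[i] of each character in the input of length n. The key thing to realize is that a substring is a valid candidate if and only if it contains every excessive character at least fr[i] - n/4 times. Otherwise the characters left outside the substring would still exceed n/4, and it would not be possible to replace the missing characters. So our task is to go through each such substring and pick the one with the minimum length.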
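As a minimal sketch of that condition (the helper names `excess_counts` and `covers_excess` are made up for illustration and are not part of the original answer), the surplus of each character and the check for a single candidate substring could look like this:

```python
from collections import Counter

def excess_counts(gene):
    """Return {char: surplus} for characters occurring more than n/4 times."""
    target = len(gene) // 4
    freq = Counter(gene)
    return {ch: freq[ch] - target for ch in "ACGT" if freq[ch] > target}

def covers_excess(substring, excess):
    """True if `substring` holds at least the surplus of every excess character."""
    counts = Counter(substring)
    return all(counts[ch] >= need for ch, need in excess.items())

excess = excess_counts("GAAAAAAA")
print(excess)                            # {'A': 5}
print(covers_excess("AAAAA", excess))    # True
print(covers_excess("GAAA", excess))     # False
```

Checking every substring this way is still slow; the point is only to make the validity test concrete.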

But how do we find them efficiently?

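At the beginning, minLength is n. We introduce two pointer indices, left and right (initially 0), which define a substring from left to right in the original string str. Then we increment right until the frequency of each excessive character in str[left : right] is at least fr[i] - n/4. But that is not all, because str[left : right] may contain unnecessary characters on its left side (for example, characters that are not excessive and can therefore be dropped). So we increment left as long as str[left : right] still contains enough excessive characters. When we are done, we update minLength if it is larger than right - left. We repeat the procedure until right >= n.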

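Let's consider an example. Let GAAAAAAA be the input string. Then the algorithm steps are as follows:

1. Calculate the frequency of each character:

['G'] = 1, ['A'] = 7, ['T'] = 0, ['C'] = 0 ('A' is excessive here: with n/4 = 2, the substring must contain at least 7 - 2 = 5 A's)

2. Now iterate over the original string: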

Step#1: |G|AAAAAAA
    substr = 'G' - no excessive chars (left = 0, right = 0) 
Step#2: |GA|AAAAAA
    substr = 'GA' - 1 excessive char, we need 5 (left = 0, right = 1)
Step#3: |GAA|AAAAA
    substr = 'GAA' - 2 excessive chars, we need 5 (left = 0, right = 2)
Step#4: |GAAA|AAAA
    substr = 'GAAA' - 3 excessive chars, we need 5 (left = 0, right = 3)
Step#5: |GAAAA|AAA
    substr = 'GAAAA' - 4 excessive chars, we need 5 (left = 0, right = 4)
Step#6: |GAAAAA|AA
    substr = 'GAAAAA' - 5 excessive chars, nice but can we remove something from left? 'G' is not excessive anyways. (left = 0, right = 5)
Step#7: G|AAAAA|AA
    substr = 'AAAAA' - 5 excessive chars, wow, it's smaller now. minLength = 5 (left = 1, right = 5)   
Step#8: G|AAAAAA|A
    substr = 'AAAAAA' - 6 excessive chars, nice, but can we reduce the substr? There's a redundant 'A'(left = 1, right = 6)
Step#9: GA|AAAAA|A
    substr = 'AAAAA' - 5 excessive chars, nice, minLen = 5 (left = 2, right = 6)
Step#10: GA|AAAAAA|
    substr = 'AAAAAA' - 6 excessive chars, nice, but can we reduce the substr? There's a redundant 'A'(left = 2, right = 7)
Step#11: GAA|AAAAA|
    substr = 'AAAAA' - 5 excessive chars, nice, minLen = 5 (left = 3, right = 7)
Step#12: That's it as right >= 8
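
The walk-through above can be sketched in Python roughly as follows. This is an illustrative implementation of the two-pointer idea, not the answerer's original code; the function name `min_replacement_length` is an assumption, and `right` is treated as an inclusive index, so the window length is `right - left + 1`.

```python
from collections import Counter

def min_replacement_length(gene):
    """Length of the shortest substring whose replacement can make `gene` steady."""
    n = len(gene)
    target = n // 4
    freq = Counter(gene)
    # How many copies of each excessive character the window must contain.
    need = {ch: freq[ch] - target for ch in freq if freq[ch] > target}
    if not need:
        return 0                      # already steady

    window = Counter()                # character counts inside gene[left..right]
    min_length = n
    left = 0
    for right, ch in enumerate(gene):
        window[ch] += 1               # extend the window to the right
        # While the window still covers every surplus, record it and
        # try to drop the leftmost character.
        while all(window[c] >= k for c, k in need.items()):
            min_length = min(min_length, right - left + 1)
            window[gene[left]] -= 1
            left += 1
    return min_length

print(min_replacement_length("GAAAAAAA"))   # 5
print(min_replacement_length("AAGAAGAA"))   # 5
```

Each index enters and leaves the window at most once, and the coverage check looks at no more than four characters, so the loop runs in linear time. For the HackerRank input format you would read n with `int(input())`, read the gene with `input()`, and print the returned value.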

Answer 2

Here is a solution with limited testing done. It should give you some ideas on how to improve your code.

  vector<vector<point> > a; // 2D array                                         
  a.resize(100);
  for_each(a.begin(),a.end(),[](vector<point>& v){v.resize(200);});
  point p(2,3);
  a[0][0] = p; // ok now

Bear and Steady Gene - code optimization (Python)
