Question

我最近发现了Codility，我正在接受演示培训。我将此解决方案写入基因组范围查询问题，它运行正常，解决方案提供动态编程，但它的得分仅为87％而不是预期的100％。

任何人都有任何想法？

在这里您可以找到问题，它位于前缀部分。刚开始测试，看看问题描述！ Codility training

谢谢！

def solution(S, P, Q):
    # write your code in Python 2.6
    S = list(S)
    sol = [[0]*len(S),[0]*len(S),[0]*len(S),[0]*len(S)]

    mapping = {"A":1, "C":2, "G":3, "T":4}

    for i in range(0,len(S)):
        if S[i] == 'A':
            sol[0][i]+= 1

        elif S[i] == 'C':
            sol[1][i] += 1

        elif S[i] == 'G':
            sol[2][i] += 1

        elif S[i] == 'T':
            sol[3][i] += 1

        if i < len(S)-1:
            sol[0][i+1] = sol[0][i]
            sol[1][i+1] = sol[1][i]
            sol[2][i+1] = sol[2][i]
            sol[3][i+1] = sol[3][i]

    for n in range(0, len(P)):

            l = P[n]
            r = Q[n]
            pre_sum = [0,0,0,0]
            if l > 0:
                pre_sum = [sol[0][l],sol[1][l],sol[2][l],sol[3][l]]
            post_sum = [sol[0][r],sol[1][r],sol[2][r],sol[3][r]]
            if post_sum[0]-pre_sum[0] > 0:
                P[n] = 1
            elif post_sum[1]-pre_sum[1] > 0:
                P[n] = 2
            elif post_sum[2]-pre_sum[2] > 0:
                P[n] = 3
            elif post_sum[3]-pre_sum[3] > 0:
                P[n] = 4
            else:
                P[n] = mapping[S[P[n]]];

    return P


pass

Answer 1

啊，我正在做同样的事情，我花了很长时间来调试，但最后我设法获得了100/100。

例如，何时 S='AGT'和P=[1]，Q=[2]，该函数应该为G返回3，但是你的（最初我的）将为T

返回4

我认为这会解决它：

if l > 0: pre_sum = [sol[0][l-1],sol[1][l-1],sol[2][l-1],sol[3][l-1]]

Answer 2

如果仍然有人对此练习感兴趣，请分享我的Python解决方案（100/100 in Codility）

def solution(S, P, Q):

    count = []
    for i in range(3):
        count.append([0]*(len(S)+1))

    for index, i in enumerate(S):
        count[0][index+1] = count[0][index] + ( i =='A')
        count[1][index+1] = count[1][index] + ( i =='C')
        count[2][index+1] = count[2][index] + ( i =='G')

    result = []

    for i in range(len(P)):
      start = P[i]
      end = Q[i]+1

      if count[0][end] - count[0][start]:
          result.append(1)
      elif count[1][end] - count[1][start]:
          result.append(2)
      elif count[2][end] - count[2][start]:
          result.append(3)
      else:
          result.append(4)

    return result

Answer 3

100/100

def solution(S,P,Q):
    d = {"A":0,"C":1,"G":2,"T":3}
    n = len(S)
    pref = [[0,0,0,0]]*(n+1)
    for i in range(0,n):
        pref[i] = [x for x in pref[i-1]]
        pref[i][d[S[i]]] += 1
    lst = []
    for i in range(0,len(P)):
        if Q[i] == P[i]:
            lst.append(d[S[P[i]]]+1)
        else:
            x = 0
            while x < 4:
                if pref[Q[i]][x] - pref[P[i]-1][x] > 0:
                    lst.append(x+1)
                    break
                x += 1
    return lst

Answer 4

这同样适用于100/100

def solution(S, P, Q):
    res = []
    for i in range(len(P)):
        if 'A' in S[P[i]:Q[i]+1]:
            res.append(1)
        elif 'C' in S[P[i]:Q[i]+1]:
            res.append(2)
        elif 'G' in S[P[i]:Q[i]+1]:
            res.append(3)
        else:
            res.append(4)
    return res

Answer 5

对于Python3.6，为100％：

def solution(S, P, Q):

    NUCLEOTIDES = 'ACGT'
    IMPACTS = {nucleotide: impact for impact, nucleotide in enumerate(NUCLEOTIDES, 1)}

    result = []

    for query in range(len(P)):
        sample = S[P[query]:Q[query]+1]

        for nucleotide, impact in IMPACTS.items():
            if nucleotide in sample:
                result.append(impact)
                break

    return result

Answer 6

我发现GenomicRangeQuery的优异成绩得分为100％。

def solution(s,p,q):
    n = len(p)
    r = [0]*n

    for i in range(n):
        pi=p[i]
        qi=q[i]+1
        ts=s[pi:qi]
        if 'A' in ts:
            r[i]=1
        elif 'C' in ts:
            r[i]=2
        elif 'G' in ts:
            r[i]=3
        elif 'T' in ts:
            r[i]=4
    return r

s,p,q = 'CAGCCTA', [2, 5, 0], [4, 5, 6]
solution(s,p,q)

Answer 7

使用in或contains运算符的语言特定实现，无需任何技巧就可以得分100/100 O（N + M）算法：

Lets define prefix as:
 * last index of particular nucleone before on in current position. If no prev occcurance put -1.
 * 
 * 
 * indexes:     0   1   2   3   4   5   6
 * factors:     2   1   3   2   2   4   1
 *              C   A   G   C   C   T   A
 *              
 * prefix : A  -1   1   1   1   1   1   6
 *          C   0   0   0   3   4   4   4
 *          G  -1  -1   2   2   2   2   2
 *          T  -1  -1  -1  -1  -1   5   5
 *
 * Having such defined prefix let us easily calculate answer question of minimal factor in following way:
 * subsequence S[p]S[p+1]...S[q-1]S[q] has the lowest factor:
 * 1 if prefix index [A][q] >= p
 * 2 if prefix index [C][q] >= p
 * 3 if prefix index [G][q] >= p
 * 4 if prefix index [T][q] >= p

我的implementation of this idea

Answer 8

对于每种类型的核苷酸，我们可以计算从当前位置 (i=0,1,...,N-1) 到最近的前一个核苷酸的距离，其中所有以前的核苷酸和当前的核苷酸（在当前位置）被考虑。

距离数组 pre_dists 将类似于：

    |   C   A    G    C    C    T    A  |
----|-----------------------------------|
 A  |  -1   0    1    2    3    4    0  |
 C  |   0   1    2    0    0    1    2  |
 G  |  -1  -1    0    1    2    3    4  |
 T  |  -1  -1   -1   -1   -1    0    1  |

基于这个距离数据，我可以得到任何切片的最小影响因子。

我在 Python 中的实现：

def solution(S, P, Q):
    
    N = len(S)
    M = len(P)

    # impact factors
    I = {'A': 1, 'C': 2, 'G': 3, 'T': 4}
    
    # distance from current position to the nearest nucleotide
    # for each nucleotide type (previous or current nucleotide are considered) 
    # e.g. current position is 'A' => the distance dist[0] = 0, index 0 for type A
    #                          'C' => the distance dist[1] = 0, index 1 for type C
    pre_dists = [[-1]*N,[-1]*N,[-1]*N,[-1]*N]

    # initial values
    pre_dists[I[S[0]]-1][0] = 0

    for i in range(1, N):
        
        for t in range(4):
            if pre_dists[t][i-1] >= 0:
                # increase the distances
                pre_dists[t][i] = pre_dists[t][i-1] + 1

        # reset distance for current nucleotide type
        pre_dists[I[S[i]]-1][i] = 0
    
    # result keeper
    res = [0]*M

    for k in range(M):
        p = P[k]
        q = Q[k]

        if pre_dists[0][q] >=0 and q - pre_dists[0][q] >= p:
            res[k] = 1
        elif pre_dists[1][q] >=0 and q - pre_dists[1][q] >= p:
            res[k] = 2
        elif pre_dists[2][q] >=0 and q - pre_dists[2][q] >= p:
            res[k] = 3
        else:
            res[k] = 4
    
    return res

我希望这有帮助。谢谢！

Codility基因组范围查询

8 个答案: