Codility GenomicRangeQuery

Time: 2014-04-03 11:41:48

Tags: python algorithm dynamic-programming

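I recently discovered Codility and am working through the demo training. I wrote this solution to the GenomicRangeQuery problem. It runs and it uses dynamic programming, but it scores only 87% instead of the expected 100%.

Does anyone have any idea why?

You can find the problem here, in the Prefix Sums section. Just start a test to see the problem description: Codility training

Thanks!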

def solution(S, P, Q):
    # write your code in Python 2.6
    S = list(S)
    sol = [[0]*len(S),[0]*len(S),[0]*len(S),[0]*len(S)]

    mapping = {"A":1, "C":2, "G":3, "T":4}

    for i in range(0,len(S)):
        if S[i] == 'A':
            sol[0][i]+= 1

        elif S[i] == 'C':
            sol[1][i] += 1

        elif S[i] == 'G':
            sol[2][i] += 1

        elif S[i] == 'T':
            sol[3][i] += 1

        if i < len(S)-1:
            sol[0][i+1] = sol[0][i]
            sol[1][i+1] = sol[1][i]
            sol[2][i+1] = sol[2][i]
            sol[3][i+1] = sol[3][i]

    for n in range(0, len(P)):

            l = P[n]
            r = Q[n]
            pre_sum = [0,0,0,0]
            if l > 0:
                pre_sum = [sol[0][l],sol[1][l],sol[2][l],sol[3][l]]
            post_sum = [sol[0][r],sol[1][r],sol[2][r],sol[3][r]]
            if post_sum[0]-pre_sum[0] > 0:
                P[n] = 1
            elif post_sum[1]-pre_sum[1] > 0:
                P[n] = 2
            elif post_sum[2]-pre_sum[2] > 0:
                P[n] = 3
            elif post_sum[3]-pre_sum[3] > 0:
                P[n] = 4
            else:
                P[n] = mapping[S[P[n]]];

    return P


pass

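8 Answers:

Answer 0 (score: 2)

Ah, I was working on the same problem and it took me a long time to debug, but in the end I managed to get 100/100.

For example, when S='AGT', P=[1], Q=[2], the function should return 3 for G, but yours (and originally mine) returns 4 for T.

I think this will fix it: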

if l > 0: pre_sum = [sol[0][l-1],sol[1][l-1],sol[2][l-1],sol[3][l-1]]
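
To see why the index shift matters: the question's sol arrays appear to hold inclusive counts (sol[k][i] counts occurrences in S[0..i]), so the count inside S[l..r] is sol[k][r] - sol[k][l-1], not sol[k][r] - sol[k][l]. Below is a minimal sketch under that reading. The names inclusive_counts, min_impact and shift_left are hypothetical and not from the original post; the sketch also assumes S contains only the letters A, C, G and T.

```python
def inclusive_counts(S):
    """counts[k][i] = occurrences of 'ACGT'[k] in S[0..i], inclusive."""
    counts = [[0] * len(S) for _ in range(4)]
    for i, ch in enumerate(S):
        for k in range(4):
            prev = counts[k][i - 1] if i > 0 else 0
            counts[k][i] = prev + (ch == "ACGT"[k])
    return counts


def min_impact(counts, l, r, shift_left=True):
    """Minimal impact factor in S[l..r].

    shift_left=True subtracts the counts at l-1 (the fix above);
    shift_left=False subtracts the counts at l (the question's behaviour),
    which wrongly drops the character at position l.
    """
    for k in range(4):
        before_index = l - 1 if shift_left else l
        before = counts[k][before_index] if l > 0 else 0
        if counts[k][r] - before > 0:
            return k + 1
    return None  # only reachable with shift_left=False


counts = inclusive_counts("AGT")
print(min_impact(counts, 1, 2, shift_left=False))  # 4: the 'G' at index 1 is lost
print(min_impact(counts, 1, 2, shift_left=True))   # 3: the 'G' is counted
```

With the shift applied, every queried range contains at least one counted character, so the fallback branch is never needed.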

Answer 1 (score: 0)

In case anyone is still interested in this exercise, here is my Python solution (100/100 on Codility):

def solution(S, P, Q):

    count = []
    for i in range(3):
        count.append([0]*(len(S)+1))

    for index, i in enumerate(S):
        count[0][index+1] = count[0][index] + ( i =='A')
        count[1][index+1] = count[1][index] + ( i =='C')
        count[2][index+1] = count[2][index] + ( i =='G')

    result = []

    for i in range(len(P)):
        start = P[i]
        end = Q[i]+1

        if count[0][end] - count[0][start]:
            result.append(1)
        elif count[1][end] - count[1][start]:
            result.append(2)
        elif count[2][end] - count[2][start]:
            result.append(3)
        else:
            result.append(4)

    return result

Answer 2 (score: 0)

100/100

def solution(S,P,Q):
    d = {"A":0,"C":1,"G":2,"T":3}
    n = len(S)
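    # All n+1 rows start as the same zero list; rows 0..n-1 are replaced by
    # fresh copies below, so pref[n] (also reached as pref[-1]) stays all zeros.
    pref = [[0,0,0,0]]*(n+1)
    for i in range(0,n):
        # pref[i] holds inclusive counts for S[0..i]; for i == 0, pref[-1] is the zero row
        pref[i] = [x for x in pref[i-1]]
        pref[i][d[S[i]]] += 1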
    lst = []
    for i in range(0,len(P)):
        if Q[i] == P[i]:
            lst.append(d[S[P[i]]]+1)
        else:
            x = 0
            while x < 4:
                if pref[Q[i]][x] - pref[P[i]-1][x] > 0:
                    lst.append(x+1)
                    break
                x += 1
    return lst

Answer 3 (score: 0)

This also scores 100/100:

def solution(S, P, Q):
    res = []
    for i in range(len(P)):
        if 'A' in S[P[i]:Q[i]+1]:
            res.append(1)
        elif 'C' in S[P[i]:Q[i]+1]:
            res.append(2)
        elif 'G' in S[P[i]:Q[i]+1]:
            res.append(3)
        else:
            res.append(4)
    return res

Answer 4 (score: 0)

For Python 3.6, 100%:

def solution(S, P, Q):

    NUCLEOTIDES = 'ACGT'
    IMPACTS = {nucleotide: impact for impact, nucleotide in enumerate(NUCLEOTIDES, 1)}

    result = []

    for query in range(len(P)):
        sample = S[P[query]:Q[query]+1]

        for nucleotide, impact in IMPACTS.items():
            if nucleotide in sample:
                result.append(impact)
                break

    return result

Answer 5 (score: 0)

I found a solution to GenomicRangeQuery that scores 100%.

def solution(s,p,q):
    n = len(p)
    r = [0]*n

    for i in range(n):
        pi=p[i]
        qi=q[i]+1
        ts=s[pi:qi]
        if 'A' in ts:
            r[i]=1
        elif 'C' in ts:
            r[i]=2
        elif 'G' in ts:
            r[i]=3
        elif 'T' in ts:
            r[i]=4
    return r

s,p,q = 'CAGCCTA', [2, 5, 0], [4, 5, 6]
solution(s,p,q)

Answer 6 (score: 0)

An O(N + M) algorithm that scores 100/100 without any tricks that rely on a language-specific implementation of the in/contains operator:

Let's define the prefix as the last index at which a particular nucleotide occurs at or before the current position. If there is no such occurrence, store -1.

    indexes:     0   1   2   3   4   5   6
    factors:     2   1   3   2   2   4   1
                 C   A   G   C   C   T   A

    prefix : A  -1   1   1   1   1   1   6
             C   0   0   0   3   4   4   4
             G  -1  -1   2   2   2   2   2
             T  -1  -1  -1  -1  -1   5   5

With the prefix defined this way, the minimal impact factor of a query is easy to compute. The subsequence S[p]S[p+1]...S[q-1]S[q] has the lowest factor (conditions checked in this order):

    1 if prefix[A][q] >= p
    2 if prefix[C][q] >= p
    3 if prefix[G][q] >= p
    4 if prefix[T][q] >= p

My implementation of this idea
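
The linked implementation is not reproduced in this thread. As a rough illustration of the last-occurrence idea only, here is a minimal Python sketch. The function name solve_with_last_index and the table name last are hypothetical, and the sketch assumes S contains only the letters A, C, G and T.

```python
def solve_with_last_index(S, P, Q):
    """Answer each query from a last-occurrence table (hypothetical sketch)."""
    letters = "ACGT"  # ordered by impact factor 1..4
    n = len(S)
    # last[k][i] = greatest index j <= i with S[j] == letters[k], or -1 if none
    last = [[-1] * n for _ in letters]
    for i, ch in enumerate(S):
        for k, letter in enumerate(letters):
            if ch == letter:
                last[k][i] = i
            elif i > 0:
                last[k][i] = last[k][i - 1]

    result = []
    for p, q in zip(P, Q):
        for k in range(4):
            # letters[k] occurs in S[p..q] iff its last occurrence up to q is >= p
            if last[k][q] >= p:
                result.append(k + 1)
                break
    return result


print(solve_with_last_index("CAGCCTA", [2, 5, 0], [4, 5, 6]))  # [2, 4, 1]
```

Building the table takes one pass over S with constant work per position, and each query checks at most four entries, which matches the O(N + M) bound claimed above.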

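Answer 7 (score: 0)

For each type of nucleotide, we can compute the distance from the current position (i = 0, 1, ..., N-1) to the nearest previous nucleotide of that type, where all previous nucleotides and the current one (at the current position) are considered.

The distance array pre_dists will look like this: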

    |   C   A    G    C    C    T    A  |
----|-----------------------------------|
 A  |  -1   0    1    2    3    4    0  |
 C  |   0   1    2    0    0    1    2  |
 G  |  -1  -1    0    1    2    3    4  |
 T  |  -1  -1   -1   -1   -1    0    1  |

Based on this distance data, I can get the minimal impact factor for any slice.

My implementation in Python:

def solution(S, P, Q):
    
    N = len(S)
    M = len(P)

    # impact factors
    I = {'A': 1, 'C': 2, 'G': 3, 'T': 4}
    
    # distance from current position to the nearest nucleotide
    # for each nucleotide type (previous or current nucleotide are considered) 
    # e.g. current position is 'A' => the distance dist[0] = 0, index 0 for type A
    #                          'C' => the distance dist[1] = 0, index 1 for type C
    pre_dists = [[-1]*N,[-1]*N,[-1]*N,[-1]*N]

    # initial values
    pre_dists[I[S[0]]-1][0] = 0

    for i in range(1, N):
        
        for t in range(4):
            if pre_dists[t][i-1] >= 0:
                # increase the distances
                pre_dists[t][i] = pre_dists[t][i-1] + 1

        # reset distance for current nucleotide type
        pre_dists[I[S[i]]-1][i] = 0
    
    # result keeper
    res = [0]*M

    for k in range(M):
        p = P[k]
        q = Q[k]

        if pre_dists[0][q] >=0 and q - pre_dists[0][q] >= p:
            res[k] = 1
        elif pre_dists[1][q] >=0 and q - pre_dists[1][q] >= p:
            res[k] = 2
        elif pre_dists[2][q] >=0 and q - pre_dists[2][q] >= p:
            res[k] = 3
        else:
            res[k] = 4
    
    return res

I hope this helps. Thanks!