Question

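I am trying to parse genotype data, mainly to convert it for use with other software. Sorry if the question is too specific, but any comments and suggestions would be greatly appreciated.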

ID, exp, control
1, aa, bb
2, ab, aa
3, ab, -

I would like to transform it like this:

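Duplicate every column except the first.
Replace 'aa' and 'bb' with 'a' and 'b'; for 'ab', the original is replaced with 'a' and the duplicate is replaced with 'b'.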

For example:

    ID exp exp control control
    1 a a b b
    2 a b a a
    3 a b 0 0

I somehow managed to achieve the first goal, but the printed output looks a bit odd and none of the replacements are performed:

ID exp   exp     control
     control

1 aa     aa  bb
     bb

2 ab     ab  aa
     aa

3 ab     ab  -
     -

Here is my code:

#!/usr/bin/env python

inputfile = open("test.txt", 'r')
outputfile = open("solomon.txt", 'w')
matchlines = inputfile.readlines()

for line in matchlines: 
        line_parts = line.strip() #strip the end space
        line_parts = line.split(',') #split the line
        output_parts = []
        for part in line_parts[1:]:  #start from 2nd element, so 1st column not duplicate

            if part == 'aa':
               part = part.replace('aa', 'a')
            elif part == 'bb':
               part = part.replace('bb', 'b')
            elif part == '-':
               part = part.replace('-', '0')
            elif part == 'ab':
                 '''the original one will be replaced with 'a'; the duplicated one will be replaced with 'b' '''
            else:
                 print 'Nothing is matched'
            output_part = part + '\t' + part #duplicate each element (1st goal)             
            output_parts.append(output_part) #populate the line      
            line = '\t'.join(output_parts)   #join elements in the line with a tab                
        outputfile.write(line_parts[0] + line + "\n")

inputfile.close()
outputfile.close()

Answer 1

I would suggest a separate function for this, which makes it easier to develop and test in isolation from the other elements.

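def process_line(line_parts):
    # Keep the ID column unchanged
    out = line_parts[:1]
    for part in line_parts[1:]:
        if part == "-":
            # Missing genotype: emit two '0' alleles
            out.extend('00')
        else:
            # extend() iterates the string, so 'ab' adds 'a' and then 'b'
            out.extend(part)
    return out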

This gives, for example:

>>> process_line(['1', 'aa', '-'])
['1', 'a', 'a', '0', '0']

>>> process_line(['1', 'ab', 'bb'])
['1', 'a', 'b', 'b', 'b']
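
This works because `list.extend` iterates over its argument, and iterating a string yields its characters one at a time, so a two-character genotype contributes two separate elements. A small illustration (the variable names `out` and `alt` are only for this example):

```python
out = ['1']
out.extend('ab')   # iterates the string: adds 'a', then 'b'
out.extend('00')   # missing value: adds '0' twice
print(out)         # ['1', 'a', 'b', '0', '0']

# append, by contrast, would add the whole string as a single element
alt = ['1']
alt.append('ab')
print(alt)         # ['1', 'ab']
```

Note that this relies on each part being exactly the genotype string; surrounding whitespace (as in `' aa'`) would be split into characters as well, so the parts should be stripped first.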

You can easily turn this into a space-separated string using str.join:

>>> " ".join(['1', 'a', 'a', '0', '0'])
'1 a a 0 0'
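
Putting the pieces together, here is a minimal sketch of the whole conversion, assuming the input is comma-separated with a header on the first non-blank line, as in the question. The helper name `convert` and its `sep` parameter are hypothetical, introduced only for this illustration; `process_line` is repeated so the block runs on its own.

```python
def process_line(line_parts):
    """Keep the ID, then split each genotype into its two alleles."""
    out = line_parts[:1]
    for part in line_parts[1:]:
        if part == "-":
            out.extend("00")   # missing genotype becomes two '0' alleles
        else:
            out.extend(part)   # 'ab' contributes 'a' and 'b'
    return out


def convert(lines, sep="\t"):
    """Hypothetical helper: convert comma-separated lines to allele columns.

    Assumes the first non-blank line is a header whose column names
    (except the ID) are simply duplicated.
    """
    result = []
    header_seen = False
    for line in lines:
        # Split on commas and strip whitespace/newlines from every field
        parts = [p.strip() for p in line.split(",")]
        if not any(parts):
            continue  # skip blank lines
        if not header_seen:
            out = parts[:1]
            for name in parts[1:]:
                out.extend([name, name])
            header_seen = True
        else:
            out = process_line(parts)
        result.append(sep.join(out))
    return result


sample = ["ID, exp, control", "1, aa, bb", "2, ab, aa", "3, ab, -"]
for row in convert(sample, sep=" "):
    print(row)
# ID exp exp control control
# 1 a a b b
# 2 a b a a
# 3 a b 0 0
```

To work on files, the lines of an open file object can be passed to `convert` and each returned row written out followed by a newline. Stripping every field after the split is what removes the stray newline and spaces that caused the odd output in the question.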
