Removing a value from a different column

Date: 2014-11-07 14:07:32

Tags: python python-3.x

Hello, I'm trying to write an output file in which the column 4 (ref) value is removed from column 5 (alt) whenever the same value appears there.

Here is my code:

import csv

with open(two) as infile, open(three, 'w') as outfile:
    reader = csv.reader(infile, delimiter='\t')
    writer = csv.writer(outfile, delimiter='\t')

    for g, pos, code, ref, alt, *rest in reader:
        a = alt.split(',')
        b = [x for x in a]
        if b == ref:
            writer.writerow([g, pos, code, ref, [alt-ref]] + rest)
        if b != ref:
            writer.writerow([g, pos, code, ref, alt] + rest)

I know that [alt-ref] doesn't work. I'm not sure which function could replace that part. My data in columns 4 and 5 looks like this:

A   A,B,C
T   H,D,T
H   A,H,D,C

And this is the output I want:

A   B,C
T   H,D
H   A,D,C

Can anyone help me? I'd appreciate it.

1 Answer:

Answer 0: (score: 0)

You can use a set to remove and filter the items. Here is a quick example of how to do it.

Note: this example does not cover how to open the input file or write the new one.

data = """A  A,B,C
T  H,D,T
H  A,H,D,C"""
newFile = ""
for line in data.splitlines():  # read the rows one at a time
    ref, alt = line.split("  ")  # split each row into the ref and alt columns
    altList = alt.split(",")  # split alt into its individual items
    # Remove ref from alt. {ref} is a one-element set; set(ref) would split
    # a multi-character ref into single characters.
    l = list(set(altList) - {ref})
    newLine = " ".join([ref, ",".join(l)])  # rebuild the row
    newFile += newLine + "\n"

print(newFile)

Output (the order of the items within alt may differ between runs, because sets are unordered):

A C,B
T H,D
H A,C,D
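
The desired output in the question keeps the alt items in their original order, which a set does not guarantee. As a rough sketch of an alternative, a generator expression can filter alt while preserving order, and it plugs into the csv reader/writer loop from the question. The helper names `remove_ref` and `filter_rows` and the sample rows (including the first three columns) are made up for illustration, and the sketch assumes every row has at least five tab-separated columns.

```python
import csv
import io


def remove_ref(ref, alt):
    """Return alt with every item equal to ref removed, keeping the original order."""
    return ",".join(item for item in alt.split(",") if item != ref)


def filter_rows(infile, outfile):
    """Copy tab-separated rows, removing the ref value (column 4) from alt (column 5)."""
    reader = csv.reader(infile, delimiter="\t")
    writer = csv.writer(outfile, delimiter="\t", lineterminator="\n")
    # Rows with fewer than five columns (including blank lines) would raise ValueError here.
    for g, pos, code, ref, alt, *rest in reader:
        writer.writerow([g, pos, code, ref, remove_ref(ref, alt)] + rest)


# Hypothetical sample rows; the first three columns are placeholders.
sample = "g1\t1\tx\tA\tA,B,C\ng2\t2\ty\tT\tH,D,T\ng3\t3\tz\tH\tA,H,D,C\n"
result = io.StringIO()
filter_rows(io.StringIO(sample), result)
print(result.getvalue(), end="")
```

With real files, the same function could be called as `filter_rows(infile, outfile)` inside `with open(two, newline="") as infile, open(three, "w", newline="") as outfile:`, where `two` and `three` are the paths from the question.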