Question

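Hello, I am trying to write an output file in which the value in column 4 (ref) is removed from column 5 (alt) whenever the same value appears there.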

Here is my code:

with open(two) as infile, open (three, 'w') as outfile:
    reader = csv.reader(infile, delimiter='\t')
    writer = csv.writer(outfile, delimiter='\t')

    for g, pos, code, ref, alt, *rest in reader:
        a = alt.split(',')
        b = [x for x in a]
        if b == ref:
            writer.writerow([g, pos, code, ref, [alt-ref]] + rest)
        if b != ref:
            writer.writerow([g, pos, code, ref, alt] + rest)

I know that [alt-ref] doesn't work. I'm not sure which function could replace that part. My data in columns 4 and 5 looks like this:

A   A,B,C
T   H,D,T
H   A,H,D,C

And this is the output I want:

A   B,C
T   H,D
H   A,D,C

Can anyone help me? I would appreciate it.

Answer 1

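You can use a set to filter out and remove items. Here is a quick example of how to do it.

Note: this example does not cover how to open or write the new file.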

data = """A  A,B,C
T  H,D,T
H  A,H,D,C"""
newFile = ""
for line in data.splitlines():  # read the data line by line
    ref, alt = line.split("  ")  # split on two spaces to get the ref/alt columns
    altList = alt.split(",")  # split alt into its items
    # Remove ref from alt; {ref} keeps a multi-character ref as a single item,
    # whereas set(ref) would break it into individual characters.
    l = list(set(altList) - {ref})
    newLine = " ".join([ref, ",".join(l)])  # rebuild the line
    newFile += newLine + "\n"

print(newFile)

Output (sets are unordered, so the order of the items within alt may differ between runs):

A C,B
T H,D
H A,C,D
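
Because a set does not keep the original order, the result can differ from the desired output in the question (for example C,B instead of B,C). If order matters, a list comprehension is a possible alternative to the set approach. The following is only a sketch, assuming tab-separated input with at least five columns on every line; the helper names drop_ref_from_alt and filter_rows and the sample rows are made up for illustration and do not come from the original post.

```python
import csv
import io


def drop_ref_from_alt(ref, alt):
    """Return the comma-separated alt with every item equal to ref removed, order kept."""
    return ",".join(item for item in alt.split(",") if item != ref)


def filter_rows(infile, outfile):
    """Copy tab-separated rows, removing the column-4 value from column 5."""
    reader = csv.reader(infile, delimiter="\t")
    writer = csv.writer(outfile, delimiter="\t", lineterminator="\n")
    # Assumption: every row has at least five columns; a shorter or empty
    # row would raise ValueError in the unpacking below.
    for g, pos, code, ref, alt, *rest in reader:
        writer.writerow([g, pos, code, ref, drop_ref_from_alt(ref, alt)] + rest)


# Hypothetical sample rows; the first three columns are placeholders.
sample = (
    "chr1\t10\tx\tA\tA,B,C\n"
    "chr1\t20\ty\tT\tH,D,T\n"
    "chr1\t30\tz\tH\tA,H,D,C\n"
)
result = io.StringIO()
filter_rows(io.StringIO(sample), result)
print(result.getvalue(), end="")
```

With real files, the same function could be called as filter_rows(infile, outfile) inside the with open(two, newline="") as infile, open(three, "w", newline="") as outfile: block from the question, using the question's own variables two and three.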
