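Hello, I'm trying to write an output file in which the column 4 (ref) value is removed from column 5 (alt) whenever the same value appears there.
Here is my code: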
with open(two) as infile, open(three, 'w') as outfile:
    reader = csv.reader(infile, delimiter='\t')
    writer = csv.writer(outfile, delimiter='\t')
    for g, pos, code, ref, alt, *rest in reader:
        a = alt.split(',')
        b = [x for x in a]
        if b == ref:
            writer.writerow([g, pos, code, ref, [alt-ref]] + rest)
        if b != ref:
            writer.writerow([g, pos, code, ref, alt] + rest)
I know [alt-ref] doesn't work; I'm not sure which function could replace that part. My columns 4 and 5 look like this:
A A,B,C
T H,D,T
H A,H,D,C
And my desired output:
A B,C
T H,D
H A,D,C
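Can anyone help? I'd appreciate it.
Answer 0 (score: 0)
You can use a set to remove and filter the items. Here is a quick example.
Note: it does not cover opening or writing the new file.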
data = """A A,B,C
T H,D,T
H A,H,D,C"""
newFile = ""
for line in data.splitlines():  # read the data line by line
    ref, alt = line.split(" ")  # split each line into the ref and alt columns
    altList = alt.split(",")  # split alt into its items
    l = list(set(altList) - {ref})  # drop the item equal to ref; {ref} keeps a multi-character ref whole
    newLine = " ".join([ref, ",".join(l)])  # rebuild the line
    newFile += newLine + '\n'
    # print(newLine)
print(newFile)
Output (sets are unordered, so the items may come out in a different order than in the input):
A C,B
T H,D
H A,C,D
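If the original order of the alt items matters, as in the desired output in the question, a list comprehension can be used instead of a set. The following is only a minimal sketch under some assumptions: the helper names drop_ref_from_alt and filter_rows are made up for illustration, the sample rows use placeholder values for the first three columns, and every row is assumed to have at least five tab-separated columns (a shorter row would raise a ValueError when unpacking).

```python
import csv
import io


def drop_ref_from_alt(ref, alt):
    """Return alt with every item equal to ref removed, keeping the original order."""
    return ",".join(item for item in alt.split(",") if item != ref)


def filter_rows(infile, outfile):
    """Copy tab-separated rows, rewriting column 5 (alt) without the column 4 (ref) value."""
    reader = csv.reader(infile, delimiter="\t")
    writer = csv.writer(outfile, delimiter="\t", lineterminator="\n")
    for g, pos, code, ref, alt, *rest in reader:
        writer.writerow([g, pos, code, ref, drop_ref_from_alt(ref, alt)] + rest)


# Made-up sample rows; the first three columns are placeholders.
sample = (
    "chr1\t100\tx\tA\tA,B,C\n"
    "chr1\t200\ty\tT\tH,D,T\n"
    "chr1\t300\tz\tH\tA,H,D,C\n"
)
result = io.StringIO()
filter_rows(io.StringIO(sample), result)
print(result.getvalue())
```

With real files, the same function could be called as filter_rows(infile, outfile) inside the with open(two) as infile, open(three, 'w') as outfile: block from the question; passing newline='' to both open calls is what the csv module documentation recommends.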