I am reading the following input from a text file:
Title Value Position Perturbation 1.5 0.6 8.5 9.8 0 8.5 9.6 0.5 0.6 (...)
Title Value Position Perturbation 3 1.5 6 0 0.8 9.7 5.3 9.9 0.7 0.9 (...)
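I want to remove the first 4 columns. For the numeric columns, I want to take the values in groups of 4, swap the 2nd and 3rd values in each group, and drop the 4th, so the output should look like this: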
1.5 8.5 0.6 0 9.6 8.5 0.6 (...)
3 6 1.5 0.8 5.3 9.7 0.7 (...)
For this purpose, I wrote the following Python code:
import sys

input_file = open(sys.argv[1], 'r')
output_file = open(sys.argv[2], 'w')
with open(sys.argv[1]) as input_file:
    for i, line in enumerate(input_file):
        output_file.write('\n')
        marker_info = line.split()
        #snp = marker_info[0]
        end = len(marker_info)
        x = 4
        y = 8
        # while y<=len(marker_info):
        while x <= end:
            intensities = marker_info[x:y]
            AA = intensities[0]
            BB = intensities[1]
            AB = intensities[2]
            NN = intensities[3]
            output_file.write('%s' '\t' '%s' '\t' '%s' '\t' % (AA, AB, BB))
            x = y
            y = x + 4
input_file.close()
output_file.close()
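The code seems to work, but the problem is that the last four values are missing from every line. So I guess the problem is in the "while" statement, but I don't know how to fix it (I know it looks like a simple problem).
Thanks in advance for any suggestions.
Answer 0 (score: 2)
Try this:
1. Open the file as a CSV and strip the labels
2. Generate sublists of the required size
3. Do the swap and drop the trailing element
4. Save the output (I have done it with a list, but you can do it with an output file)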
>>> import csv
>>> output = []
>>> with open('sample.csv') as input:
...     reader = csv.reader(input, delimiter=' ')
...     for line in reader:
...         line = line[4:]  # strip labels
...         slice_size = 4
...         for slice_idx in range(0, len(line), slice_size):
...             sublist = line[slice_idx : slice_idx+slice_size]
...             if len(sublist) == slice_size:
...                 swap = sublist[2]
...                 sublist[2] = sublist[1]
...                 sublist[1] = swap
...                 output.append(sublist[:slice_size-1])
...
>>>
>>> output
[['1.5', '8.5', '0.6'], ['0', '9.6', '8.5'], ['3', '6', '1.5'], ['0.8', '5.3', '9.7']]
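To follow up on step 4, here is a minimal Python 3 sketch of the same idea that writes tab-separated rows instead of collecting a list. The names `reorder_groups` and `convert` are made up for this illustration, and the `io.StringIO` objects only stand in for the real input and output files.

```python
import csv
import io


def reorder_groups(values, group_size=4):
    """Swap the 2nd and 3rd value of each full group and drop the last one."""
    groups = []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]  # slicing returns a copy
        if len(group) == group_size:  # skip an incomplete trailing group
            group[1], group[2] = group[2], group[1]
            groups.append(group[:group_size - 1])
    return groups


def convert(src, dst, n_labels=4):
    """Read space-delimited rows from src, write tab-delimited rows to dst."""
    reader = csv.reader(src, delimiter=' ')
    writer = csv.writer(dst, delimiter='\t', lineterminator='\n')
    for row in reader:
        # Drop the label columns and any empty fields from repeated spaces.
        values = [field for field in row[n_labels:] if field]
        flat = [v for group in reorder_groups(values) for v in group]
        writer.writerow(flat)


# In-memory stand-ins for the real input and output files (assumption).
sample = io.StringIO(
    "Title Value Position Perturbation 1.5 0.6 8.5 9.8 0 8.5 9.6 0.5\n"
    "Title Value Position Perturbation 3 1.5 6 0 0.8 9.7 5.3 9.9\n"
)
result = io.StringIO()
convert(sample, result)
print(result.getvalue())
```

With real files you would pass two objects opened with `open(...)` to `convert` in place of the `StringIO` ones.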
Answer 1 (score: 0)
Try this. It is entirely based on your script, except for the while condition and the way the files are opened. Input file:
Title Value Position Perturbation 1.5 0.6 8.5 9.8 0 8.5 9.6 0.5 0.6 1.1 2.2 3.3
Title Value Position Perturbation 3 1.5 6 0 0.8 9.7 5.3 9.9 0.7 0.9 1.1 2.2
Title Value Position Perturbation 3.1 2.5 1.6 0 1.8 2.7 4.3 6.9 3.7 1.9 2.1 3.2
Script:
with open("parser.txt", "r") as input_file, open("output_parser.txt", "w") as output_file:
    for i, line in enumerate(input_file):
        output_file.write('\n')
        marker_info = line.split()
        end = len(marker_info)
        x = 4
        y = 8
        while y <= end:  # was: x <= end
            intensities = marker_info[x:y]
            AA = intensities[0]
            BB = intensities[1]
            AB = intensities[2]
            NN = intensities[3]
            output_file.write('%s' '\t' '%s' '\t' '%s' '\t' % (AA, AB, BB))
            # Debug trace (Python 2 print statement)
            print end, x, y, marker_info[x:y], AA, AB, BB
            x = y
            y = x + 4
Output:
1.5 8.5 0.6 0 9.6 8.5 0.6 2.2 1.1
3 6 1.5 0.8 5.3 9.7 0.7 1.1 0.9
3.1 1.6 2.5 1.8 4.3 2.7 3.7 2.1 1.9
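To see why the loop condition matters, here is a small Python 3 sketch that records the slices each condition visits. The helper `visited_slices` and its parameters are hypothetical and exist only for this illustration. With `x <= end` the loop runs once more with `x == end`, so `marker_info[x:y]` is an empty list and `intensities[0]` raises an IndexError. That exception may also be why buffered output never reached the file in the original script, though that part is a guess.

```python
def visited_slices(fields, use_y_bound, first=4, size=4):
    """Return the slices the while loop would visit.

    use_y_bound=True mimics `while y <= end`,
    use_y_bound=False mimics `while x <= end`.
    """
    end = len(fields)
    x, y = first, first + size
    slices = []
    while (y if use_y_bound else x) <= end:
        slices.append(fields[x:y])
        x, y = y, y + size
    return slices


line = ("Title Value Position Perturbation "
        "1.5 0.6 8.5 9.8 0 8.5 9.6 0.5 0.6 1.1 2.2 3.3")
fields = line.split()

fixed = visited_slices(fields, use_y_bound=True)
buggy = visited_slices(fields, use_y_bound=False)

print(len(fixed), fixed[-1])  # three full groups
print(len(buggy), buggy[-1])  # one extra, empty slice at the end
```

Bounding the loop on `y` means only complete groups of four are ever indexed, so a trailing partial group is skipped instead of causing an error.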