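I'm trying to collect data from a text file. When I print the outputs, they show the correct values I'm looking for. However, when I try to put those outputs into a table with xlsxwriter, the table contains only the output for the last line of the txt file, repeated as many times as there are lines in the text file. I.e. with 5000 lines of text and 3 pieces of information needed from each, the .xlsx file has 5000 rows and 3 columns, but every row contains the information from the last line of the text file.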
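EC:1 > GO:N-ethylmaleimide reductase activity ; GO:0008748
EC:1 > GO:oxidoreductase activity ; GO:0016491
EC:1 > GO:reduced coenzyme F420 dehydrogenase activity ; GO:0043738
EC:1 > GO:sulfur oxygenase reductase activity ; GO:0043826
EC:1 > GO:malolactic enzyme activity ; GO:0043883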
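^ what the txt file looks like
6.6.1.2    cobaltochelatase activity    0051116
6.6.1.2    cobaltochelatase activity    0051116
6.6.1.2    cobaltochelatase activity    0051116
6.6.1.2    cobaltochelatase activity    0051116
6.6.1.2    cobaltochelatase activity    0051116
6.6.1.2    cobaltochelatase activity    0051116
6.6.1.2    cobaltochelatase activity    0051116
6.6.1.2    cobaltochelatase activity    0051116
6.6.1.2    cobaltochelatase activity    0051116
6.6.1.2    cobaltochelatase activity    0051116
... ... ...
(what the table looks like, except that it goes on for 5000 rows)
Any help would be greatly appreciated. Regards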
import xlsxwriter

File = 'EC_to_GO.txt'

def analysis(line, output):
    with open(File) as fp:
        lines = fp.readlines()
        for line in lines:
            output[0] = line[3:].split(' > ')[0]
            output[1] = line[:-14].split(' > GO:')[-1]
            output[2] = line[-8:]
    return output

with open(File) as fp:
    lines = fp.readlines()

for line in lines:
    if 'Generated on 2018-07-04T09:08Z' in line:
        a = lines.index(line)

for line in lines:
    if 'GO:cobaltochelatase activity ; GO:0051116' in line:
        b = lines.index(line)

req_list = lines[a:b]
rxn_end_index = []
for i in range(len(req_list)):
    if '> GO:' in req_list[i]:
        rxn_end_index.append(i)

inner_list = []
outer_list = []
spare = [0] + rxn_end_index
for i in range(len(spare)-1):
    inner_list = req_list[spare[i]:spare[i+1]]
    outer_list.append(inner_list)

res_list = []
for i in range(len(outer_list)):
    res_list.append(analysis(outer_list[i], ['NA','NA','NA']))

# Create a workbook and add a worksheet.
workbook = xlsxwriter.Workbook('EC_to_GO.xlsx')
worksheet = workbook.add_worksheet('EC_to_GO')

#res_list1 = [EC, Genome name, GO]
#for i in res_list:
#    res_list1.append(i)

# Some data we want to write to the worksheet.
t = tuple(res_list)

# Start from the first cell. Rows and columns are zero indexed.
row = 0
col = 0

# Iterate over the data and write it out row by row.
for a, b, c in (t):
    worksheet.write(row, col, a)
    worksheet.write(row, col + 1, b)
    worksheet.write(row, col + 2, c)
    row += 1

workbook.close()
Answer 0 (score: 1)
You are basically appending the same list to res_list, so you end up with multiple references to the same output list.
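To make the idea of a shared list concrete, here is a minimal sketch that is independent of the question's file. The names `shared`, `rows`, `all_same` and `first_fields` are made up for the example. It appends one list object several times while changing it in place, which is the situation described above:

```python
# Hypothetical example: a single list object appended three times.
shared = ['NA', 'NA', 'NA']
rows = []
for value in ['a', 'b', 'c']:
    shared[0] = value      # changes the one shared list in place
    rows.append(shared)    # appends a reference to it, not a copy

# Every entry in rows is the same object, so each one shows the last value written.
all_same = all(row is shared for row in rows)
first_fields = [row[0] for row in rows]
print(all_same, first_fields)
```

Whenever every row of the result looks like the last record processed, checking whether the rows are the same object (with `is`) is a quick way to confirm or rule out this cause.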
To fix this, instead of:
res_list.append(analysis(outer_list[i],['NA','NA','NA']))
# And in the previous loop
for i in range(len(spare)-1):
    inner_list = req_list[spare[i]:spare[i+1]]
    outer_list.append(inner_list)
change it to:
res_list.append(analysis(outer_list[i],['NA','NA','NA'])[:])
for i in range(len(spare)-1):
    inner_list = req_list[spare[i]:spare[i+1]]
    outer_list.append(inner_list[:])
or
from copy import copy  # shallow-copy helper from the standard library
res_list.append(copy(analysis(outer_list[i],['NA','NA','NA'])))
for i in range(len(spare)-1):
    inner_list = req_list[spare[i]:spare[i+1]]
    outer_list.append(copy(inner_list))
The notation list[:] creates a copy of the list. Technically, you are taking a slice that spans the whole list.
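As a small sketch of the slice copy, the first part below shows that `original[:]` yields a new list that later changes to `original` do not affect. It is also worth noting that `analysis` in the question reopens the file and loops over every line, so its result does not depend on the `line` argument it receives, and copying alone may not change what ends up in the table. The second part is therefore a sketch under an assumption: that each record is a single line of the form `EC:<number> > GO:<name> ; GO:<id>`. The helper `parse_line` and the two sample lines are illustrative, not taken from the original code.

```python
def parse_line(line):
    """Hypothetical helper: split one 'EC:x > GO:name ; GO:id' record into three fields."""
    line = line.rstrip('\n')
    ec_part, go_part = line.split(' > GO:', 1)
    name, go_id = go_part.rsplit(' ; GO:', 1)
    # A new list is built on every call, so no two rows share an object.
    return [ec_part[len('EC:'):], name.strip(), go_id.strip()]

# Slice copy: snapshot is a separate list from original.
original = ['NA', 'NA', 'NA']
snapshot = original[:]
original[0] = 'changed'

# Assumed sample records in the format shown in the question.
sample = [
    'EC:1 > GO:oxidoreductase activity ; GO:0016491\n',
    'EC:6.6.1.2 > GO:cobaltochelatase activity ; GO:0051116\n',
]
res_list = [parse_line(line) for line in sample]
print(snapshot, res_list)
```

Each entry of `res_list` can then be written out with the same `worksheet.write(row, col, value)` loop used in the question.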