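I have a text file written for Fortran, so its lines are limited to 80 characters. I am trying to parse it in Python.
The file holds characteristic tables for different projects, and those tables contain sub-tables. A given site may include all of the data below or none of it.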
1PROGRAM MYPRJ PROJECT CHARACTERISTIC DATA PROJECT
11/22/00 MASTER FILE NO NAME REV
1 PROJ
DATA1 7000.0 uni
DATA2 -5000.0 uni
DATA3 12000.0 uni
DATA4 3000. uni
TBL1 TITLE
VAL1 UNI 0.0 10000.0 30000.0 60000.0 100000.0 300000.0
VAL2 UNI 1858.0 1863.4 1870.3 1876.0 1882.0 1900.0
TBL2 TITLE
VAL1 UNI 6542.0 1156.7 3697.1 4569.3 9564.9 5698.0 9874.7 3654.7 3698.8 2135.0
7894.0 5568.0 7845.3 3657.7
VAL2 UNI 0.0 18300.0 22500.0 99900.0 36900.0 69800.0 58700.0 63520.0 69870.0 55260.0
61100.0 127900.0 166600.0 236900.0
TBL3 TITLE
VAL1 UNI -4876.9 -5642.3 -1225.7 9.0 375.4 322.0 8860.8 1568.1 4567.0 6953.0
6578.1 1236.7 3970.0 5632.3 3265.1 3698.1 1236.2 1236.4 7000.0
VAL2 UNI 3265.1 1236.7 5632.3 2394.1 2405.0 1876.0 7845.3 2420.0 5568.0 2548.0
5632.3 3265.1 2694.1 5568.0 2455.0 5632.3 1863.4 2670.0 2565.0
I have been reading the file like this:
with open("file.asc") as f:
    # strip() removes the trailing newline and the surrounding whitespace
    lines = [line.strip() for line in f]
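My difficulty is with tables of different sizes, such as TBL1 and TBL2. TBL1 is small, so each row of values fits on one line. The rows in TBL2 and TBL3 exceed the 80-character limit and wrap onto the next line. As a result, in my list of lists, TBL2 VAL1 is split across two lists.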
TBL1 = [['TBL1 TITLE'], ['VAL1 UNI', '0.0', '10000.0', '30000.0', '60000.0', '100000.0', '300000.0'],['VAL2 UNI','1858.0', '1863.4', '1870.3', '1876.0', '1882.0', '1900.0']]
TBL2 = [[['TBL2 TITLE'],['VAL1 UNI', '6542.0', '1156.7','3697.1', '4569.3', '9564.9', '5698.0', '9874.7', '3654.7', '3698.8', '2135.0'], ['7894.0', '5568.0', '7845.3', '3657.7'], ['VAL2 UNI', '0.0', '18300.0', '22500.0', '99900.0', '36900.0', '69800.0', '58700.0', '63520.0', '69870.0', '55260.0'],['61100.0', '127900.0', '166600.0', '236900.0']]]
I left out TBL3 for brevity.
What I want is:
TBL1 = [['TBL1 TITLE'], ['VAL1 UNI', '0.0', '10000.0', '30000.0', '60000.0', '100000.0', '300000.0'],['VAL2 UNI','1858.0', '1863.4', '1870.3', '1876.0', '1882.0', '1900.0']]
TBL2 = [[['TBL2 TITLE'],['VAL1 UNI', '6542.0', '1156.7','3697.1', '4569.3', '9564.9', '5698.0', '9874.7', '3654.7', '3698.8', '2135.0','7894.0', '5568.0', '7845.3', '3657.7'], ['VAL2 UNI', '0.0', '18300.0', '22500.0', '99900.0', '36900.0', '69800.0', '58700.0', '63520.0', '69870.0', '55260.0','61100.0', '127900.0', '166600.0', '236900.0']]]
Answer 0 (score: 2)
If an alphabetic first element marks the start of a new list, you can do the following:
from copy import deepcopy

def combine(lst):
    # deepcopy isolates the result from the input list,
    # so the original list is not modified
    new_list = []
    for s in lst:
        if s[0].isalpha():  # the first element contains only letters
            new_list.append(deepcopy(s))  # start a new list in the result
        else:
            # otherwise extend the last list in the result
            new_list[-1].extend(deepcopy(s))
    return new_list

combine(list1)
#[['title'],
# ['a', '-453.0', '-2913.0', '2983.9', '3476.7', '3970.0'],
# ['b', '23.9', '23.3', '35.0', '40.3', '24.5', '24.2', '24.7', '240.8']]
combine(list2)
# [['title'], ['c', '0.0', '100.0', '300.0'], ['d', '188.0']]
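Note that labels in the question's data, such as 'VAL1 UNI' or 'TBL2 TITLE', contain a digit or a space, so `str.isalpha()` returns False for the whole string. The following is a minimal sketch, assuming every new row begins with a letter and every continuation row begins with a digit or a sign. It tests only the first character; `combine_by_first_char` is a hypothetical name, and the sample rows are shortened from the question.

```python
def combine_by_first_char(rows):
    """Merge continuation rows into the preceding labelled row."""
    merged = []
    for row in rows:
        # A row starts a new list if its first element begins with a letter.
        # The "not merged" guard avoids indexing an empty result.
        if row[0][:1].isalpha() or not merged:
            merged.append(list(row))  # copy so the input is left untouched
        else:
            merged[-1].extend(row)
    return merged


# Shortened rows in the shape shown in the question (assumed input).
tbl2 = [
    ['TBL2 TITLE'],
    ['VAL1 UNI', '6542.0', '1156.7'],
    ['7894.0', '5568.0'],
    ['VAL2 UNI', '0.0', '18300.0'],
    ['61100.0', '127900.0'],
]
result = combine_by_first_char(tbl2)
```

A plain `list(row)` copy is enough here because the elements are immutable strings.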
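Another strategy is to check whether the first element of each sublist can be converted to a float. If it can, the sublist is a continuation of the previous list; otherwise it starts a new list: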
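from copy import deepcopy

def combine(lst):
    new_list = []
    for s in lst:
        try:
            float(s[0])  # raises ValueError if the first element is not numeric
            new_list[-1].extend(deepcopy(s))  # numeric: continue the last list
        except ValueError:
            new_list.append(deepcopy(s))  # non-numeric: start a new list
    return new_list

combine(list1)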
#[['title'],
# ['a', '-453.0', '-2913.0', '2983.9', '3476.7', '3970.0'],
# ['b', '23.9', '23.3', '35.0', '40.3', '24.5', '24.2', '24.7', '240.8']]
combine(list2)
# [['title'], ['c', '0.0', '100.0', '300.0'], ['d', '188.0']]
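The same float test can also be applied while the lines are read, before any list of lists exists. The sketch below is one possible way to do that, not a definitive parser. It assumes fields are whitespace-separated, that only the table lines are passed in (a header line such as "1 PROJ" starts with a number and would be mistaken for a continuation), and that a label consists of the leading non-numeric tokens joined by a space. `is_number` and `parse_rows` are hypothetical names.

```python
def is_number(token):
    """Return True if the token can be converted to a float."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def parse_rows(lines):
    """Split table lines into rows, folding wrapped lines into the row above."""
    rows = []
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        if is_number(tokens[0]) and rows:
            # Line starts with a number: continuation of the previous row.
            rows[-1].extend(tokens)
        else:
            # Join the leading non-numeric tokens into a single label.
            i = 0
            while i < len(tokens) and not is_number(tokens[i]):
                i += 1
            rows.append([' '.join(tokens[:i])] + tokens[i:])
    return rows


# Shortened table lines from the question (assumed input).
lines = [
    "TBL2 TITLE",
    "VAL1 UNI 6542.0 1156.7 3697.1",
    "7894.0 5568.0",
    "VAL2 UNI 0.0 18300.0",
    "61100.0 127900.0",
]
rows = parse_rows(lines)
```

Here `rows` has the merged shape the question asks for, with 'VAL1 UNI' kept as a single label followed by all of its values.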