I'm reading text from a text file and then reformatting it to write to different text files.
The text I'm reading, from testFile.txt, is as follows:
```
*******************************
* Void Fractions in the Bed *
*******************************
Z(m) MIN.FLUIDIZ. EMULSION TOTAL
0.0000E+00 0.4151E+00 0.8233E+00 0.8233E+00
0.1000E-09 0.4151E+00 0.8233E+00 0.8233E+00
0.1000E-05 0.4151E+00 0.8233E+00 0.8233E+00
0.2000E-05 0.4151E+00 0.8233E+00 0.8233E+00
0.1251E+01 0.4151E+00 0.9152E+00 0.9152E+00
0.1301E+01 0.4151E+00 0.9152E+00 0.9152E+00
0.1333E+01 0.4151E+00 0.9152E+00 0.9152E+00
*************************************
* Void Fractions in the Freeboard *
*************************************
Z(m) VOID FRACTION
0.1333E+01 0.9992E+00
0.1333E+01 0.9992E+00
0.1333E+01 0.9992E+00
0.1333E+01 0.9992E+00
0.3533E+01 0.9992E+00
0.3633E+01 0.9992E+00
0.3733E+01 0.9992E+00
0.3833E+01 0.9992E+00
0.3933E+01 0.9992E+00
0.4000E+01 0.9992E+00
*********************************************
* Superficial Velocities in the Bed (m/s) *
*********************************************
Z(m) MIN.FLUIDIZ. ACTUAL
0.0000E+00 0.1235E+00 0.4911E+01
0.1000E-09 0.1235E+00 0.4911E+01
0.1000E-05 0.1235E+00 0.4911E+01
0.2000E-05 0.1235E+00 0.4911E+01
0.3000E-05 0.1235E+00 0.4911E+01
0.1151E+01 0.1235E+00 0.4915E+01
0.1201E+01 0.1235E+00 0.4915E+01
0.1251E+01 0.1235E+00 0.4915E+01
0.1301E+01 0.1235E+00 0.4915E+01
0.1333E+01 0.1235E+00 0.4915E+01
```
Here is my Python code that parses the text file:
```python
openFile = open('testFile.txt', 'r')
groupOneFile = open('groupOneFile.csv', 'w')
groupTwoFile = open('groupTwoFile.csv', 'w')
groupThreeFile = open('groupThreeFile.csv', 'w')
idx = 0
firstIdx = 0
secondIdx = 0
thirdIdx = 0
for line in openFile:
    # first group
    if '* Void Fractions in the Bed *' in line:
        print(line)
        firstIdx = idx
    if idx in range(firstIdx + 5, firstIdx + 43):
        line = line.lstrip()
        line = line.replace(' ', ',')
        groupOneFile.write(line)
    # second group
    if '* Void Fractions in the Freeboard *' in line:
        print(line)
        secondIdx = idx
    if idx in range(secondIdx + 5, secondIdx + 43):
        line = line.lstrip()
        line = line.replace(' ', ',')
        groupTwoFile.write(line)
    # third group
    if '* Superficial Velocities in the Bed (m/s) *' in line:
        print(line)
        thirdIdx = idx
    if idx in range(thirdIdx + 5, thirdIdx + 43):
        line = line.lstrip()
        line = line.replace(' ', ',')
        groupThreeFile.write(line)
    idx += 1
openFile.close()
groupOneFile.close()
groupTwoFile.close()
groupThreeFile.close()
```
groupOneFile should contain the following data:
```
0.0000E+00,0.4151E+00,0.8233E+00,0.8233E+00
0.1000E-09,0.4151E+00,0.8233E+00,0.8233E+00
0.1000E-05,0.4151E+00,0.8233E+00,0.8233E+00
0.2000E-05,0.4151E+00,0.8233E+00,0.8233E+00
0.1251E+01,0.4151E+00,0.9152E+00,0.9152E+00
0.1301E+01,0.4151E+00,0.9152E+00,0.9152E+00
0.1333E+01,0.4151E+00,0.9152E+00,0.9152E+00
```
groupTwoFile should contain the following:
```
0.1333E+01,0.9992E+00
0.1333E+01,0.9992E+00
0.1333E+01,0.9992E+00
0.1333E+01,0.9992E+00
0.3533E+01,0.9992E+00
0.3633E+01,0.9992E+00
0.3733E+01,0.9992E+00
0.3833E+01,0.9992E+00
0.3933E+01,0.9992E+00
0.4000E+01,0.9992E+00
```
and so on for groupThreeFile.
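Reading the main text file and writing the data to the other files works fine. The problem is that the data written to groupOneFile is also written at the beginning of the other files, groupTwoFile and groupThreeFile. How can I prevent this?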
Answer 0 (score: 1)
To make this work, you can initialize the indices like this:
```python
firstIdx = 1000000
secondIdx = 1000000
thirdIdx = 1000000
```
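The problem is that when they start at 0, the first lines of the file (indices 5 through 42) fall inside the range of every group, so they are also written to the second and third files before those groups' headers are reached.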
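A small illustration of that failure, using a hypothetical helper `window_hits` and a 50-line stand-in for the file: with an index still at 0, every line from 5 to 42 counts as "in range" for that group, while a sentinel start value matches nothing.

```python
def window_hits(start_idx, n_lines=50):
    """Indices 0..n_lines-1 that fall in range(start_idx + 5, start_idx + 43).

    window_hits is a hypothetical helper; n_lines=50 stands in for the file.
    """
    return [idx for idx in range(n_lines)
            if idx in range(start_idx + 5, start_idx + 43)]

# Index still at its initial value of 0: lines 5..42 leak into this group
hits = window_hits(0)
print(hits[0], hits[-1], len(hits))  # 5 42 38

# Sentinel initialization: no early line falls inside the window
print(window_hits(1000000))  # []
```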
Note, however, that this code is quite inefficient. A better approach might be:
```python
outputFile = None
idx = 0  # must exist before the first header, because idx += 1 runs on every line
for line in openFile:
    if '* Void Fractions in the Bed *' in line:
        idx = 0
        outputFile = groupOneFile
    elif '* Void Fractions in the Freeboard *' in line:
        idx = 0
        outputFile = groupTwoFile
    elif '* Superficial Velocities in the Bed (m/s) *' in line:
        idx = 0
        outputFile = groupThreeFile
    # Write only lines 5..42 after the most recent header
    if outputFile and 5 <= idx < 43:
        line = line.lstrip()
        line = line.replace(' ', ',')
        outputFile.write(line)
    idx += 1
```
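The same loop can be wrapped in a function and tried on in-memory data. This is a sketch, not part of the answer: `split_by_header`, its `first`/`last` parameters, and the use of `io.StringIO` objects as stand-in output files are illustrative assumptions. The sample is shortened and hypothetical, so its data starts 3 lines after each header instead of the 5 the question's file seems to need.

```python
import io

def split_by_header(lines, headers, first=5, last=43):
    """Write each line to the file of the most recent header seen, but only
    when its offset from that header satisfies first <= offset < last.
    Spaces become commas, as in the question."""
    output = None
    idx = 0  # defined up front so idx += 1 is safe before the first header
    for line in lines:
        for tag, out in headers.items():
            if tag in line:
                # New section: restart the offset count and switch target
                idx = 0
                output = out
        if output is not None and first <= idx < last:
            output.write(line.lstrip().replace(' ', ','))
        idx += 1

# Shortened, hypothetical sample: data starts 3 lines after each header
sample = [
    "*****\n",
    "* Void Fractions in the Bed *\n",
    "*****\n",
    "Z(m) MIN.FLUIDIZ. EMULSION TOTAL\n",
    "0.0000E+00 0.4151E+00 0.8233E+00 0.8233E+00\n",
    "0.1000E-09 0.4151E+00 0.8233E+00 0.8233E+00\n",
    "*****\n",
    "* Void Fractions in the Freeboard *\n",
    "*****\n",
    "Z(m) VOID FRACTION\n",
    "0.1333E+01 0.9992E+00\n",
]
one, two = io.StringIO(), io.StringIO()
split_by_header(sample,
                {'* Void Fractions in the Bed *': one,
                 '* Void Fractions in the Freeboard *': two},
                first=3, last=5)
print(two.getvalue(), end='')  # 0.1333E+01,0.9992E+00
```

With the real files, the call would be `split_by_header(openFile, {...})` with the three open CSV files as values and the default `first=5, last=43`.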
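In Python, if you write `if x in range(a, b):`, the test may check x against each element in turn (in Python 2.x it even builds an actual list of all the integers from a to b-1 first). It is better to write the test as `if a <= x < b:`. Python 3 `range` objects do answer integer membership tests in constant time, but the chained comparison is still clearer.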
Also note that `2.5 in range(0, 10)` returns False, whereas `0 <= 2.5 < 10` is of course True.
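Both points can be checked directly; `in_window` is just an illustrative name for the chained comparison.

```python
def in_window(x, a, b):
    """Chained comparison: equivalent to (a <= x) and (x < b)."""
    return a <= x < b

print(7 in range(5, 43), in_window(7, 5, 43))       # True True
print(2.5 in range(0, 10), in_window(2.5, 0, 10))   # False True
```

For integers the two forms agree; they differ only for values such as 2.5 that are not members of the range.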
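There is no `switch` statement in Python (the `match` statement added in 3.10 is not a natural fit for substring tests like these), but you can build a dispatch table: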
```python
filemap = [('* Void Fractions in the Bed *', groupOneFile),
           ('* Void Fractions in the Freeboard *', groupTwoFile),
           ('* Superficial Velocities in the Bed (m/s) *', groupThreeFile)]
outputFile = None
idx = 0
for line in openFile:
    for tag, f in filemap:
        if tag in line:
            # Found a group header: restart the offset count, switch output
            idx = 0
            outputFile = f
    if outputFile and 5 <= idx < 43:
        outputFile.write(line.lstrip().replace(' ', ','))
    idx += 1
```
If an exact match is possible (instead of the `in` test), a dictionary does even better:
```python
filemap = {'* Void Fractions in the Bed *': groupOneFile,
           '* Void Fractions in the Freeboard *': groupTwoFile,
           '* Superficial Velocities in the Bed (m/s) *': groupThreeFile}
outputFile = None
idx = 0
for line in openFile:
    f = filemap.get(line.strip())
    if f:
        # Found a new group header, switch output file
        idx = 0
        outputFile = f
    if outputFile and 5 <= idx < 43:
        outputFile.write(line.lstrip().replace(' ', ','))
    idx += 1
```
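The exact match is why the lookup uses `line.strip()`: a dictionary key must equal the string exactly, so leading spaces or the trailing newline would make every header miss. A quick check, using a filename string as a stand-in value and a hypothetical indented header line:

```python
filemap = {'* Void Fractions in the Bed *': 'groupOneFile.csv'}

# Hypothetical header line as read from a file: indented, with a newline
line = '  * Void Fractions in the Bed *\n'

print(filemap.get(line))          # None: whitespace and newline break the match
print(filemap.get(line.strip()))  # groupOneFile.csv
```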
Answer 1 (score: 1)
You asked for my suggestion, so here it is:
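```python
from itertools import groupby, product

groups = {'* Void Fractions in the Bed *': 'groupOneFile.csv',
          '* Void Fractions in the Freeboard *': 'groupTwoFile.csv',
          '* Superficial Velocities in the Bed (m/s) *': 'groupThreeFile.csv'}

fname = None
with open('testFile.txt', 'r') as fin:
    # Split the file into runs of consecutive lines that do / do not start
    # with whitespace; this relies on the exact layout of testFile.txt
    # (header blocks indented, data rows not)
    for k, group in groupby(fin, lambda x: x[0].isspace()):
        if k:
            # Header run: find which section title it contains
            for i, g in product(group, groups):
                if g in i:
                    fname = groups[g]
                    break
        else:
            # Data run: collapse whitespace into single commas
            with open(fname, 'w') as fout:
                fout.writelines(','.join(s.split()) + '\n' for s in group)
```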
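To see how `groupby` splits the input, here it is on a few hypothetical lines, assuming (as the answer appears to) that header lines are indented and data rows are not. `groupby` only groups *consecutive* lines with the same key, so each header block and each data block becomes its own run.

```python
from itertools import groupby

# Hypothetical layout: header lines indented, data rows flush left
lines = [
    "  *****\n",
    "  * Void Fractions in the Bed *\n",
    "0.0000E+00 0.4151E+00\n",
    "0.1000E-09 0.4151E+00\n",
    "  * Void Fractions in the Freeboard *\n",
    "0.1333E+01 0.9992E+00\n",
]

# Key is True for lines starting with whitespace; record each run's length
runs = [(k, len(list(g))) for k, g in groupby(lines, lambda x: x[0].isspace())]
print(runs)  # [(True, 2), (False, 2), (True, 1), (False, 1)]
```

If the real file indents its data rows as well, this key would not separate headers from data, and a different key function would be needed.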
Answer 2 (score: 0)
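`secondIdx` and `thirdIdx` start at 0, which means `if idx in range(secondIdx+5, secondIdx+43):` fires on lines near the top of the file.
To fix this, you can either rewrite the loop as a more stateful setup (once you read `Void Fractions in the Bed`, write to the first file until you find a new header, and so on), or simply initialize your `firstIdx`/`secondIdx`/`thirdIdx` variables to some value far from any real line index.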
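A sketch of that stateful variant, assuming (this is not stated in the question) that a data row is any non-empty line whose fields all parse as floats. `HEADERS`, `is_data_row`, `split_sections` and `write_sections` are illustrative names. Because it does not rely on the fixed +5/+43 offsets, a section with more or fewer rows than expected cannot spill into its neighbor.

```python
# Section titles mapped to output files, as in the question
HEADERS = {
    '* Void Fractions in the Bed *': 'groupOneFile.csv',
    '* Void Fractions in the Freeboard *': 'groupTwoFile.csv',
    '* Superficial Velocities in the Bed (m/s) *': 'groupThreeFile.csv',
}

def is_data_row(line):
    """Assumed rule: a data row is a non-empty line whose fields all parse as floats."""
    fields = line.split()
    if not fields:
        return False
    try:
        for field in fields:
            float(field)
    except ValueError:
        return False
    return True

def split_sections(lines, headers=HEADERS):
    """Return {filename: [csv lines]}; the most recent header picks the target."""
    current = None
    sections = {}
    for line in lines:
        title = line.strip()
        if title in headers:
            # New header: everything until the next header goes to this file
            current = headers[title]
            sections.setdefault(current, [])
        elif current is not None and is_data_row(line):
            sections[current].append(','.join(line.split()) + '\n')
    return sections

def write_sections(path='testFile.txt'):
    """Read the input file and write one CSV file per section."""
    with open(path) as fin:
        sections = split_sections(fin)
    for filename, rows in sections.items():
        with open(filename, 'w') as fout:
            fout.writelines(rows)
```

Calling `write_sections()` would then produce the three CSV files; `split_sections` can be tested on a list of strings without touching the disk.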
Answer 3 (score: 0)
```python
with open("testFile.txt") as f:
    lines = list(f)

# Locate each section header (None if a header is missing)
firstIdx = secondIdx = thirdIdx = None
for x, line in enumerate(lines):
    if "* Void Fractions in the Bed *" in line:
        firstIdx = x
    elif "* Void Fractions in the Freeboard *" in line:
        secondIdx = x
    elif "* Superficial Velocities in the Bed (m/s) *" in line:
        thirdIdx = x

def write_lines(start, end, filename):
    # Write lines[start:end], turning spaces into commas as in the question
    with open(filename, "w") as f:
        for line in lines[start:end]:
            f.write(line.lstrip().replace(" ", ","))

if firstIdx is not None:
    write_lines(firstIdx + 5, firstIdx + 43, "groupOneFile.csv")
if secondIdx is not None:
    write_lines(secondIdx + 5, secondIdx + 43, "groupTwoFile.csv")
if thirdIdx is not None:
    write_lines(thirdIdx + 5, thirdIdx + 43, "groupThreeFile.csv")
```