Question

I am reading text from a text file and then reformatting it to write to a different text file.

The text I am reading, from testFile.txt, looks like this:

                  *******************************
                  *  Void Fractions in the Bed  *
                  *******************************

     Z(m)    MIN.FLUIDIZ.  EMULSION    TOTAL

0.0000E+00  0.4151E+00  0.8233E+00  0.8233E+00
0.1000E-09  0.4151E+00  0.8233E+00  0.8233E+00
0.1000E-05  0.4151E+00  0.8233E+00  0.8233E+00
0.2000E-05  0.4151E+00  0.8233E+00  0.8233E+00
0.1251E+01  0.4151E+00  0.9152E+00  0.9152E+00
0.1301E+01  0.4151E+00  0.9152E+00  0.9152E+00
0.1333E+01  0.4151E+00  0.9152E+00  0.9152E+00


               *************************************
               *  Void Fractions in the Freeboard  *
               *************************************

     Z(m)    VOID FRACTION

0.1333E+01  0.9992E+00
0.1333E+01  0.9992E+00
0.1333E+01  0.9992E+00
0.1333E+01  0.9992E+00
0.3533E+01  0.9992E+00
0.3633E+01  0.9992E+00
0.3733E+01  0.9992E+00
0.3833E+01  0.9992E+00
0.3933E+01  0.9992E+00
0.4000E+01  0.9992E+00


           *********************************************
           *  Superficial Velocities in the Bed (m/s)  *
           *********************************************

     Z(m)    MIN.FLUIDIZ.  ACTUAL

0.0000E+00  0.1235E+00  0.4911E+01
0.1000E-09  0.1235E+00  0.4911E+01
0.1000E-05  0.1235E+00  0.4911E+01
0.2000E-05  0.1235E+00  0.4911E+01
0.3000E-05  0.1235E+00  0.4911E+01
0.1151E+01  0.1235E+00  0.4915E+01
0.1201E+01  0.1235E+00  0.4915E+01
0.1251E+01  0.1235E+00  0.4915E+01
0.1301E+01  0.1235E+00  0.4915E+01
0.1333E+01  0.1235E+00  0.4915E+01

Here is my Python code for parsing the text file:

openFile = open('testFile.txt','r')

groupOneFile = open('groupOneFile.csv','w')
groupTwoFile = open('groupTwoFile.csv','w')
groupThreeFile = open('groupThreeFile.csv','w')

idx = 0;
firstIdx = 0;
secondIdx = 0;
thirdIdx = 0;

for line in openFile:

    # first group
    if '*  Void Fractions in the Bed  *' in line:
        print line
        firstIdx = idx

    if idx in range(firstIdx+5,firstIdx+43):
        line = line.lstrip()
        line = line.replace('  ',',')
        groupOneFile.write(line)

    # second group
    if '*  Void Fractions in the Freeboard  *' in line:
        print line
        secondIdx = idx

    if idx in range(secondIdx+5,secondIdx+43):
        line = line.lstrip()
        line = line.replace('  ',',')
        groupTwoFile.write(line)        

    # third group
    if '*  Superficial Velocities in the Bed (m/s)  *' in line:
        print line
        thirdIdx = idx

    if idx in range(thirdIdx+5,thirdIdx+43):
        line = line.lstrip()
        line = line.replace('  ',',')
        groupThreeFile.write(line)

    idx += 1

openFile.close()

groupOneFile.close()
groupTwoFile.close()
groupThreeFile.close()

groupOneFile should contain the following data:

0.0000E+00,0.4151E+00,0.8233E+00,0.8233E+00
0.1000E-09,0.4151E+00,0.8233E+00,0.8233E+00
0.1000E-05,0.4151E+00,0.8233E+00,0.8233E+00
0.2000E-05,0.4151E+00,0.8233E+00,0.8233E+00
0.1251E+01,0.4151E+00,0.9152E+00,0.9152E+00
0.1301E+01,0.4151E+00,0.9152E+00,0.9152E+00
0.1333E+01,0.4151E+00,0.9152E+00,0.9152E+00

groupTwoFile should contain the following:

0.1333E+01,0.9992E+00
0.1333E+01,0.9992E+00
0.1333E+01,0.9992E+00
0.1333E+01,0.9992E+00
0.3533E+01,0.9992E+00
0.3633E+01,0.9992E+00
0.3733E+01,0.9992E+00
0.3833E+01,0.9992E+00
0.3933E+01,0.9992E+00
0.4000E+01,0.9992E+00

And likewise for groupThreeFile.

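Reading the main text file and writing the data to the other files works fine. The problem is that the data written to groupOneFile is also written at the beginning of the other files, groupTwoFile and groupThreeFile. How can I prevent this?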

Answer 1

To make this work, you can initialize

firstIdx = 1000000
secondIdx = 1000000
thirdIdx = 1000000

because the problem is that if they are set to 0, the first lines of the file fall within the range of every group.
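To make the overlap concrete, here is a minimal sketch (Python 3) that evaluates the question's three window tests for a single line index. The helper name groups_matching is made up for illustration, and the indices assume the sample file, where the first header sits at index 1 and its first data row at index 6.

```python
def groups_matching(idx, first_idx, second_idx, third_idx):
    """Return the groups whose window [marker + 5, marker + 43) contains idx."""
    markers = {"one": first_idx, "two": second_idx, "three": third_idx}
    return [name for name, start in markers.items()
            if start + 5 <= idx < start + 43]

# First header found at index 1, the other two markers still at their initial 0:
# the first data row (index 6) falls inside all three windows.
with_zero = groups_matching(6, 1, 0, 0)
print(with_zero)

# Same situation, but the unseen markers hold the large sentinel value instead.
with_sentinel = groups_matching(6, 1, 1000000, 1000000)
print(with_sentinel)
```

With the markers left at 0 the row is claimed by every group; with the sentinel only the first group matches.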

Note, however, that this code is quite inefficient... a better approach might be:

outputFile = None

for line in openFile:
    if '*  Void Fractions in the Bed  *' in line:
        idx = 0; outputFile = groupOneFile
    elif '*  Void Fractions in the Freeboard  *' in line:
        idx = 0; outputFile = groupTwoFile
    elif '*  Superficial Velocities in the Bed (m/s)  *' in line:
        idx = 0; outputFile = groupThreeFile

    if outputFile and 5 <= idx < 43:
        line = line.lstrip()
        line = line.replace('  ',',')
        outputFile.write(line)

    idx = idx + 1

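In Python 2.x, writing if x in range(a, b): builds an actual list of all the integers from a to b-1 and scans it every time you do the test. (In Python 3, a range object answers integer membership tests without iterating.) Either way, it is better to write the test as if a <= x < b:.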

Also note that 2.5 in range(0, 10) returns False (while 0 <= 2.5 < 10 is, of course, true).
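The difference between the two tests can be checked directly. This is a small Python 3 sketch; the variable names are only for illustration.

```python
a, b = 0, 10

# range(a, b) contains only the integers a, a+1, ..., b-1.
int_in_range = 5 in range(a, b)
float_in_range = 2.5 in range(a, b)

# A chained comparison tests the interval itself, so non-integers work too.
float_in_interval = a <= 2.5 < b

print(int_in_range, float_in_range, float_in_interval)
```

For integer line counters both forms agree, but the chained comparison states the intent (an interval check) more directly.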

Python has no switch statement, but you can build a dispatch table:

filemap = [('*  Void Fractions in the Bed  *', groupOneFile),
           ('*  Void Fractions in the Freeboard  *', groupTwoFile),
           ('*  Superficial Velocities in the Bed (m/s)  *', groupThreeFile)]

outputFile = None
for line in openFile:
    for tag, file in filemap:
        if tag in line:
            idx = 0
            outputFile = file
    if outputFile and 5 <= idx < 43:
        outputFile.write(line)
    idx += 1

If an exact match is possible (instead of an in test), you can do even better with a dictionary:

filemap = {'*  Void Fractions in the Bed  *': groupOneFile,
           '*  Void Fractions in the Freeboard  *': groupTwoFile,
           '*  Superficial Velocities in the Bed (m/s)  *': groupThreeFile}

outputFile = None
for line in openFile:
    f = filemap.get(line.strip())
    if f:
        # Found a new group header, switch output file
        idx = 0
        outputFile = f
    if outputFile and 5 <= idx < 43:
        outputFile.write(line)
    idx += 1
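The exact-match lookup only works because the line is stripped first. The sketch below (Python 3) illustrates this with hypothetical stand-ins: the dictionary values are output file names rather than open file objects, and the two sample lines are taken from the question's input.

```python
# Hypothetical stand-in: values are file names instead of open file objects.
filemap = {
    "*  Void Fractions in the Bed  *": "groupOneFile.csv",
    "*  Void Fractions in the Freeboard  *": "groupTwoFile.csv",
}

header_line = "                  *  Void Fractions in the Bed  *\n"
data_line = "0.0000E+00  0.4151E+00  0.8233E+00  0.8233E+00\n"

stripped_hit = filemap.get(header_line.strip())  # key matches after stripping
data_miss = filemap.get(data_line.strip())       # not a header, so None
raw_miss = filemap.get(header_line)              # indentation and newline break the match

print(stripped_hit, data_miss, raw_miss)
```

A dictionary lookup costs one hash per line regardless of how many headers exist, whereas the list version tries every tag with an in test.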

Answer 2

You asked for my suggestion, so here it is:

from itertools import groupby, product

groups = {'*  Void Fractions in the Bed  *': 'groupOneFile.csv',
          '*  Void Fractions in the Freeboard  *': 'groupTwoFile.csv',
          '*  Superficial Velocities in the Bed (m/s)  *': 'groupThreeFile.csv'}

fname = None

with open('testFile.txt','r') as fin:
    for k, group in groupby(fin, lambda x:x[0].isspace()):
        if k:
            for i, g in product(group, groups):
                if g in i:
                    fname = groups[g]
                    break
        else:
            with open(fname, 'w') as fout:
                fout.writelines(','.join(s.split())+'\n' for s in group)
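For readers unfamiliar with itertools.groupby, this short Python 3 sketch shows how the key used above (whether a line starts with whitespace) splits the input into alternating runs of header or blank lines and data lines. The sample lines are abbreviated from the question's file.

```python
from itertools import groupby

lines = [
    "   *  Void Fractions in the Bed  *\n",
    "\n",
    "0.0000E+00  0.4151E+00\n",
    "0.1000E-09  0.4151E+00\n",
    "\n",
    "   *  Void Fractions in the Freeboard  *\n",
    "0.1333E+01  0.9992E+00\n",
]

# Consecutive lines with the same key value end up in the same run.
runs = [(key, len(list(group)))
        for key, group in groupby(lines, lambda s: s[0].isspace())]
print(runs)

# split() with no argument collapses any amount of whitespace,
# so a data row becomes one comma-separated record.
row = ",".join("0.0000E+00  0.4151E+00\n".split())
print(row)
```

Blank lines count as whitespace runs because their first character is the newline, which is why they are grouped with the headers rather than with the data.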

Answer 3

secondIdx and thirdIdx start at 0, which means that if idx in range(secondIdx+5,secondIdx+43): fires on lines near the top of the file.

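To fix this, you can either rewrite the code as a more stateful setup (once you read Void Fractions in the Bed, you write to the first file until you find a new header, and so on) or simply initialize your Idx variables to a value that lines near the top of the file cannot fall in range of.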
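The stateful idea could look roughly like the following Python 3 sketch. It is not the answerer's code: the function name split_groups and the labels are invented for illustration, results are collected in lists instead of files, and it assumes that data rows are the only lines starting with a digit.

```python
def split_groups(lines, headers):
    """Map each header's label to the CSV rows that follow that header."""
    result = {label: [] for label in headers.values()}
    current = None  # label of the group being read; None before any header
    for line in lines:
        stripped = line.strip()
        matched = next((label for text, label in headers.items()
                        if text in stripped), None)
        if matched is not None:
            current = matched  # new header: switch the active group
            continue
        # Assumption: only data rows begin with a digit.
        if current is not None and stripped[:1].isdigit():
            result[current].append(",".join(stripped.split()))
    return result


sample = """\
      *******************************
      *  Void Fractions in the Bed  *
      *******************************

     Z(m)    MIN.FLUIDIZ.  EMULSION    TOTAL

0.0000E+00  0.4151E+00  0.8233E+00  0.8233E+00
0.1333E+01  0.4151E+00  0.9152E+00  0.9152E+00


      *************************************
      *  Void Fractions in the Freeboard  *
      *************************************

     Z(m)    VOID FRACTION

0.1333E+01  0.9992E+00
""".splitlines()

headers = {
    "*  Void Fractions in the Bed  *": "one",
    "*  Void Fractions in the Freeboard  *": "two",
}

groups = split_groups(sample, headers)
print(groups)
```

Because the active group changes only when a header is seen, no line counters are needed and groups of any length are handled; the trade-off is the reliance on the digit test to recognize data rows.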

Answer 4

with open("testFile.txt") as f:
  lines = list(f)

firstIdx = secondIdx = thirdIdx = None
for x, line in enumerate(lines):
  if "*  Void Fractions in the Bed  *" in line:
    firstIdx = x
  elif "*  Void Fractions in the Freeboard  *" in line:
    secondIdx = x
  elif "*  Superficial Velocities in the Bed (m/s)  *" in line:
    thirdIdx = x

def write_lines(start, end, filename):
  with open(filename, "w") as f:
    for line in lines[start:end]:
      f.write(line.replace("  ", ","))

if firstIdx is not None:
  write_lines(firstIdx + 5, firstIdx + 43, "groupOneFile.csv")
if secondIdx is not None:
  write_lines(secondIdx + 5, secondIdx + 43, "groupTwoFile.csv")
if thirdIdx is not None:
  write_lines(thirdIdx + 5, thirdIdx + 43, "groupThreeFile.csv")

Python write() function writes previous data to the next file
