Python - Writing a separate file for each section of a single file

Time: 2017-05-23 23:25:13

Tags: python python-2.7 parsing

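I have a .txt file containing 5 sections of data. Each section has a header line "Section X". I want to parse this single file and write 5 separate files from it. Each section starts at its header and ends just before the next section header. The code below creates 5 separate files; however, they are all blank.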

from itertools import cycle

filename = raw_input("Which file?: \n")

dimensionsList = ["Section 1", "Section 2",
    "Section 3", "Section 4", "Section 5"]

with open(filename+".txt", "rb") as oldfile:
    for i in dimensionsList:
        licycle = cycle(dimensionsList)
        nextelem = licycle.next()
        with open(i+".txt", "w") as newfile: 
            for line in oldfile:
                if line.strip() == i:
                    break
            for line in oldfile:
                if line.strip() == nextelem:
                    break
                newfile.write(line)

1 answer:

Answer 0 (score: 1)

The problem

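Testing your code, it only works for Section 1 (the other files are blank for me too). I realized the problem is the transition between sections (and also that licycle is restarted on every iteration).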

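The "Section 2" header line is read by the second for loop (at if line.strip() == nextelem). The next line read is therefore Section 2's data (and not the text Section 2), so the first loop of the following iteration never sees the header it is looking for.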
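The behaviour described here follows from a file object being a single-pass iterator: a line consumed by one for loop is not seen again by the next loop over the same object. A minimal sketch of that effect, assuming Python 3 and using io.StringIO with made-up sample data as a stand-in for the real file:

```python
import io

# In-memory stand-in for the input file (hypothetical sample data).
oldfile = io.StringIO("Section 1\naaaa\nSection 2\nccc\n")

consumed = []
for line in oldfile:
    consumed.append(line.strip())
    if line.strip() == "Section 2":
        break  # the header line has now been consumed

# A second loop over the same object resumes after the header line.
remaining = [line.strip() for line in oldfile]

print(consumed)
print(remaining)
```

The second loop starts at the data line, so any code that searches for the "Section 2" header at that point cannot find it.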

It is hard to explain in words, so please test the code below:

from itertools import cycle

filename = raw_input("Which file?: \n")

dimensionsList = ["Section 1", "Section 2", "Section 3", "Section 4",
                  "Section 5"]

with open(filename + ".txt", "rb") as oldfile:
    licycle = cycle(dimensionsList)
    nextelem = licycle.next()
    for i in dimensionsList:
        print(nextelem)
        with open(i + ".txt", "w") as newfile:
            for line in oldfile:
                print("ignoring %s" % (line.strip()))
                if line.strip() == i:
                    nextelem = licycle.next()
                    break
            for line in oldfile:
                if line.strip() == nextelem:
                    # nextelem = licycle.next()
                    print("ignoring %s" % (line.strip()))
                    break
                print("printing %s" % (line.strip()))
                newfile.write(line)
            print('')

It will print:

Section 1
ignoring Section 1
printing aaaa
printing bbbb
ignoring Section 2

Section 2
ignoring ccc
ignoring ddd
ignoring Section 3
ignoring eee
ignoring fff
ignoring Section 4
ignoring ggg
ignoring hhh
ignoring Section 5
ignoring iii
ignoring jjj

Section 2

Section 2

Section 2

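It works for Section 1, and it detects the Section 2 header, but after that it keeps ignoring lines because it can no longer find the text "Section 2".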

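If the line iteration were restarted every time (always starting from line 1), I think the program would work. But I wrote simpler code that should work for you.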
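The "restart from line 1" idea could be sketched by rewinding the file with seek(0) before searching for each section. This is only an illustration under stated assumptions: extract_section is a hypothetical helper, io.StringIO stands in for the real file, and the code targets Python 3.

```python
import io

def extract_section(f, header, all_headers):
    """Return the lines between `header` and the next known header."""
    f.seek(0)  # restart from the first line on every call
    lines = []
    inside = False
    for line in f:
        text = line.strip()
        if text == header:
            inside = True           # found the section we want
        elif inside and text in all_headers:
            break                   # reached the next section header
        elif inside:
            lines.append(line)      # data line of the wanted section
    return lines

headers = ["Section 1", "Section 2", "Section 3"]
sample = io.StringIO("Section 1\naaaa\nSection 2\nccc\nddd\nSection 3\neee\n")
section_2 = extract_section(sample, "Section 2", headers)
section_3 = extract_section(sample, "Section 3", headers)
```

This rereads the input once per section, which is acceptable for small files but wasteful for large ones.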

The solution

from itertools import cycle

filename = raw_input("Which file?: \n")

dimensionsList = ["Section 1", "Section 2", "Section 3", "Section 4",
                  "Section 5"]

with open(filename + ".txt", "rb") as oldfile:

    licycle = cycle(dimensionsList)
    nextelem = licycle.next()
    newfile = None
    line = oldfile.readline()

    while line:

        # Case 1: Found new section
        if line.strip() == nextelem:
            if newfile is not None:
                newfile.close()
            nextelem = licycle.next()
            newfile = open(line.strip() + '.txt', 'w')

        # Case 2: Print line to current section
        elif newfile is not None:
            newfile.write(line)

        line = oldfile.readline()

    # Close the last section file, which the loop leaves open
    if newfile is not None:
        newfile.close()

If a Section header is found, it starts writing to a new file. Otherwise, it keeps writing to the current file.
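For readers on Python 3, where raw_input and the iterator's .next() method are not available, the same single-pass idea might look like the sketch below. The names split_sections and out_dir are hypothetical, and treating any line found in the header list as a section start (rather than only the next expected header) is a simplification made for this example, not the behaviour of the code in this answer.

```python
import os
import tempfile

def split_sections(path, headers, out_dir):
    """Write each section of `path` to '<header>.txt' inside `out_dir`."""
    newfile = None
    written = []
    with open(path) as oldfile:
        for line in oldfile:
            text = line.strip()
            if text in headers:
                # Found a new section: close the previous file, open the next.
                if newfile is not None:
                    newfile.close()
                target = os.path.join(out_dir, text + ".txt")
                newfile = open(target, "w")
                written.append(target)
            elif newfile is not None:
                # Data line: write it to the current section file.
                newfile.write(line)
    if newfile is not None:
        newfile.close()  # close the last section file
    return written

# Demo on a small temporary input file (made-up sample data).
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "input.txt")
with open(source, "w") as f:
    f.write("Section 1\naaaa\nbbbb\nSection 2\nccc\nddd\n")
paths = split_sections(source, ["Section 1", "Section 2"], workdir)
```

As in the code in this answer, the header line itself is used only to name the output file and is not written into it.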

P.S.: below is the sample file I used:

Section 1
aaaa
bbbb
Section 2
ccc
ddd
Section 3
eee
fff
Section 4
ggg
hhh
Section 5
iii
jjj