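Reading from a file and writing to another file while skipping certain lines in Python

Time: 2015-02-16 19:56:44

Tags: python

This is a small part of a larger code problem I'm working on: I am reading the contents of a file, glmfile.glm, and trying to write every line of glmfile.glm to another file, combined.glm, while skipping the lines that meet certain criteria.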

glmfile=open("glmfile.glm",'r').readlines()
combined=open("combined.glm",'w')

The contents of glmfile.glm look like this:

...
#other objects
object node {
    name SPU123-389-3066753-3_breaker_rec_recnode;
    phases ABC;
    nominal_voltage 7621.024;
}

object node {
    name SPU123-416-25308_recnode;
    phases ABC;
    nominal_voltage 7621.024;
}
object node {
    name SPU123-403-492320;
    groupid from_db;
    phases CN;
    nominal_voltage 7621.024;
}

object node {
    name SPU123-392-97334;
    groupid from_db;
    phases ABCN;
    nominal_voltage 7621.024;
}

object node {
    name SPU123-391-348982;
    groupid from_db;
    phases AN;
    nominal_voltage 7621.024;
}

object node {
    name SPU123-391-542649;
    groupid from_db;
    phases AN;
    nominal_voltage 7621.024;
}
#few more node objects and other objects
...

Now, I have built a list node_names as follows:

node_names=['389-3066753','403-492320','392-97334','391-348982']

I compare the names in glmfile with the elements of that list to see whether an object node's name is listed in node_names:

for h,aaline in enumerate(glmfile):
    if aaline.startswith('object node {') and ('SWING' not in glmfile[h+3]):
        if glmfile[h+1][13:-2].strip() in node_names:
            #DO NOT WRITE THAT ENTIRE OBJECT NODE SECTION to 'combined'
            #I am able to skip just that line 'glmfile[h]' but how to NOT   
            #write out the entire object node i.e., glmfile[h:h+6]?
            print glmfile[h:h+6]
        else:
            combined.writelines([glmfile[h:h+6]]) 

Note: the problem I'm facing is described in the comments inside the if branch of the code above.

2 answers:

Answer 0: (score: 0)

Let's talk in broad terms first and get specific from there.

Your objects look like:

object node {
    name NAME_VAL;
    phases PHASES_VAL;
    nominal_voltage VOLTAGE_VAL;
}

You're trying to write from a file full of these objects into another, blank file, taking only the objects for which

'SWING' in PHASES_VAL and NAME_VAL in [some list of nodes].

Let's do that:

import itertools

def grouper(iterable, n, fillvalue=None):
    '''from https://docs.python.org/3/library/itertools.html'''
    args = [iter(iterable)] * n
    return itertools.zip_longest(*args, fillvalue=fillvalue)

with open('path/to/infile.txt') as inf, \
        open('path/to/outfile.txt', 'w') as outf:
    objects_in = grouper(inf, 6) # 5 lines of the object plus a newline
    for obj in objects_in:
        obj = list(obj) # let's look at them all at once
        NAME_VAL = obj[1].strip().split()[1]
        PHASES_VAL = obj[2].strip().split()[1]
        VOLTAGE_VAL = obj[3].strip().split()[1]
        if 'SWING' in PHASES_VAL and \
                NAME_VAL in some_list_of_nodes:
            outf.writelines(obj)
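
To see what grouper hands back, here is a minimal sketch run on an in-memory list instead of a file. The sample lines and the fillvalue="" padding are chosen for illustration only. It also makes the main assumption of this approach visible: every object has to occupy exactly six lines (five lines of the object plus one blank separator), which does not hold for objects that carry an extra groupid line or have no blank line after them.

```python
import itertools

def grouper(iterable, n, fillvalue=None):
    """Collect items into fixed-length chunks (itertools recipe)."""
    args = [iter(iterable)] * n
    return itertools.zip_longest(*args, fillvalue=fillvalue)

# Hypothetical sample: two objects of six lines each, except that
# the last one is missing its trailing blank line.
sample_lines = [
    "object node {\n",
    "    name SPU123-416-25308_recnode;\n",
    "    phases ABC;\n",
    "    nominal_voltage 7621.024;\n",
    "}\n",
    "\n",
    "object node {\n",
    "    name SPU123-391-542649;\n",
    "    phases AN;\n",
    "    nominal_voltage 7621.024;\n",
    "}\n",
]

# Padding with "" rather than None keeps writelines() usable
# on a short final chunk.
chunks = [list(chunk) for chunk in grouper(sample_lines, 6, fillvalue="")]

# The name is the second field on the second line of each chunk.
names = [chunk[1].split()[1].rstrip(";") for chunk in chunks]
print(names)
```

If the objects in the real file vary in length, the chunks drift out of alignment and obj[1] no longer holds the name line, so this only works on strictly regular input.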

That said, if this is something you'll be doing repeatedly, it's probably better to write a parser for it.

# node.py

class Node(dict):
    '''simply inheriting from dict will handle almost everything,
    but we will have to give it a classmethod to build from string'''

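    @classmethod
    def from_string(cls, s):
        kwargs = {}
        reading = False
        for line in s.splitlines():
            if "}" in line:
                break
            if reading and line.strip():
                # "phases ABC;" -> key "phases", value "ABC"
                key, val = line.strip().rstrip(';').split(None, 1)
                kwargs[key] = val
            if "{" in line:
                reading = True
        # build the instance through the normal dict constructor
        return cls(**kwargs)

    def __str__(self):
        # put back the ';' that from_string stripped from each value
        return "object node {\n" + \
                "\n".join("    {} {};".format(*item)
                          for item in self.items()) + "\n}"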

# parser.py

from node import Node

def parse(f):
    tokens = []
    token = []
    parsing = False
    for line in f:  # iterate over the file object that was passed in
        if line.startswith('object node {'):
            parsing = True
        if parsing:
            token.append(line.strip())
        # only close a block that was opened by 'object node {'
        if parsing and line.strip() == "}":
            parsing = False
            obj = Node.from_string("\n".join(token))
            tokens.append(obj)
            token = []
    return tokens

# yourfile.py

import parser

with open('path/to/infile.txt') as infile, \
        open('path/to/outfile.txt', 'w') as outfile:
    for obj in parser.parse(infile):
        if obj['name'] in some_node_names_list and \
               'SWING' in obj['phases']:
            outfile.write(str(obj) + "\n\n")

Answer 1: (score: 0)

How about using an extra index and the modulo operator:

a%b

For example:

idx = 6
for h,aaline in enumerate(glmfile):
    if aaline.startswith('object node {') and ('SWING' not in glmfile[h+3]):
        if glmfile[h+1][13:-2].strip() in node_names or idx%6!=0:
            idx = idx+1
            print glmfile[h:h+6]
        else:
            combined.writelines(glmfile[h:h+6])
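
The loop above relies on a fixed offset of six lines, which breaks as soon as an object has an extra attribute or no blank line after it. Below is a minimal sketch of the same extra-index idea that looks for the closing brace instead of using a constant. It rests on several assumptions that the question does not confirm: each object ends with a line containing only `}`, an entry of node_names counts as a match when it appears anywhere inside the node's name, and a node whose block mentions SWING is always kept. The function name filter_nodes and the sample lines are illustrative.

```python
def filter_nodes(lines, node_names):
    """Return the lines with matching 'object node' blocks left out."""
    kept = []
    skip_until = 0  # index of the first line after the block being dropped
    for h, line in enumerate(lines):
        if h < skip_until:
            continue  # still inside a dropped block
        if line.startswith("object node {"):
            # Locate the closing brace rather than assuming a fixed length.
            end = h
            while end < len(lines) - 1 and lines[end].strip() != "}":
                end += 1
            block = lines[h:end + 1]
            # Find the 'name' attribute wherever it sits in the block.
            name = ""
            for part in block[1:]:
                fields = part.split()
                if len(fields) >= 2 and fields[0] == "name":
                    name = fields[1].rstrip(";")
                    break
            listed = any(entry in name for entry in node_names)
            swing = any("SWING" in part for part in block)
            if listed and not swing:
                skip_until = end + 1  # drop the whole block
                continue
        kept.append(line)
    return kept

# Hypothetical sample modelled on the file excerpt in the question.
sample = [
    "#other objects\n",
    "object node {\n",
    "    name SPU123-416-25308_recnode;\n",
    "    phases ABC;\n",
    "    nominal_voltage 7621.024;\n",
    "}\n",
    "object node {\n",
    "    name SPU123-403-492320;\n",
    "    groupid from_db;\n",
    "    phases CN;\n",
    "    nominal_voltage 7621.024;\n",
    "}\n",
    "\n",
    "object node {\n",
    "    name SPU123-391-542649;\n",
    "    groupid from_db;\n",
    "    phases AN;\n",
    "    nominal_voltage 7621.024;\n",
    "}\n",
]
node_names = ['389-3066753', '403-492320', '392-97334', '391-348982']

result = filter_nodes(sample, node_names)
print("".join(result), end="")
```

With the real files, the input would come from open("glmfile.glm").readlines() and the result could be passed to combined.writelines(result). Lines outside node objects (comments and other objects) pass through unchanged.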