Extract strings from each line of a text file and save the output as CSV rows

Time: 2019-06-12 08:20:27

Tags: python

I am trying to extract the following data from a text file: srcintf, dstintf, srcaddr, dstaddr, action, schedule, service, logtraffic, and save the values to a CSV file, one correct row per policy.

The input file looks like this:

edit 258
    set srcintf "Untrust"
    set dstintf "Trust"
    set srcaddr "all"
    set dstaddr "10.2.22.1/32"
    set action accept
    set schedule "always"
    set service "selling_soft_01"
    set logtraffic all
next
edit 184
    set srcintf "Untrust"
    set dstintf "Trust"
    set srcaddr "Any"
    set dstaddr "10.1.1.1/32"
    set schedule "always"
    set service "HTTPS"
    set logtraffic all
next
edit 124
    set srcintf "Untrust"
    set dstintf "Trust"
    set srcaddr "Any"
    set dstaddr "172.16.77.1/32"
    set schedule "always"
    set service "ping"
    set logtraffic all
    set nat enable
next

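This is my first attempt at programming (as you can see from the code), but perhaps it gives you a better idea of what I am trying to do. See the code below.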

import csv

text_file = open("fwpolicy.txt", "r")

lines = text_file.readlines()

mycsv = csv.writer(open('output.csv', 'w'))

mycsv.writerow(['srcintf', 'dstintf', 'srcaddr', 'dstaddr', 'schedule', 'service', 'logtraffic', 'nat'])

n = 0
for line in lines: 
    n = n + 1
n = 0
for line in lines: 
    n = n + 1
    if "set srcintf" in line:
            srcintf = line
    else    srcintf = 'not set'
    if "set dstintf" in line:            
        dstintf = line
    else    dstintf  = 'not set'
    if "set srcaddr" in line:           
        srcaddr = line
    else    srcaddr = 'not set'
    if "set dstaddr" in line:
            dstaddr = line
    else    dstaddr = 'not set'
    if "set action" in line:            
        action = line
    else    action = 'not set'
    if "set schedule" in line:
            schedule = line
    else    schedule = 'not set'
    if "set service" in line:
            service = line
    else    service = 'not set'
    if "set logtraffic" in line:
            logtraffic = line
    else    logtraffic = 'not set'
    if "set nat" in line:
            nat = line
    else    nat = 'not set'            

        mycsv.writerow([srcintf, dstintf, srcaddr, dstaddr, schedule, service, logtraffic, nat])

Expected result (CSV file):

srcintf,dstintf,srcaddr,dstaddr,schedule,service,logtraffic,nat
"Untrust","Trust","all","10.2.22.1/32","always","selling_soft_01",all,,

Actual result:

Traceback (most recent call last):
  File "parse.py", line 45, in <module>
    mycsv.writerow([srcintf, dstintf, srcaddr, dstaddr, schedule, service, logtraffic, nat])
NameError: name 'srcintf' is not defined

4 answers:

Answer 0 (score: 1)

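You are trying to write a row to the CSV for every line in the file. You should only write the row when you see the word next, so check for that before writing; by then all the values for the row have been collected.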

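Once you get that far, you will notice that you have set each value to the whole line rather than to the value you want after the keyword. For example, with the line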

 set srcintf "Untrust"

your code

 if "set srcintf" in line: srcintf = line
 else srcintf = 'not set' 

gives srcintf the value set srcintf "Untrust". Try using split on the string to pick out the actual value.

...something like this:

import csv

with open("fwpolicy.txt", "r") as text_file:
    lines = text_file.readlines()

# newline='' stops the csv module from writing blank lines on Windows
with open('output.csv', 'w', newline='') as out_file:
    mycsv = csv.writer(out_file)
    mycsv.writerow(['srcintf', 'dstintf', 'srcaddr', 'dstaddr', 'schedule',
                    'service', 'logtraffic', 'nat'])
    for line in lines:
        if "edit" in line:
            # start of a block: reset every field to its default
            [srcintf, dstintf, srcaddr, dstaddr, schedule,
             service, logtraffic, nat] = ['not set']*8
        elif 'next' in line:
            # end of a block: every value has been collected, so write the row
            mycsv.writerow([srcintf, dstintf, srcaddr, dstaddr, schedule, service, logtraffic, nat])
        elif "set srcintf" in line:
            srcintf = line.split()[2]
        elif "set dstintf" in line:
            dstintf = line.split()[2]
        elif "set srcaddr" in line:
            srcaddr = line.split()[2]
        elif "set dstaddr" in line:
            dstaddr = line.split()[2]
        elif "set action" in line:
            # collected but not written: the header has no action column
            action = line.split()[2]
        elif "set schedule" in line:
            schedule = line.split()[2]
        elif "set service" in line:
            service = line.split()[2]
        elif "set logtraffic" in line:
            logtraffic = line.split()[2]
        elif "set nat" in line:
            nat = line.split()[2]

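The important thing is to fill in all the values for a row and to write it only once you have them all. The repetition could be tidied up, but hopefully this helps with the idea of a state machine: look at where you are in the file to decide whether to collect a value, start a new row, or write a row.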
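The state-machine idea above can also be written without the repeated elif branches. The following is only a sketch under some assumptions: the names FIELDS, parse_policies and sample are made up for illustration, the surrounding double quotes are stripped from values (which may or may not be what the asker wants), and the sample text is the edit 184 block from the question.

```python
import csv
import io

FIELDS = ['srcintf', 'dstintf', 'srcaddr', 'dstaddr',
          'schedule', 'service', 'logtraffic', 'nat']


def parse_policies(lines):
    """Yield one dict per edit/next block, with missing fields set to 'not set'."""
    row = None  # None means we are not inside an edit block
    for line in lines:
        words = line.strip().split(maxsplit=2)
        if not words:
            continue  # blank line
        if words[0] == 'edit':
            row = dict.fromkeys(FIELDS, 'not set')  # start a new row
        elif words[0] == 'next' and row is not None:
            yield row  # block finished: hand the row to the caller
            row = None
        elif words[0] == 'set' and row is not None and len(words) == 3:
            if words[1] in row:  # skip fields we do not export, e.g. action
                row[words[1]] = words[2].strip('"')  # assumption: drop the quotes


sample = '''edit 184
    set srcintf "Untrust"
    set dstintf "Trust"
    set srcaddr "Any"
    set dstaddr "10.1.1.1/32"
    set schedule "always"
    set service "HTTPS"
    set logtraffic all
next
'''

rows = list(parse_policies(sample.splitlines()))

# Write to an in-memory buffer here; a real script would use
# open('output.csv', 'w', newline='') instead.
out_file = io.StringIO()
writer = csv.DictWriter(out_file, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(rows)
print(out_file.getvalue())
```

Keeping the parsing in a generator separates "where am I in the file" from "how do I write the row", so the same parser could feed a different output format.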

Answer 1 (score: 1)

How about using DictWriter?

import csv

with open("fwpolicy.txt", "r") as text_file, open('output.csv', 'w', newline='') as out_file:

    fieldnames = ['srcintf', 'dstintf', 'srcaddr', 'dstaddr', 'schedule',
                  'service', 'logtraffic', 'nat']

    # extrasaction='ignore' drops keys such as 'action' that are not in fieldnames;
    # QUOTE_NONE writes the values exactly as they appear in the input
    mycsv = csv.DictWriter(out_file, fieldnames=fieldnames, extrasaction='ignore',
                           quotechar=None, quoting=csv.QUOTE_NONE)
    mycsv.writeheader()

    row = {}
    for line in text_file:
        words = line.strip().split(maxsplit=2)
        if not words:
            continue  # skip blank lines
        if 'set' == words[0]:
            row[words[1]] = words[2]
        elif 'next' == words[0]:
            print(row)
            mycsv.writerow(row)
            row = {}

Answer 2 (score: 0)

Here is how I would approach it:

import csv

fieldnames = ['srcintf', 'dstintf', 'srcaddr', 'dstaddr', 'schedule', 'service', 'logtraffic', 'nat']

# default written for any field that is missing from a block
defaults = {name: "not set" for name in fieldnames}

with open("structured_content.txt", "r") as text_file:
    text = text_file.read()

with open('output.csv', 'w', newline='') as out_file:
    mycsv = csv.DictWriter(out_file, fieldnames)
    mycsv.writeheader()
    # n.b. splitting on "next" and "set" assumes no value contains those substrings
    for block in text.split("next"):
        csv_row = {}
        for p in [s.strip() for s in block.replace("\n", " ").split("set")]:
            s = p.split()
            if len(s) == 2:
                csv_row[s[0]] = s[1]  # n.b. this includes "action" and "edit" fields, which need stripping out
        if not csv_row:
            continue  # nothing parsed, e.g. the empty piece after the final "next"
        # start from the defaults, then overwrite with the fields found in this block
        csv_write_row = dict(defaults)
        for k, v in csv_row.items():
            if k in fieldnames:  # a filter to only include fields in the "fieldnames" list
                csv_write_row[k] = v
        mycsv.writerow(csv_write_row)

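My aim is to take advantage of the file's structure and use split to break the text into repeating blocks. Converting the file to CSV is then just a matter of lining up the blocks (and nested blocks) with the CSV format. csv.DictWriter provides a convenient interface for writing your content row by row.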

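If you want default values for fields that are missing, you can use a dictionary that maps each field name to its default (missing) value. You can then "wash" the prepared csv_write_row with these defaults wherever a value is absent.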
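As a possible alternative to a hand-built defaults dictionary, csv.DictWriter itself accepts a restval argument (the value written for keys missing from a row) and extrasaction='ignore' (to drop keys that are not in fieldnames). The sketch below only illustrates those two parameters; the sample row is invented and the output goes to an in-memory buffer rather than output.csv.

```python
import csv
import io

fieldnames = ['srcintf', 'dstintf', 'srcaddr', 'dstaddr',
              'schedule', 'service', 'logtraffic', 'nat']

# Stand-in for open('output.csv', 'w', newline='')
out_file = io.StringIO()

writer = csv.DictWriter(out_file, fieldnames=fieldnames,
                        restval='not set',      # written for any missing key
                        extrasaction='ignore')  # silently drop unknown keys
writer.writeheader()

# Hypothetical parsed block: 'edit' and 'action' are not in fieldnames,
# and most of the expected fields are absent.
writer.writerow({'edit': '184', 'srcintf': 'Untrust',
                 'dstintf': 'Trust', 'action': 'accept'})

output = out_file.getvalue()
print(output)
```

With this approach the filtering and padding loops are no longer needed, at the cost of having the same default for every column.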

Answer 3 (score: 0)

Here is one way to do it:

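import pandas as pd

keys = ['srcintf', 'dstintf', 'srcaddr', 'dstaddr', 'schedule', 'service', 'logtraffic', 'nat']

# read the input file from the question
with open("fwpolicy.txt", "r") as text_file:
    lines = text_file.readlines()

records = []
record = dict()  # must exist before the first value is stored
for line in lines:

    # which of the wanted keys appear on this line?
    found_key = [key for key in keys if key in line]

    if len(found_key) > 0:
        # drop the quotes and keep everything after "set <key>"
        value = line.strip().rstrip("\n\r").replace('"', '').split(" ")[2:]
        record[found_key[0]] = value[0]

    if 'next' in line:
        # end of a block: store the record and start a new one
        records.append(record)
        record = dict()

pd.DataFrame(records).to_csv('output.csv', index=False)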