模式到列表的Python

时间:2015-02-26 03:35:08

标签: python

我有一个像这样的文件

module1 instance1(.wire1 (connectionwire1), .wire2 (connectionwire2),.... ,wire100 (connectionwire100)) ; module 2 instance 2(.wire1 (newconnectionwire1), .wire2 (newconnectionwire2),.... ,wire99 (newconnectionwire99)) 

沿着模块重复电线。可以有很多模块。 我想构建一个这样的字典(不是第二个模块中的每个电线都是重复的)。

[wire1:[(module1, instance1, connection1), (module2, instance2,newconnection1), wire2:[(module1 instance1 connection2),(module2, instance2,newconnection1)]... wire99:module2, instance2, connection99), ]

我在;上拆分字符串然后拆分,然后(以获取连线和连接线字符串。我不知道如何填充数据结构,所以线是关键,模块,instancename和连接是元素。

Goal- get this datastructure- [ wire: (module, instance, connectionwire) ] 

filedata=file.read()
realindex=list(find_pos(filedata,';'))
tempindex=0
for l in realindex:
    module=filedata[tempindex:l]
    modulename=module.split()[0]
    openbracketindex=module.find("(")
    closebracketindex=module.strip("\n").find(");")
    instancename=module[:openbracketindex].split()[1]
    tempindex=l
    tempwires=module[openbracketindex:l+1]
    #got to split wires on commas
    for tempw in tempwires.split(","):
        wires=tempw
        listofwires.append(wires)

2 个答案:

答案 0 :(得分:2)

使用re模块。

import re
from collections import defaultdict

s = "module1 instance1(.wire1 (connectionwire1), .wire2 (connectionwire2), .wire100 (connectionwire100)) ; module2 instance2(.wire1 (newconnectionwire1), .wire2 (newconnectionwire2), wire99 (newconnectionwire99))'

d = defaultdict(list)
module_pattern = r'(\w+)\s(\w+)\(([^;]+)'
mod_rex = re.compile(module_pattern)
wire_pattern = r'\.(\w+)\s\(([^\)]+)'
wire_rex = re.compile(wire_pattern)

for match in mod_rex.finditer(s):
    #print '\n'.join(match.groups())
    module, instance, wires = match.groups()
    for match in wire_rex.finditer(wires):
        wire, connection = match.groups()
        #print '\t', wire, connection
        d[wire].append((module, instance, connection))

for k, v in d.items():
    print k, ':', v

可生产

wire1 : [('module1', 'instance1', 'connectionwire1'), ('module2', 'instance2', 'newconnectionwire1')]
wire2 : [('module1', 'instance1', 'connectionwire2'), ('module2', 'instance2', 'newconnectionwire2')]
wire100 : [('module1', 'instance1', 'connectionwire100')]   

答案 1 :(得分:1)

使用re提供的答案是正确的。我分享了一个如何使用pyparsing模块解决问题的示例,这使得解析人类可读且易于操作。

from pyparsing import Word, alphanums, Optional, ZeroOrMore, Literal, Group, OneOrMore
from collections import defaultdict
s = 'module1 instance1(.wire1 (connectionwire1), .wire2 (connectionwire2), .wire100 (connectionwire100)) ; module2 instance2(.wire1 (newconnectionwire1), .wire2 (newconnectionwire  2), .wire99 (newconnectionwire99))'
connection = Word(alphanums)
wire = Word(alphanums)
module = Word(alphanums)
instance = Word(alphanums)
dot = Literal(".").suppress()
comma = Literal(",").suppress()
lparen = Literal("(").suppress()
rparen = Literal(")").suppress()
semicolon = Literal(";").suppress()
wire_connection = Group(dot + wire("wire") + lparen + connection("connection") + rparen + Optional(comma))
wire_connections = Group(OneOrMore(wire_connection))
module_instance = Group(module("module") + instance("instance") + lparen + ZeroOrMore(wire_connections("wire_connections")) + rparen + Optional(semicolon))
module_instances = OneOrMore(module_instance)
results = module_instances.parseString(s)
# create a dict
d = defaultdict(list)
for r in results:
    m = r['module']
    i = r['instance']
    for wc in r['wire_connections']:
        w = wc['wire']
        c = wc['connection']
        d[w].append((m, i, c)) 
print d

输出:

defaultdict(<type 'list'>, {'wire1': [('module1', 'instance1', 'connectionwire1'), ('module2', 'instance2', 'newconnectionwire1')], 'wire2': [('module1', 'instance1', 'connectionwire2'), ('module2', 'instance2', 'newconnectionwire2')], 'wire100': [('module1', 'instance1', 'connectionwire100')], 'wire99': [('module2', 'instance2', 'newconnectionwire99')]})