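I just finished an assignment for a class in Python. It works and I'm happy with it, but it looks ugly! I have already submitted the code, since we are graded on whether it works and not on how it looks. I wouldn't mind some tips and pointers on how to convert a string into a data set for future projects.
The input is a grid made up of nodes and edges, for example: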
"4:(1,2;4),(2,6;3),(3,7;15),(4,8;1),(5,6;1),(6,7;1),(5,9;9),(6,10;2),(7,11;1),(8,12;23),(9,10;5),(9,13;7),(10,14;6),(11,15;3),(12,16;3),(13,14;4),(15,16;7)"
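The first number, before the ":", is the size of the grid (4x4), and (1,2;4) denotes an edge from node 1 to node 2 with a cost of 4. The following code converts this into an array where array[0] is the grid size and array[1] is a dictionary of the form (node1, node2) = cost.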
def partitionData(line):
    finalDic = dict()
    #partition the data around the formating
    line = line.split(":")
    line[1] = line[1].split("),(")
    #clean up data some more
    line[1][0] = line[1][0][1:]
    end = len(line[1])-1
    line[1][end] = line[1][end][:len(line[1][end])-2]
    #simplify data and organize into a list
    for i in range(len(line[1])):
        line[1][i] = line[1][i].split(",")
        line[1][i][1] = line[1][i][1].split(";")
        #clean up list
        for j in range(len(line[1][i])):
            line[1][i].append(line[1][i][1][j])
        del line[1][i][1]
    #convert everything to integer to simplify algorithm
    for i in range(len(line[1])):
        for j in range(len(line[1][i])):
            line[1][i][j] = int(line[1][i][j])
    line[0] = int(line[0])
    newData = dict()
    for i in range(len(line[1])):
        newData[(line[1][i][0],line[1][i][1])] = line[1][i][2]
    line[1] = newData
    for i in line[1]:
        if not ((min(i),max(i)) in finalDic):
            finalDic[(min(i),max(i))] = line[1][i]
        else:
            print "There is a edge referenced twice!"
            exit()
    line[1] = finalDic
    return line
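At first I had something cleaner, but it didn't account for numbers that might be greater than 9. I think this is very ugly, and there must be a prettier way to do it.
Answer 0 (score: 2)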
import re
# regular expression for matching a (node1,node2;cost)
EDGE = re.compile(r'\((\d+),(\d+);(\d+)\)')
def parse(s):
    # Separate size from the list of edges
    size, edges = s.split(':')
    # Build a dictionary
    edges = dict(
        # ...where key is (node1,node2) and value is (cost)
        # (all converted to integers)
        ((int(node1),int(node2)),int(cost))
        # ...by iterating the edges using the regular expression
        for node1,node2,cost in EDGE.findall(edges))
    return int(size),edges
Example:
>>> test = "4:(1,2;4),(2,6;3),(3,7;15),(4,8;1),(5,6;1),(6,7;1),(5,9;9),(6,10;2),(7,11;1),(8,12;23),(9,10;5),(9,13;7),(10,14;6),(11,15;3),(12,16;3),(13,14;4),(15,16;7)"
>>> parse(test)
(4, {(1, 2): 4, (5, 9): 9, (2, 6): 3, (6, 7): 1, (4, 8): 1, (5, 6): 1, (6, 10): 2, (9, 10): 5, (13, 14): 4, (11, 15): 3, (10, 14): 6, (9, 13): 7, (12, 16): 3, (7, 11): 1, (3, 7): 15, (8, 12): 23, (15, 16): 7})
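The parse() function above keeps each key exactly as written, and dict() quietly keeps only the last cost if an edge is repeated, whereas the code in the question rejects an edge that is referenced twice. As a rough sketch of how the two could be combined (Python 3; parse_undirected is a hypothetical name, and raising ValueError is my own assumption rather than anything from the answer):

```python
import re

# Same pattern as in the answer: matches "(node1,node2;cost)".
EDGE = re.compile(r'\((\d+),(\d+);(\d+)\)')

def parse_undirected(s):
    """Hypothetical variant of parse() that treats (a, b) and (b, a) as one edge."""
    size, edge_text = s.split(':')
    edges = {}
    for node1, node2, cost in EDGE.findall(edge_text):
        # Sort the two node numbers so both orientations share one key.
        key = tuple(sorted((int(node1), int(node2))))
        if key in edges:
            raise ValueError("edge %r is referenced twice" % (key,))
        edges[key] = int(cost)
    return int(size), edges

print(parse_undirected("2:(1,2;4),(4,3;7)"))   # (2, {(1, 2): 4, (3, 4): 7})
```

Raising an exception instead of calling exit() leaves it to the caller to decide what to do with bad input.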
Answer 1 (score: 1)
import re
data = "4:(1,2;4),(2,6;3),(3,7;15),(4,8;1),(5,6;1),(6,7;1),(5,9;9),(6,10;2),(7,11;1),(8,12;23),(9,10;5),(9,13;7),(10,14;6),(11,15;3),(12,16;3),(13,14;4),(15,16;7)"
temp = data.split(":") # split into grid size and rest
array = [int(temp[0]),{}] # first item: grid size
# split the rest of the string (from the second to the second-to-last characters)
# along the delimiters "),("
for item in temp[1][1:-1].split("),("):
    numbers = re.split("[,;]", item) # split item along delimiters , or ;
    k1, k2, v = (int(num) for num in numbers) # and convert to int
    array[1][(k1,k2)] = v # populate the array
print(array)
Result:
[4, {(1, 2): 4, (5, 9): 9, (2, 6): 3, (6, 7): 1, (4, 8): 1, (5, 6): 1, (6, 10): 2, (9, 10): 5, (13, 14): 4, (11, 15): 3, (10, 14): 6, (9, 13): 7, (12, 16): 3, (7, 11): 1, (3, 7): 15, (8, 12): 23, (15, 16): 7}]
Answer 2 (score: 0)
What you need is a simple parser. Your input can be described in the following extended BNF notation:
input := NUM ':' edge_defn (',' edge_defn)*
edge_defn := '(' NUM ',' NUM ';' NUM ')'
NUM := [0-9]+
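You can then write your own top-down parser or use a parser generator (such as ANTLR or yacc/bison).
Let's write our own parser. First you need to recognize the tokens in the input. So far the only tokens are "(", ")", ":", ",", ";" and numbers. We can use Python's split() method, as in Peter Norvig's Lisp interpreter in Python: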
input = "4:(1,2;4),(2,6;3),(3,7;15),(4,8;1),(5,6;1),(6,7;1),(5,9;9),(6,10;2),(7,11;1),(8,12;23),(9,10;5),(9,13;7),(10,14;6),(11,15;3),(12,16;3),(13,14;4),(15,16;7)"
tokens = input.replace(':', ' : ').replace(')',' ) ').replace('(',' ( ').replace(',',' , ').replace(';', ' ; ').split()
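I know, this looks ugly too, but it is the only place where we use this kind of hack. All we are doing is putting spaces around the symbols and using the split method to get a list of all the tokens.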
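To see what the padding trick produces, here is a small sketch on a shortened input (Python 3). The tokenize name is hypothetical and the loop is just a compact way of writing the chained replace() calls:

```python
def tokenize(text):
    """Pad every punctuation symbol with spaces, then split on whitespace."""
    for symbol in ':(),;':
        text = text.replace(symbol, ' %s ' % symbol)
    return text.split()

print(tokenize("4:(1,2;4),(10,14;6)"))
# ['4', ':', '(', '1', ',', '2', ';', '4', ')', ',', '(', '10', ',', '14', ';', '6', ')']
```

Multi-digit numbers such as 10 and 14 stay in one piece because only the punctuation is padded.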
Next we need a next_token function. Because of edge_defn, we need to look one token ahead, which is the reason for the global look_ahead variable.
look_ahead = None
def next_token(t):
    global look_ahead
    if look_ahead:
        temp = look_ahead
        try:
            look_ahead = next(t)
        except StopIteration:
            look_ahead = None
        return temp
Then, from the BNF notation, we write a function for the left-hand side of each definition.
def match(t, tok):
    if next_token(t) != tok:
        print("Syntax error! Expecting: %s" % tok)
        exit()
def read_num(t):
    return int(next_token(t))
def edge_defn(t):
    match(t, '(')
    a = read_num(t)
    match(t, ',')
    b = read_num(t)
    match(t, ';')
    c = read_num(t)
    print("%d,%d = %d" % (a,b,c)) # ..do whatever here..
    match(t, ')')
def input(t):
    global grid_size
    grid_size = read_num(t)
    match(t, ':')
    while True:
        edge_defn(t)
        if look_ahead:
            match(t, ',')
        else:
            return
t = iter(tokens) # iterate over the token list built above
look_ahead = next(t)
input(t)
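After the first rule (input) is called, the input is parsed and your actions are carried out along the way. Although this is a nice exercise in its own right, it would be better to use a parser generator, though I'm not sure that would be accepted (it depends on the purpose of the assignment).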
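For comparison, the same top-down idea can be sketched without global variables, returning the result instead of printing it (Python 3). The Parser class and its method names are hypothetical, and raising ValueError on bad input is an assumption of this sketch rather than part of the answer:

```python
def tokenize(text):
    """Pad every punctuation symbol with spaces, then split on whitespace."""
    for symbol in ':(),;':
        text = text.replace(symbol, ' %s ' % symbol)
    return text.split()

class Parser(object):
    """Hypothetical parser for: input := NUM ':' edge_defn (',' edge_defn)*"""

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        # The look-ahead token, or None at the end of the input.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        if tok is None:
            raise ValueError("unexpected end of input")
        self.pos += 1
        return tok

    def match(self, expected):
        tok = self.advance()
        if tok != expected:
            raise ValueError("expected %r, got %r" % (expected, tok))

    def read_num(self):
        # int() itself raises ValueError if the token is not a number.
        return int(self.advance())

    def edge_defn(self):
        self.match('(')
        a = self.read_num()
        self.match(',')
        b = self.read_num()
        self.match(';')
        cost = self.read_num()
        self.match(')')
        return (a, b), cost

    def parse(self):
        size = self.read_num()
        self.match(':')
        edges = {}
        while True:
            key, cost = self.edge_defn()
            edges[key] = cost
            if self.peek() is None:
                return size, edges
            self.match(',')

print(Parser("4:(1,2;4),(2,6;3)").parse())   # (4, {(1, 2): 4, (2, 6): 3})
```

Keeping the position inside the object means two inputs can be parsed independently, which the global look_ahead does not allow.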
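Answer 3 (score: 0)
Here is a different approach, which exploits the fact that the list of edges looks a lot like a bunch of tuples. In practice I would probably have done what shang did, but that has already been done: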
import ast
def build_graph(line):
    size, content = line.split(':')
    size = int(size)
    content = content.replace(';',',')
    edges = ast.literal_eval(content)
    d = {}
    for v0, v1, cost in edges:
        pair = tuple(sorted([v0, v1]))
        if pair not in d:
            d[pair] = cost
        else:
            print("There is an edge referenced twice!")
            return
    return [size, d]
>>> line = "4:(1,2;4),(2,6;3),(3,7;15),(4,8;1),(5,6;1),(6,7;1),(5,9;9),(6,10;2),(7,11;1),(8,12;23),(9,10;5),(9,13;7),(10,14;6),(11,15;3),(12,16;3),(13,14;4),(15,16;7)"
>>> build_graph(line)
[4, {(1, 2): 4, (5, 9): 9, (2, 6): 3, (6, 7): 1, (4, 8): 1, (5, 6): 1, (6, 10): 2, (9, 10): 5, (13, 14): 4, (11, 15): 3, (10, 14): 6, (9, 13): 7, (12, 16): 3, (7, 11): 1, (3, 7): 15, (8, 12): 23, (15, 16): 7}]
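As always, the real trouble starts when you care about error handling and rejecting invalid input, so I'm going to ignore that issue completely. :^) But literal_eval is a handy little function, and it doesn't have the dangers of a plain "eval".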
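To make the literal_eval point concrete, here is a small sketch (Python 3). parse_edges is a hypothetical helper, not part of the answer, and wrapping the text in square brackets is my own assumption: it covers the case of a single edge, which literal_eval would otherwise hand back as one flat tuple of three numbers.

```python
import ast

def parse_edges(content):
    """Hypothetical helper: turn "(1,2;4),(2,6;3)" into [(1, 2, 4), (2, 6, 3)]."""
    # The brackets guarantee a list of tuples even when there is only one edge.
    return ast.literal_eval('[%s]' % content.replace(';', ','))

print(parse_edges("(1,2;4),(2,6;3)"))   # [(1, 2, 4), (2, 6, 3)]
print(parse_edges("(1,2;4)"))           # [(1, 2, 4)]

# Unlike eval(), literal_eval() refuses anything that is not a plain literal.
try:
    parse_edges("__import__('os').getcwd()")
except ValueError:
    print("rejected")
```

A function call inside the text is reported as a ValueError instead of being executed.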
Answer 4 (score: 0)
The solutions already proposed use
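I think the question is about the simplest way to do this, one that a beginner can understand. Moreover, my solution shows that Python's built-in features are enough to do the job.
First, WhiteDawn, I corrected your code so that you can see the basic points you need to understand in order to simplify it with Python's features.
For example, if seq is a sequence, seq[len(seq)-1] is its last element, but seq[-1] is also the last element. By the way, there is a mistake in your code; I think it should be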
line[1][end] = line[1][end][:len(line[1][end])-1]
# not:
line[1][end] = line[1][end][:len(line[1][end])-2]
Otherwise an error occurs during execution.
Also note the powerful function enumerate().
You should study list slicing: if li = [45,12,78,96], then li[2:3] = [2,5,8] turns li into li = [45,12,2,5,8,96].
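The three points above (negative indices, enumerate() and slice assignment) can be tried on a shortened list before reading the full code. This is only an illustrative sketch in Python 3, and the sample values in seq are made up:

```python
seq = ['1,2;4', '2,6;3', '15,16;7)']

# Negative indices count from the end, so both expressions name the last item.
assert seq[len(seq) - 1] == seq[-1]

# Drop the trailing ")" from the last item.
seq[-1] = seq[-1][:-1]

# enumerate() yields (index, item) pairs, so range(len(seq)) is not needed.
for i, item in enumerate(seq):
    seq[i] = item.split(',')               # '1,2;4' -> ['1', '2;4']
    # Slice assignment replaces everything from index 1 onward.
    seq[i][1:] = seq[i][1].split(';')      # -> ['1', '2', '4']

print(seq)   # [['1', '2', '4'], ['2', '6', '3'], ['15', '16', '7']]

li = [45, 12, 78, 96]
li[2:3] = [2, 5, 8]    # one element replaced by three
print(li)    # [45, 12, 2, 5, 8, 96]
```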
y = "4:(1,2;4),(2,6;3),(3,7;15),(4,8;1),(5,6;1),(6,7;1),(5,9;9),(6,10;2),(7,11;1),(8,12;23),(9,10;5),(9,13;7),(10,14;6),(11,15;3),(12,16;3),(13,14;4),(15,16;7)"
def partitionData(line):
    finalDic = dict()
    #partition the data around the formating
    print 'line==',line
    line = line.split(":")
    print '\ninstruction : line = line.split(":")'
    print 'line==',line
    print 'len of line==',len(line),' (2 strings)'
    print '---------------------'
    line[1] = line[1].split("),(")
    print '\ninstruction : line[1] = line[1].split("),(")'
    print 'line[1]==',line[1]
    #clean up data some more
    line[1][0] = line[1][0][1:]
    print 'instruction : line[1][0] = line[1][0][1:]'
    line[1][-1] = line[1][-1][0:-1]
    print 'instruction : line[1][-1] = line[1][-1][0:-1]'
    print 'line[1]==',line[1]
    print '---------------------'
    #simplify data and organize into a list
    for i,x in enumerate(line[1]):
        line[1][i] = x.split(",")
        line[1][i][1:] = line[1][i][1].split(";")
    print 'loop to clean the data in line[1]'
    print 'line[1]==',line[1]
    print '---------------------'
    #convert everything to integer to simplify algorithm
    print 'convert everything to integer to simplify algorithm'
    for i,x in enumerate(line[1]):
        line[1][i] = map(int,x)
    line[0] = int(line[0])
    print 'line==',line
    print '---------------------'
    newData = dict()
    for a,b,c in line[1]:
        newData[(a,b)] = c
    line[1] = newData
    print 'line==',line
    print '---------------------'
    for i in line[1]:
        print 'i==',i,' (min(i),max(i))==',(min(i),max(i))
        if not ((min(i),max(i)) in finalDic):
            finalDic[(min(i),max(i))] = line[1][i]
        else:
            print "There is a edge referenced twice!"
            exit()
    line[1] = finalDic
    print '\nline==',line
    return line
print partitionData(y)
Second, my solution:
y = "4:(1,2;4),(2,6;3),(3,7;15),(4,8;1),(5,6;1),(6,7;1),(5,9;9),(6,10;2),(7,11;1),(8,12;23),(9,10;5),(9,13;7),(10,14;6),(11,15;3),(12,16;3),(13,14;4),(15,16;7)"
# line[1]== {(1, 2): 4, (5, 9): 9, (2, 6): 3, (6, 7): 1, (4, 8): 1, (5, 6): 1, (6, 10): 2, (9, 10): 5, (13, 14): 4, (11, 15): 3, (10, 14): 6, (9, 13): 7, (12, 16): 3, (7, 11): 1, (3, 7): 15, (8, 12): 23, (15, 16): 7}
def partitionData(line):
    finalDic = dict()
    #partition the data around the formating
    print '\nline==',line
    line = line.split(":")
    print '\ninstruction:\n line = line.split(":")'
    print 'result:\n line==',line
    print '\n----------------------------------------------------'
    print '\nline[1]==',line[1]
    line[1] = line[1][1:-1].replace(";",",")
    print '\ninstruction:\n line[1] = line[1][1:-1].replace(";",",")'
    print 'result:\n line[1]==',line[1]
    line[1] = [ x.split(",") for x in line[1].split("),(") ]
    print '\ninstruction:\n line[1] = [ x.split(",") for x in line[1].split("),(") ]'
    print 'result:\n line[1]==',line[1]
    line = [int(line[0]),dict( ((int(a),int(b)),int(c)) for (a,b,c) in line[1] ) ]
    print '\ninstruction:\n line = [int(line[0]),dict( ((int(a),int(b)),int(c)) for (a,b,c) in line[1] ) ]'
    print 'result:\n line[1]==',line[1]
    for i in line[1]:
        if not ((min(i),max(i)) in finalDic):
            finalDic[(min(i),max(i))] = line[1][i]
        else:
            print "There is a edge referenced twice!"
            exit()
    line[1] = finalDic
    print '\nline[1]==',line[1]
    return line
print partitionData(y)
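I left the finalDic part at the end untouched, because I don't understand what that snippet is for. If i is a pair of integers already in ascending order, (min(i),max(i)) is that pair itself.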
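One possible reading of the finalDic loop, offered as a guess at its intent rather than something the question states: (min(i), max(i)) only changes a pair whose larger node comes first, so the loop would catch an edge that is written once in each direction. A minimal sketch in Python 3, where normalize and the sample edges are made up for illustration:

```python
def normalize(edge):
    """Hypothetical helper: put the smaller node number first."""
    return (min(edge), max(edge))

print(normalize((5, 9)))   # (5, 9)
print(normalize((9, 5)))   # (5, 9)

# The same edge written in both directions ends up under one key.
edges = {(5, 9): 9, (9, 5): 2}
seen = {}
duplicates = []
for edge, cost in edges.items():
    key = normalize(edge)
    if key in seen:
        duplicates.append(edge)
    else:
        seen[key] = cost

print(len(duplicates))     # 1
```

With the sample input from the question every pair is already in ascending order, which would explain why the loop appears to do nothing there.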