Question

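I've recently picked up Python again, after not using it for a few years, for some school projects that don't specify a language. My current project is to build a simple priority queue using a max-heap structure. My current problem is with my input file. Each line of input gives us several tuples: a string (the data) and a number (the priority).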

Ex:

(R10, 10), (R20, 20), (R90, 90), (R75, 75), (R35, 35), (R60, 60), (R260, 60), (R360, 60)  
(R15, 15)  
(R50, 50)  
(R275, 75)

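For each line, we need to insert every tuple into our priority queue heap, then pop and return the item with the highest priority, and repeat this for each line.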

Example output (what it should be):

*insert all into queue* print (R90) *next line*  
*insert into queue* print (R75) *next line*  
*etc*

I'm confused about how to get the data into a form I can actually work with. This is what I have so far:

with open(fileName) as f:
    for line in f.readlines():
        nodes = line.split('), (')

which returns (this is the closest I've gotten so far):

['(R10, 10', 'R20, 20', 'R90, 90', 'R75, 75', 'R35, 35', 'R60, 60', 'R260, 60', 'R360, 60', 'R210, 10', 'R5, 5', 'R76, 76', 'R80, 80)\n']  
['(R15, 15)\n']  
['(R50, 50)\n']  
['(R275, 75)\n']

Any help would be greatly appreciated. Thanks in advance!

Answer 1

I think this is easiest to solve with a regular expression:

import re
# This regular expression will match '(<group1>,<group2>)'
# with any whitespace between ',' and <group2> being discarded
tuple_pattern = re.compile(r"\(([^,)]+),\s*([^)]+)\)")

with open(fileName) as f:
    # Find all occurrences of the tuple pattern in the file. This gives
    # us a list of tuples, where each tuple is (<group1>, <group2>)
    # for one match
    tuples = tuple_pattern.findall(f.read())

With the input data, this gives:

>>> tuple_pattern.findall("""(R10, 10), (R20, 20), (R90, 90), (R75, 75), (R35, 35), (R60, 60), (R260, 60), (R360, 60)  
... (R15, 15)  
... (R50, 50)  
... (R275, 75)""")
[('R10', '10'), ('R20', '20'), ('R90', '90'), ('R75', '75'), ('R35', '35'), ('R60', '60'), ('R260', '60'), ('R360', '60'), ('R15', '15'), ('R50', '50'), ('R275', '75')]
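
Note that findall returns both groups as strings, so the priority needs to be converted before it can be compared numerically. A minimal sketch of that step, assuming every priority is a plain integer (parse_pairs is a hypothetical helper name, not something from the question):

```python
import re

# Same pattern as above, written as a raw string
tuple_pattern = re.compile(r"\(([^,)]+),\s*([^)]+)\)")

def parse_pairs(text):
    """Return (name, priority) pairs with the priority converted to int."""
    # Assumption: every priority is a valid integer literal
    return [(name, int(priority)) for name, priority in tuple_pattern.findall(text)]

print(parse_pairs("(R15, 15), (R50, 50)"))
# [('R15', 15), ('R50', 50)]
```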

Edit:

If you need to process each line separately, call tuple_pattern.findall(line) for each line in f.readlines():

import re
tuple_pattern = re.compile(r"\(([^,)]+),\s*([^)]+)\)")

with open(filename) as f:
    for line in f.readlines():
        print(tuple_pattern.findall(line))

Output:

[('R10', '10'), ('R20', '20'), ('R90', '90'), ('R75', '75'), ('R35', '35'), ('R60', '60'), ('R260', '60'), ('R360', '60')]
[('R15', '15')]
[('R50', '50')]
[('R275', '75')]
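
To connect this back to the priority queue, here is a rough sketch of the per-line insert-then-pop loop described in the question. It uses the standard library's heapq in place of a hand-written max-heap, which is a substitution on my part: heapq is a min-heap, so the priorities are stored negated. It also assumes the queue is shared across lines and that ties go to whichever item was inserted first; process_lines is a hypothetical name.

```python
import heapq
import re

tuple_pattern = re.compile(r"\(([^,)]+),\s*([^)]+)\)")

def process_lines(lines):
    """Insert every tuple on a line into one shared queue, then pop the top item."""
    heap = []      # heapq is a min-heap, so priorities are stored negated
    counter = 0    # insertion order breaks ties between equal priorities (assumption)
    popped = []
    for line in lines:
        for name, priority in tuple_pattern.findall(line):
            heapq.heappush(heap, (-int(priority), counter, name))
            counter += 1
        if heap:
            # Each entry is (-priority, counter, name); keep only the name
            popped.append(heapq.heappop(heap)[2])
    return popped

sample = [
    "(R10, 10), (R20, 20), (R90, 90), (R75, 75), (R35, 35), (R60, 60), (R260, 60), (R360, 60)\n",
    "(R15, 15)\n",
    "(R50, 50)\n",
    "(R275, 75)\n",
]
print(process_lines(sample))
# ['R90', 'R75', 'R60', 'R275']
```

With your own heap class, the heappush and heappop calls would be swapped for its insert and pop methods; the parsing part stays the same.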
