Question

I am trying to use Python to parse a text file (its path is stored in the variable trackList) that contains times and titles and looks like this:

00:04:45 example text
00:08:53 more example text
12:59:59 the last bit of example text

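My regular expression (rem) works, and I can also correctly split the string (i) into two parts (separating the time from the text), but I cannot add the array returned by split (using .extend) to a large array (sLines) that I created earlier.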

f=open(trackList)
count=0
sLines=[[0 for x in range(0)] for y in range(34)]   
line=[]

for i in f:
    count+=1
    line.append(i)
    rem=re.match("\A\d\d\:\d\d\:\d\d\W",line[count-1])
    if rem:
        sLines[count-1].extend(line[count-1].split(' ',1))
    else:
        print("error on line: "+count)

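The code should go through every line in the file trackList and test whether the line has the expected format. If it does, it should separate the time from the text and save the result as an array inside an array, at an index one less than the current line number; if it does not, it should print an error pointing me to that line.

I use array[count-1] because Python arrays are zero-indexed and file lines are not.

I use .extend() because I want both elements of the smaller array added to the larger array in the same iteration of the parent for loop.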

Answer 1

So, there is some fairly confusing code in there.

For example:

[0 for x in range(0)]

is a very peculiar way of initializing an empty list:

>>> [] == [0 for x in range(0)]
True

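Also, how do you know the matrix will be 34 rows long? You are also confusing yourself by calling your line 'i' in your for loop; that name is usually reserved as shorthand for an index, which you would expect to be a numeric value. Appending i to line and then re-referencing it as line[count-1] is redundant when you already have the line variable (i).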
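To illustrate the concern about the hard-coded 34 rows, here is a minimal sketch. The names fixed, grown, and overflowed and the placeholder row contents are made up for this illustration and do not come from the question. It assumes a file with 35 lines, one more than the preallocated matrix can hold.

```python
# Hypothetical illustration: a matrix preallocated with 34 rows
# cannot accept a 35th line, while a list built with append() can.
fixed = [[] for _ in range(34)]  # same shape as the question's sLines

try:
    # Index 34 would be the 35th row, which does not exist.
    fixed[34].extend(["00:00:00", "one line too many"])
    overflowed = False
except IndexError:
    overflowed = True

# Growing the list on demand avoids having to know the line count up front.
grown = []
for n in range(35):
    grown.append(["00:00:00", "line %d" % (n + 1)])
```

Here the preallocated version raises IndexError on the 35th row, while the appended list simply ends up with 35 rows.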

Your overall code can be simplified to:

import re

# load the file and extract the lines
# (trackList holds the file path, as in the question)
with open(trackList) as f:
    lines = f.readlines()

# compile the expression once (cheaper than re-parsing it on every loop pass);
# the raw string keeps the backslashes from being treated as string escapes
expr   = re.compile(r'^(\d\d:\d\d:\d\d)\s*(.*)$')
sLines = []

# loop the lines collecting both the index (i) and the line (line)
for i, line in enumerate(lines):
    result = expr.match(line)

    # validate the line
    if not result:
        print("error on line: " + str(i+1))
        # add an invalid list to the matrix
        sLines.append([])  # or whatever you want as your invalid line
        continue

    # add the (time, text) pair to the matrix
    sLines.append(result.groups())
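As a rough usage sketch, the same loop can be tried on the sample lines from the question without touching a file. The helper name parse_lines, the sample_lines list, and the deliberately malformed third line are assumptions added for illustration; this version also converts each match to a list rather than keeping the tuple returned by groups().

```python
import re

# Sample lines from the question, plus one malformed line added
# purely to exercise the error branch (not part of the original data).
sample_lines = [
    "00:04:45 example text\n",
    "00:08:53 more example text\n",
    "not a valid line\n",
    "12:59:59 the last bit of example text\n",
]

expr = re.compile(r'^(\d\d:\d\d:\d\d)\s*(.*)$')

def parse_lines(lines):
    """Hypothetical helper: return (rows, bad_line_numbers)."""
    rows = []
    bad = []
    for i, line in enumerate(lines):
        result = expr.match(line)
        if not result:
            bad.append(i + 1)   # 1-based line number, as in the question
            rows.append([])     # placeholder keeps row index == line index
            continue
        rows.append(list(result.groups()))
    return rows, bad

rows, bad = parse_lines(sample_lines)
```

With this input, rows[0] is the time and the text of the first line as two separate strings, and bad contains only the number of the malformed line. Keeping an empty placeholder for invalid lines preserves the "row index equals line number minus one" property the question asked for.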
