Question

I'm writing a small text parser in Python and trying to parse lines into lists. I'm reading the file line by line.

Most of the data is flat, but some of it has the following structure:

1, -1
    2, 2, 0, -33017.1, 21011.3, 97.6, 0, 1, 1, 1, 0, 1, 0
    2, -1, 0, -36936.3, 21672.3, 96.6, 0, 1, 1, 1, 0, 1, 0
    2, 3, 0, -33220.8, 21150.6, 96.6, 0, 1, 1, 1, 0, 1, 0
    2, 4, 0, -33515.6, 21272.7, 96.6, 0, 1, 1, 1, 0, 1, 0
    2, 5, 0, -33832, 21314.3, 96.6, 0, 1, 1, 1, 0, 1, 0
    2, 6, 0, -35112, 21314.3, 96.6, 0, 1, 1, 1, 2, 1, 0
    2, 7, 0, -36072, 21314.3, 96.6, 0, 1, 1, 1, 0, 1, 0
    2, 8, 0, -36388.3, 21356, 96.6, 0, 1, 1, 1, 0, 1, 0
    2, 1, 0, -36683.1, 21478.1, 96.6, 0, 1, 1, 1, 0, 1, 0
    1, 0, 0, -32888.9, 20917.9, 99, 0, 1, 1, 1, 0, 1, 0
    1, 1, 0, -37066, 21772.1, 96.6008, 0, 1, 1, 1, 0, 1, 0
    0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0

I'd like to parse it so that the resulting list looks like this:

[1, -1, [[2, 2, 0, -33017.1, 21011.3, 97.6, 0, 1, 1, 1, 0, 1, 0], ...]]

There is only one level of indentation in this section.

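Most of the techniques I've tried so far feel hacky: for example, looping over the indented lines to count them, then slicing by index, then looping again. Is there a more elegant solution?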

Answer 1

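Have you tried something like for line in file, followed by if line.startswith('\t') (or whichever whitespace character is used for the indentation)? For an indented line you can then call data.append(line.split()), which adds the line as a nested list, instead of data.extend(line.split()), which adds its items to the top-level list.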

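data = []
for line in file:  # file: an already-open file object
    items = [item.strip() for item in line.split(',')]
    if line.startswith(' '):
        data.append(items)  # indented line: keep it as one nested list
    else:
        data.extend(items)  # flat line: add its items to the top level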

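(I don't have access to Python at the moment, so this is untested.) As for the original question, you will probably want to replace item.strip() with int(item.strip()) or float(item.strip()), since you want the results to be numbers.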
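A minimal sketch of that conversion, assuming every field is numeric. The helper names to_number and parse_items are made up for illustration and are not part of the code above. It tries int first and falls back to float, so 2 stays an integer while -33017.1 becomes a float:

```python
def to_number(text):
    """Convert a string such as ' 2' or '-33017.1' to an int or a float."""
    text = text.strip()
    try:
        return int(text)
    except ValueError:
        # Not a plain integer; float() raises ValueError for non-numeric text.
        return float(text)


def parse_items(line):
    """Split one comma-separated line into a list of numbers."""
    return [to_number(item) for item in line.split(',')]
```

Either branch can then use parse_items(line) in place of the list comprehension.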

data = []
indent = []          # list to hold the current run of indented lines
startindent = False  # True while we are inside a run of indented lines
for line in file:    # file: an already-open file object
    items = [item.strip() for item in line.split(',')]
    if line.startswith(' '):
        # indented line: start a new group if needed, then add the line to it
        if not startindent:
            startindent = True
            indent = []
        indent.append(items)
    else:
        # if we were previously in 'indent' mode but this line isn't indented,
        # add the collected group first, then add this line as normal
        if startindent:
            startindent = False
            data.append(indent)
        data.extend(items)

# a group that runs to the end of the file still has to be added
if startindent:
    data.append(indent)

I'm really not sure whether this works, but the idea is to group consecutive indented lines into a list of lists.
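For what it's worth, here is a self-contained sketch of the same grouping idea that can be run against a shortened version of the sample from the question. The names to_number and parse_lines are made up for illustration. It assumes the indentation consists of spaces or tabs, that every field is numeric, and that blank lines can simply be skipped. The nested list is appended to the result as soon as an indented run starts and is then filled in place, so nothing has to be flushed at the end of the input:

```python
def to_number(text):
    """Convert a numeric string to an int if possible, otherwise a float."""
    text = text.strip()
    try:
        return int(text)
    except ValueError:
        return float(text)


def parse_lines(lines):
    """Parse an iterable of lines; each indented run becomes one nested list of rows."""
    data = []
    group = None  # nested list for the current indented run, if any
    for line in lines:
        if not line.strip():
            continue  # skip blank lines (an assumption about the input)
        items = [to_number(item) for item in line.split(',')]
        if line[0] in ' \t':
            if group is None:
                group = []
                data.append(group)  # appended once; later rows fill it in place
            group.append(items)
        else:
            group = None  # a flat line ends the current run
            data.extend(items)
    return data


sample = """1, -1
    2, 2, 0, -33017.1, 21011.3, 97.6, 0, 1, 1, 1, 0, 1, 0
    0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0
"""
result = parse_lines(sample.splitlines())
print(result)
```

Because parse_lines accepts any iterable of strings, an open file object can be passed to it directly instead of sample.splitlines().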
