Question

I'm trying to parse the output of a tool into a data structure, but I'm running into some difficulty. The file looks like this:

 Fruits
   Apple
     Auxiliary
     Core
     Extras
   Banana
     Something
   Coconut
 Vegetables
   Eggplant
   Rutabaga

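As you can see, top-level items are indented by one space, and each level below that is indented by two more spaces. The items are also in alphabetical order.

How can I turn the file into a Python list like ["Fruits", "Fruits/Apple", "Fruits/Banana", ..., "Vegetables", "Vegetables/Eggplant", "Vegetables/Rutabaga"]?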

Answer 1

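>>> with open("food.txt") as f:
...     res = []
...     s = []  # current path, one entry per nesting level
...     for line in f:
...         line = line.rstrip()
...         x = len(line)
...         line = line.lstrip()
...         indent = x - len(line)  # number of leading spaces
...         # 1, 3, 5 spaces -> depth 0, 1, 2: keep the ancestors, replace the rest
...         s = s[:indent // 2] + [line]
...         res.append("/".join(s))
...     print(res)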
... 
['Fruits', 'Fruits/Apple', 'Fruits/Apple/Auxiliary', 'Fruits/Apple/Core', 'Fruits/Apple/Extras', 'Fruits/Banana', 'Fruits/Banana/Something', 'Fruits/Coconut', 'Vegetables', 'Vegetables/Eggplant', 'Vegetables/Rutabaga']
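
The same slicing idea can be wrapped in a function that takes any iterable of lines, which makes it easy to try without a file on disk. This is only a minimal sketch for Python 3: the name `indent_paths` and the `step` parameter are made up for illustration, and it assumes the layout from the question (one leading space at the top level, two more per level).

```python
def indent_paths(lines, step=2):
    """Return one slash-joined path per non-blank line.

    Hypothetical helper: assumes every nesting level adds `step` spaces,
    so integer division maps 1, 3, 5 leading spaces to depths 0, 1, 2.
    """
    paths = []
    stack = []  # names of the current line's ancestors, plus the line itself
    for line in lines:
        line = line.rstrip()
        if not line:
            continue  # ignore blank lines
        name = line.lstrip()
        depth = (len(line) - len(name)) // step
        stack = stack[:depth] + [name]  # drop deeper entries, then add this one
        paths.append("/".join(stack))
    return paths


sample = [" Fruits", "   Apple", "     Core", "   Banana", " Vegetables"]
print(indent_paths(sample))
```

Passing an open file object works as well, since iterating over it yields lines.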

Answer 2

So you don't want the deepest level, right? I'm not sure I've understood you correctly, but here is one way to do it anyway:

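d = []
for line in open("file"):
    # Skip anything indented four or more spaces (third level and deeper).
    if not line.startswith("    "):
        if line.startswith("  "):
            # Second-level item: prefix it with the current top-level name.
            d.append(p + "/" + line.strip())
        elif line.startswith(" "):
            # Top-level item; rstrip() keeps its leading space, hence the output below.
            p = line.rstrip()
print(d)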

Output

$ ./python.py
[' Fruits/Apple', ' Fruits/Banana', ' Fruits/Coconut', ' Vegetables/Eggplant', ' Vegetables/Rutabaga']
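
If a full list of paths is already available, the depth limit this answer assumes could also be applied afterwards by counting separators instead of testing prefixes with `startswith`. A small sketch, where `limit_depth` and `max_depth` are illustrative names and item names are assumed never to contain "/":

```python
def limit_depth(paths, max_depth=1):
    """Keep only paths with at most `max_depth` separators (hypothetical helper).

    Assumes item names never contain "/" themselves.
    """
    return [p for p in paths if p.count("/") <= max_depth]


full = ["Fruits", "Fruits/Apple", "Fruits/Apple/Core", "Vegetables"]
print(limit_depth(full))  # ['Fruits', 'Fruits/Apple', 'Vegetables']
```

Unlike the loop above, this keeps the top-level names too, which matches the list the question asks for.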

Answer 3

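This assumes that your input file is 'datafile.txt', that you indent with spaces only, that you specify the indent_string used for each level, and that level 0 starts with no indentation at all (no spaces at the shallowest level). All of these constraints can easily be removed, but the basic layout should be clear: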

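import re

indent_string = '  '
# Leading whitespace goes into "blanks", the rest of the line into "name".
pattern = re.compile(r'(?P<blanks>\s*)(?P<name>.*)')

cache = {}  # level -> most recent name seen at that level

with open('datafile.txt') as f:
    for line in f:
        m = pattern.match(line)
        d = m.groupdict()
        level = len(d['blanks']) // len(indent_string)
        cache[level] = d['name']
        # Join the names of levels 0..level, each preceded by '/'.
        s = ''
        for i in range(level + 1):
            s += '/' + cache[i]
        print(s)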
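
One way the fixed-indent constraint might be lifted is to keep a stack of (indent, name) pairs and treat each line as a child of the nearest earlier line with strictly less leading whitespace. The sketch below is an assumption about how that could look, not part of the answer above; `paths_any_indent` is a made-up name, and because it compares raw character counts, mixing tabs and spaces in one file could give misleading results.

```python
def paths_any_indent(lines):
    """Build slash-joined paths without knowing the indent width in advance.

    Hypothetical helper: a line is a child of the nearest previous line
    that has strictly less leading whitespace.
    """
    paths = []
    stack = []  # (indent_width, name) pairs for the current chain of ancestors
    for raw in lines:
        line = raw.rstrip()
        if not line:
            continue  # ignore blank lines
        name = line.lstrip()
        indent = len(line) - len(name)
        # Pop siblings and deeper entries until only true ancestors remain.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        stack.append((indent, name))
        paths.append("/".join(n for _, n in stack))
    return paths


sample = ["Fruits", "    Apple", "        Core", "    Banana", "Vegetables", "  Eggplant"]
print(paths_any_indent(sample))
```

Because only relative indentation matters here, the question's layout (one space, then two more per level) is handled without any special case.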

Answer 4

You can do it like this:

builder, outlist = [], []
current_spacing = 0

with open('input.txt') as f:
    for line in f:
        line = line.rstrip()  # drop the trailing newline before measuring
        stripped = line.lstrip()
        num_spaces = len(line) - len(stripped)
        if builder and num_spaces <= current_spacing:
            # Same level or shallower: drop the previous entry, plus one
            # entry for every level (two spaces) we moved back up.
            for i in range((current_spacing - num_spaces) // 2 + 1):
                builder.pop()
        builder.append(stripped)
        current_spacing = num_spaces
        outlist.append("/".join(builder))

print(outlist)

Parsing a file with a hierarchical structure in Python

4 answers: