Question

I have a file with the following content as input:

begin mickey
<block of text>
end

begin mouse
<block of text>
end

begin miney
<block of text>
end

How can I parse the file to produce the following output?

file1: mickey.hostname (contains only mickey)
file2: mickey.cmds (contains the block of text)

file3: mouse.hostname
file4: mouse.cmds

file5: miney.hostname
file6: miney.cmds

Thank you!

Answer 1

In bash:

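no=1
state=search
# IFS= and -r keep leading whitespace and backslashes in each line intact
while IFS= read -r line; do
  # a "begin <name>" line opens a block: write the name to its own file
  if [[ $line == "begin "* ]] && [[ $state = search ]]; then
    name=${line#begin }
    printf '%s\n' "$name" > "file$no.txt"
    no=$((no + 1))
    state=text
    continue
  fi
  # an "end" line closes the block: move on to the next file number
  if [[ $line == end ]] && [[ $state = text ]]; then
    no=$((no + 1))
    state=search
    continue
  fi
  # any other line inside a block is appended to the block's file
  if [[ $state = text ]]; then
    printf '%s\n' "$line" >> "file$no.txt"
    continue
  fi
done < orig.file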

I wrote this during my morning workout, so checking it for correctness is up to you.
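
For comparison, the same search/text state machine can be sketched in Python so that it writes the file names asked for in the question (mickey.hostname, mickey.cmds, and so on) rather than numbered files. This is only a minimal sketch, not the answerer's code: the function name split_blocks and the out_dir parameter are made up for illustration, and it assumes the begin and end markers sit alone on their lines as in the sample input.

```python
from pathlib import Path


def split_blocks(lines, out_dir="."):
    """Write <name>.hostname and <name>.cmds for every begin/end block.

    `lines` is any iterable of text lines (an open file works).
    Returns the list of names that were written, in input order.
    """
    out_dir = Path(out_dir)
    name = None   # None means we are outside a block (the "search" state)
    body = []     # lines collected for the current block (the "text" state)
    written = []
    for raw in lines:
        line = raw.rstrip("\n")
        if name is None:
            # Outside a block: only a "begin <name>" line is of interest.
            if line.startswith("begin "):
                name = line[len("begin "):].strip()
                body = []
        elif line.strip() == "end":
            # End of block: flush the name and the collected lines to disk.
            (out_dir / f"{name}.hostname").write_text(name + "\n")
            (out_dir / f"{name}.cmds").write_text("".join(b + "\n" for b in body))
            written.append(name)
            name = None
        else:
            body.append(line)
    return written
```

Passing an open handle on orig.file as the first argument would process the file line by line; a block whose end marker is missing is silently dropped in this sketch.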

Answer 2

import re  # text is assumed to hold the whole input file as a single string

[it for it in map(lambda j: j.split('\n'),                   # 3. split each item on newlines to separate the name from the block of text
              map(lambda l: l[6:],                           # 2. remove 'begin ' from the start of each item
              map(lambda x: x.strip(),                       # 1b. strip whitespace from both ends of each item
                  re.split(r'^end$', text, flags=re.M))))    # 1a. split the input on lines consisting only of 'end'
              if it[0]]                                      # 4. drop the empty item left over after the final 'end'

This produces the output:

[['mickey', '<block of text>'],
 ['mouse', '<block of text>'],
 ['miney', '<block of text>']]

It should be easy to process from there.
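
One way to take it from there, as a rough sketch: assuming each item is a list with the name first and the lines of the block after it (the shape shown above), the items can be written out as <name>.hostname and <name>.cmds. The helper name write_pairs and its out_dir parameter are hypothetical, chosen only for this example.

```python
from pathlib import Path


def write_pairs(pairs, out_dir="."):
    """Write one .hostname and one .cmds file per parsed item.

    Each item in `pairs` looks like ['mickey', 'line 1', 'line 2', ...]:
    the name comes first, the lines of the block follow.
    """
    out_dir = Path(out_dir)
    for name, *block in pairs:
        # The .hostname file holds only the name.
        (out_dir / f"{name}.hostname").write_text(name + "\n")
        # The .cmds file holds the block, one line per list element.
        (out_dir / f"{name}.cmds").write_text("\n".join(block) + "\n")
```

Unpacking with `name, *block` keeps this working whether the block has one line or many.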

How do I split a file with marked blocks into many files in Python or Bash?
