Question

While supporting a legacy system, I'm dealing with a field data collector that stores its data in the following format:

# This is a comment <- because it starts at the beginning of the file
# This is a comment <- see above
# 1. Item one <- not a comment because it starts with 1.
# Description of Item 1 <- not a comment as it is after a line that starts with a number
data point 1
data point 2
data point etc
3 <-- represents number of data points under Item one

# 2. Item two <-- not a comment
# Description of item 2 <-- not a comment
data point 1
data point ..
data point 100
100
#3. Item three <--- not a comment
# Item three description
0

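I'm not sure what the right way is to parse this file so that each Item ends up in its own list. Note that sometimes, but not always, the data contains random blank lines between two items.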

What is the correct way to parse a file like this?

Answer 1

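I would do it in three steps:

1. Remove all the comments at the beginning of the file.
2. Split with a regex to find all the other comments in the file (see here for an example of how to split with a regex).
3. Parse the remaining lines.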
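The three steps above can be sketched in Python roughly as follows. This is a minimal sketch, not the answerer's code. It assumes that an item header is a line of the form `# N.` (with an optional space after `#`), that the `#` line directly after a header is the description, and that the last non-blank line of each item is its data-point count. The function name `parse_items` and the dict keys are hypothetical.

```python
import re

# Assumed item header: "#", an optional space, a number and a dot ("# 1.", "#3.").
HEADER_RE = re.compile(r"^# ?\d+\.", re.MULTILINE)


def parse_items(text):
    """Return one dict per item; the dict keys are hypothetical."""
    # Step 1: drop the leading comments (everything before the first item header).
    first = HEADER_RE.search(text)
    if first is None:
        return []
    body = text[first.start():]

    # Step 2: split at each header with a zero-width lookahead (Python 3.7+),
    # so every chunk starts with its own header line.
    chunks = re.split(r"^(?=# ?\d+\.)", body, flags=re.MULTILINE)

    # Step 3: parse each chunk; blank lines between items are simply ignored.
    items = []
    for chunk in chunks:
        lines = [ln.strip() for ln in chunk.splitlines() if ln.strip()]
        if not lines:
            continue  # the empty piece produced by the split at position 0
        header, rest = lines[0], lines[1:]
        description = ""
        if rest and rest[0].startswith("#"):
            description = rest[0].lstrip("# ")
            rest = rest[1:]
        # Assumption: the last line is the data-point count, the rest are data points.
        count = int(rest[-1])
        data = rest[:-1]
        if len(data) != count:
            raise ValueError(f"{header!r}: expected {count} data points, got {len(data)}")
        items.append({"header": header, "description": description, "data": data})
    return items
```

The count check turns a truncated or mis-split item into an error instead of silently producing a wrong list; it can be dropped if the collector's counts are not reliable.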

Answer 2

You can use a regex and do a split: ^(?=\# ?\d+\.)

An example with an explanation is here: http://regex101.com/r/gB3xD1
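In Python, the split suggested above can be sketched with a compiled pattern and the `re.MULTILINE` flag, so that `^` matches at the start of every line rather than only at the start of the string. Splitting on a zero-width lookahead like this requires Python 3.7 or later. The sample text is hypothetical; note that the leading comments remain in the first chunk and still have to be discarded separately.

```python
import re

# Zero-width split point: the start of any line that begins with "# N." or "#N.".
ITEM_SPLIT = re.compile(r"^(?=# ?\d+\.)", re.MULTILINE)

text = (
    "# This is a comment\n"
    "# 1. Item one\n"
    "# Description of Item 1\n"
    "data point 1\n"
    "1\n"
    "\n"
    "#2. Item two\n"
    "# Item two description\n"
    "0\n"
)

chunks = ITEM_SPLIT.split(text)
# chunks[0] holds the leading comments; every later chunk starts with an item header.
preamble, items = chunks[0], chunks[1:]
for item in items:
    print(item.splitlines()[0])  # prints each item's header line
```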

Parsing a text file with unusual delimiters using Python
