My CSV file is structured as follows:
# Samples 1
1,58
2,995
3,585
# Samples 2
15,87
16,952
17,256
# Samples 1
4,89
5,63
6,27
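Is there a way in Python 3.x to parse a file structured like this without processing it manually line by line?
I would like a function that parses it automatically based on the labels, like this: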
>>> parseLabeledCSV(['# Samples 1', '# Samples 2'], fileName)
[{1: 58, 2: 995, 3: 585, 4: 89, 5: 63, 6: 27}, {15: 87, 16: 952, 17: 256}]
Answer 0: (score: 1)
Something like this?
input="""# Samples 1
1,58
2,995
3,585
# Samples 2
15,87
16,952
17,256
# Samples 1
4,89
5,63
6,27"""
def parse(text):
    parsed = {}
    key = "# Unknown"  # label used for data that appears before any header
    for line in text.split("\n"):
        if not line.strip():  # ignore empty lines
            continue
        if line.startswith("#"):
            # dict.has_key() was removed in Python 3; setdefault() creates
            # the entry only the first time a label is seen
            parsed.setdefault(line, {})
            key = line
            continue
        left, right = line.split(",")
        parsed.setdefault(key, {})[left] = right
    return parsed

if __name__ == '__main__':
    output = parse(input)
    print(output)
This prints (on Python 3.7+, where dicts preserve insertion order):
{'# Samples 1': {'1': '58', '2': '995', '3': '585', '4': '89', '5': '63', '6': '27'}, '# Samples 2': {'15': '87', '16': '952', '17': '256'}}
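The result above is keyed by label and keeps everything as strings, while the question asks for a list of dicts with integer keys and values, one per requested label. A minimal sketch of that shape follows. The names parse_labeled_lines and parse_labeled_csv are made up for illustration, and the code assumes every data line holds exactly two comma-separated integers.

```python
def parse_labeled_lines(labels, lines):
    """Return one {int: int} dict per requested label, in the order of `labels`."""
    by_label = {label: {} for label in labels}
    current = None  # dict of the section being read; None means "skip this section"
    for line in lines:
        line = line.strip()
        if not line:  # ignore empty lines
            continue
        if line.startswith('#'):
            # Sections whose label was not requested map to None and are skipped
            current = by_label.get(line)
            continue
        if current is not None:
            key, value = line.split(',')  # assumes exactly two fields per line
            current[int(key)] = int(value)
    return [by_label[label] for label in labels]


def parse_labeled_csv(labels, file_name):
    """Same as parse_labeled_lines, but reads the lines from a file."""
    with open(file_name) as f:
        return parse_labeled_lines(labels, f)
```

Calling parse_labeled_csv(['# Samples 1', '# Samples 2'], fileName) on the sample file should then merge both '# Samples 1' blocks into the first dict.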
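Answer 1: (score: 0)
groupby will do all the iterating and grouping for you. In this case, you only care about the consecutive groups of lines that contain a ','. (Or that consist only of ',' and digits, or whatever other filter predicate you need to define):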
input="""# Samples 1
1,58
2,995
3,585
# Samples 2
15,87
16,952
17,256
# Samples 1
4,89
5,63
6,27""".splitlines()
from itertools import groupby
import csv
results = []
for has_comma, data_lines in groupby(input, key=lambda s: ',' in s):
    if has_comma:
        results.append(dict(csv.reader(data_lines)))
This can even be collapsed into a single Python list comprehension:
results = [dict(csv.reader(data_lines))
           for has_comma, data_lines in groupby(input, key=lambda s: ',' in s)
           if has_comma]
In both cases, print the results with:
for dd in results:
    print(dd)
which gives (on Python 3.7+, where dicts preserve insertion order):
{'1': '58', '2': '995', '3': '585'}
{'15': '87', '16': '952', '17': '256'}
{'4': '89', '5': '63', '6': '27'}