Parsing a non-homogeneous CSV file with Python

Time: 2016-06-23 17:23:20

Tags: python python-3.x csv parsing

My CSV file is structured as follows:

# Samples 1
1,58
2,995
3,585

# Samples 2
15,87
16,952
17,256

# Samples 1
4,89
5,63
6,27

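Is there a way in Python 3.x to parse a file structured like this without handling it manually, line by line?

I would like a function that parses it automatically based on the labels, like this: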

>>> parseLabeledCSV(['# Samples 1', '# Samples 2'], fileName)
[{1: 58, 2: 995, 3: 585, 4: 89, 5: 63, 6: 27}, {15: 87, 16: 952, 17: 256}]

2 answers:

Answer 0: (score: 1)

Something like this?

input="""# Samples 1
1,58
2,995
3,585

# Samples 2
15,87
16,952
17,256

# Samples 1
4,89
5,63
6,27"""


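def parse(input):
    parsed = {}
    lines = input.split("\n")
    key = "# Unknown"  # fallback label for rows that precede any header
    for line in lines:
        if not line.strip():  # ignore empty lines
            continue
        if line.startswith("#"):
            # dict.has_key() was removed in Python 3; setdefault replaces it
            parsed.setdefault(line, {})
            key = line
            continue
        left, right = line.split(",")
        # setdefault also creates the fallback section on first use
        parsed.setdefault(key, {})[left] = right
    return parsed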


if __name__ == '__main__':
    output = parse(input)
    print(output)

This outputs (key order may differ between Python versions):

{'# Samples 1': {'1': '58', '3': '585', '2': '995', '5': '63', '4': '89', '6': '27'}, '# Samples 2': {'15': '87', '17': '256', '16': '952'}}
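
The result above is keyed by label and holds strings, while the question asks for a list of dicts with integer keys and values, read from a file. A minimal sketch of that variant follows. The name parse_labeled_csv and its parameters are hypothetical, and it assumes every data row has exactly two integer fields and that labels contain no quoted fields.

```python
import csv


def parse_labeled_csv(labels, file_name):
    """Return one dict per label, merging all sections that share a label."""
    sections = {label: {} for label in labels}
    current = None  # section being filled; None until a requested label is seen
    with open(file_name, newline="") as handle:
        for row in csv.reader(handle):
            if not row:  # blank line between sections
                continue
            if row[0].startswith("#"):
                # Header line; rows under labels the caller did not ask for
                # are skipped because current becomes None.
                current = sections.get(",".join(row).strip())
                continue
            if current is not None:
                current[int(row[0])] = int(row[1])
    return [sections[label] for label in labels]
```

Calling parse_labeled_csv(['# Samples 1', '# Samples 2'], fileName) on the file from the question would return the two merged dicts in the order the labels were passed.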

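Answer 1: (score: 0)

groupby will do all the iteration and grouping for you. In this case, you only care about the consecutive groups of lines that contain a ',' (or that consist only of ',' and digits, or whatever other filter predicate you need to define):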

input="""# Samples 1
1,58
2,995
3,585

# Samples 2
15,87
16,952
17,256

# Samples 1
4,89
5,63
6,27""".splitlines()

from itertools import groupby
import csv

results = []
for has_comma, data_lines in groupby(input, key=lambda s: ',' in s):
    if has_comma:
        results.append(dict(csv.reader(data_lines)))

This can even be collapsed into a single Python list comprehension:

results = [dict(csv.reader(data_lines)) 
            for has_comma, data_lines in groupby(input, key=lambda s: ',' in s) 
                if has_comma]

In either case, print the results with:

for dd in results:
    print(dd)

This gives:

{'1': '58', '3': '585', '2': '995'}
{'15': '87', '17': '256', '16': '952'}
{'5': '63', '4': '89', '6': '27'}
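
Note that the groups above stay separate, so the two '# Samples 1' sections end up in different dicts. A possible way to merge them, sketched here under the assumption that each data row holds exactly two integers, is to group on whether a line is a header and remember the most recent label. The function name group_by_label is hypothetical.

```python
from itertools import groupby

sample = """# Samples 1
1,58
2,995
3,585

# Samples 2
15,87
16,952
17,256

# Samples 1
4,89
5,63
6,27""".splitlines()


def group_by_label(lines):
    """Merge rows from every section that shares the same '#' header."""
    merged = {}
    label = None  # most recent header; stays None for rows before any header
    non_blank = (ln for ln in lines if ln.strip())
    for is_header, block in groupby(non_blank, key=lambda s: s.startswith("#")):
        if is_header:
            # If several headers are adjacent, the last one wins.
            label = list(block)[-1].strip()
        else:
            rows = (ln.split(",") for ln in block)
            merged.setdefault(label, {}).update((int(k), int(v)) for k, v in rows)
    return merged


merged = group_by_label(sample)
```

Here merged maps each label to a single dict, so merged['# Samples 1'] contains the rows from both of its sections.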