Question

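I am trying to write a function in Python that opens a file and parses it into a dictionary. I want the first item in the list block to be the key for each entry in the dictionary data, and each value to be the rest of block, minus that first item. For some reason, when I run the function below, it parses the file incorrectly. I have included the output below. How can I parse it the way I described? Any help would be greatly appreciated.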

Function:

def parseData() :
    filename="testdata.txt"
    file=open(filename,"r+")

    block=[]
    for line in file:
        block.append(line)
        if line in ('\n', '\r\n'):
            album=block.pop(1)
            data[block[1]]=album
            block=[]
    print data

Input:

Bob Dylan
1966 Blonde on Blonde
-Rainy Day Women #12 & 35
-Pledging My Time
-Visions of Johanna
-One of Us Must Know (Sooner or Later)
-I Want You
-Stuck Inside of Mobile with the Memphis Blues Again
-Leopard-Skin Pill-Box Hat
-Just Like a Woman
-Most Likely You Go Your Way (And I'll Go Mine)
-Temporary Like Achilles
-Absolutely Sweet Marie
-4th Time Around
-Obviously 5 Believers
-Sad Eyed Lady of the Lowlands

Output:

{'-Rainy Day Women #12 & 35\n': '1966 Blonde on Blonde\n',
 '-Whole Lotta Love\n': '1969 II\n', '-In the Evening\n': '1979 In Through the Outdoor\n'}

Answer 1

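You can use groupby to group the data, with the empty lines acting as delimiters, and a defaultdict to handle repeated keys. For each section that groupby returns, take the first element as the key and extend that key's list with the remaining values.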

from itertools import groupby
from collections import defaultdict
d = defaultdict(list)
with open("file.txt") as f:
    for k, val in groupby(f, lambda x: x.strip() != ""):
        # if k is True we have a section
        if k:
            # get key "k", which is the first line of each section;
            # v will be the remaining lines
            k, *v = val
            # create the key/value pairing or add to the existing one
            d[k].extend(map(str.rstrip, v))
from pprint import pprint as pp
pp(d)

Output:

{'Bob Dylan\n': ['1966 Blonde on Blonde',
                 '-Rainy Day Women #12 & 35',
                 '-Pledging My Time',
                 '-Visions of Johanna',
                 '-One of Us Must Know (Sooner or Later)',
                 '-I Want You',
                 '-Stuck Inside of Mobile with the Memphis Blues Again',
                 '-Leopard-Skin Pill-Box Hat',
                 '-Just Like a Woman',
                 "-Most Likely You Go Your Way (And I'll Go Mine)",
                 '-Temporary Like Achilles',
                 '-Absolutely Sweet Marie',
                 '-4th Time Around',
                 '-Obviously 5 Believers',
                 '-Sad Eyed Lady of the Lowlands'],
 'Led Zeppelin\n': ['1979 In Through the Outdoor',
                    '-In the Evening',
                    '-South Bound Saurez',
                    '-Fool in the Rain',
                    '-Hot Dog',
                    '-Carouselambra',
                    '-All My Love',
                    "-I'm Gonna Crawl",
                    '1969 II',
                    '-Whole Lotta Love',
                    '-What Is and What Should Never Be',
                    '-The Lemon Song',
                    '-Thank You',
                    '-Heartbreaker',
                    "-Living Loving Maid (She's Just a Woman)",
                    '-Ramble On',
                    '-Moby Dick',
                    '-Bring It on Home']}
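
To make the grouping step easier to follow, here is a minimal sketch of how groupby splits consecutive lines into sections at blank lines. It uses an in-memory list instead of a file; the sample lines and the helper name is_content are made up for illustration and are not part of the answer's code.

```python
from itertools import groupby

# Made-up sample lines standing in for the contents of the file.
lines = [
    "Artist A\n",
    "2000 First Album\n",
    "-Song One\n",
    "\n",
    "Artist B\n",
    "2001 Other Album\n",
    "-Song Two\n",
]


def is_content(line):
    """Return True for non-blank lines, False for blank separator lines."""
    return line.strip() != ""


# groupby yields (key, group) pairs for runs of consecutive lines that
# share the same key; runs with key False are the blank separators.
sections = [
    [line.rstrip() for line in group]
    for has_content, group in groupby(lines, is_content)
    if has_content
]

for section in sections:
    print(section)
```

Each inner list is one section, so its first element is the candidate dictionary key and the rest are the values.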

For Python 2, the unpacking syntax is slightly different:

with open("file.txt") as f:
    for k, val in groupby(f, lambda x: x.strip() != ""):
        if k:
            k, v = next(val), val
            d[k].extend(map(str.rstrip, v))

If you want to keep the newlines, remove the map(str.rstrip, ...) call.
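
As a rough sketch of that variant (Python 3 syntax), the loop might look like the following. io.StringIO and the sample text are assumptions used here only to keep the example self-contained; with a real file you would keep the with open(...) block.

```python
import io
from collections import defaultdict
from itertools import groupby

# Made-up sample text; io.StringIO stands in for an open file object.
sample = io.StringIO(
    "Artist A\n2000 First Album\n-Song One\n"
    "\n"
    "Artist A\n2001 Second Album\n-Song Two\n"
)

kept = defaultdict(list)
for has_content, section in groupby(sample, lambda line: line.strip() != ""):
    if has_content:
        key, *rest = section
        # No map(str.rstrip, ...) here, so every value keeps its trailing "\n".
        kept[key].extend(rest)

print(dict(kept))
```

Note that the key keeps its newline as well, and both sections end up under the same key because the artist repeats.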

If you want the albums and songs stored separately for each artist:

from itertools import groupby
from collections import defaultdict

d = defaultdict(lambda: defaultdict(list))
with open("file.txt") as f:
    for k, val in groupby(f, lambda x: x.strip() != ""):
        if k:
            k, alb, songs = next(val),next(val), val
            d[k.rstrip()][alb.rstrip()] = list(map(str.rstrip, songs))

from pprint import pprint as pp

pp(d)

Output:

{'Bob Dylan': {'1966 Blonde on Blonde': ['-Rainy Day Women #12 & 35',
                                         '-Pledging My Time',
                                         '-Visions of Johanna',
                                         '-One of Us Must Know (Sooner or '
                                         'Later)',
                                         '-I Want You',
                                         '-Stuck Inside of Mobile with the '
                                         'Memphis Blues Again',
                                         '-Leopard-Skin Pill-Box Hat',
                                         '-Just Like a Woman',
                                         '-Most Likely You Go Your Way '
                                         "(And I'll Go Mine)",
                                         '-Temporary Like Achilles',
                                         '-Absolutely Sweet Marie',
                                         '-4th Time Around',
                                         '-Obviously 5 Believers',
                                         '-Sad Eyed Lady of the Lowlands']},
 'Led Zeppelin': {'1969 II': ['-Whole Lotta Love',
                              '-What Is and What Should Never Be',
                              '-The Lemon Song',
                              '-Thank You',
                              '-Heartbreaker',
                              "-Living Loving Maid (She's Just a Woman)",
                              '-Ramble On',
                              '-Moby Dick',
                              '-Bring It on Home'],
                  '1979 In Through the Outdoor': ['-In the Evening',
                                                  '-South Bound Saurez',
                                                  '-Fool in the Rain',
                                                  '-Hot Dog',
                                                  '-Carouselambra',
                                                  '-All My Love',
                                                  "-I'm Gonna Crawl"]}}

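One thing to keep in mind with a nested defaultdict is that looking up a missing key silently creates an empty entry. If that is not wanted once parsing is done, a possible follow-up step is to convert the result to plain dicts. The sketch below uses a small made-up stand-in for d, and the name plain is hypothetical.

```python
from collections import defaultdict

# Made-up stand-in for the nested structure built by the parsing loop.
d = defaultdict(lambda: defaultdict(list))
d["Artist A"]["2000 First Album"] = ["-Song One", "-Song Two"]
d["Artist A"]["2001 Second Album"] = ["-Song Three"]

# Convert both levels to plain dicts so that a missing key raises
# KeyError instead of quietly adding an empty entry.
plain = {artist: dict(albums) for artist, albums in d.items()}

print(plain["Artist A"]["2000 First Album"])
print("Artist B" in plain)
```
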
Answer 2

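I think this is what you want?

Even if this is not the format you want, there are a few things you can learn from this answer:

Use with for file handling.
Nice to have:
- PEP8-compliant code, see http://pep8online.com/
- a shebang
- numpydoc-style docstrings
- if __name__ == '__main__'

SE doesn't like code directly following a list...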

#!/usr/bin/env python

"""Parse text files with songs, grouped by album and artist."""


def add_to_data(data, block):
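    """
    Add one parsed block (artist, album, songs) to the data dictionary.

    Parameters
    ----------
    data : dict
        Nested mapping of artist -> album -> list of songs.
    block : list
        Lines of one section: artist, album, then the songs.

    Returns
    -------
    dict
    """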
    artist = block[0]
    album = block[1]
    songs = block[2:]
    if artist in data:
        data[artist][album] = songs
    else:
        data[artist] = {album: songs}
    return data


def parseData(filename='testdata.txt'):
    """
    Parse a text file of artist/album/song blocks separated by blank lines.

    Parameters
    ----------
    filename : string
        Path to a text file.

    Returns
    -------
    dict
    """
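    data = {}
    with open(filename) as f:
        block = []
        for line in f:
            line = line.strip()
            if line == '':
                # A blank line ends the current block; skip empty blocks
                # caused by consecutive blank lines.
                if block:
                    data = add_to_data(data, block)
                block = []
            else:
                block.append(line)
        # Add the last block, unless the file ended with a blank line.
        if block:
            data = add_to_data(data, block)
    return data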

if __name__ == '__main__':
    data = parseData()
    import pprint
    pp = pprint.PrettyPrinter(indent=4)
    pp.pprint(data)

This gives:

{   'Bob Dylan': {   '1966 Blonde on Blonde': [   '-Rainy Day Women #12 & 35',
                                                  '-Pledging My Time',
                                                  '-Visions of Johanna',
                                                  '-One of Us Must Know (Sooner or Later)',
                                                  '-I Want You',
                                                  '-Stuck Inside of Mobile with the Memphis Blues Again',
                                                  '-Leopard-Skin Pill-Box Hat',
                                                  '-Just Like a Woman',
                                                  "-Most Likely You Go Your Way (And I'll Go Mine)",
                                                  '-Temporary Like Achilles',
                                                  '-Absolutely Sweet Marie',
                                                  '-4th Time Around',
                                                  '-Obviously 5 Believers',
                                                  '-Sad Eyed Lady of the Lowlands']},
    'Led Zeppelin': {   '1969 II': [   '-Whole Lotta Love',
                                       '-What Is and What Should Never Be',
                                       '-The Lemon Song',
                                       '-Thank You',
                                       '-Heartbreaker',
                                       "-Living Loving Maid (She's Just a Woman)",
                                       '-Ramble On',
                                       '-Moby Dick',
                                       '-Bring It on Home'],
                        '1979 In Through the Outdoor': [   '-In the Evening',
                                                           '-South Bound Saurez',
                                                           '-Fool in the Rain',
                                                           '-Hot Dog',
                                                           '-Carouselambra',
                                                           '-All My Love',
                                                           "-I'm Gonna Crawl"]}}

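Once the data is in this nested artist -> album -> songs shape, walking it is straightforward. The following is only an illustrative sketch: the dictionary is a shortened, hand-written stand-in for the real result, and song_counts is a hypothetical helper, not part of the answer above.

```python
# Shortened, hand-written stand-in for the dict returned by parseData().
data = {
    "Bob Dylan": {
        "1966 Blonde on Blonde": ["-Rainy Day Women #12 & 35", "-Pledging My Time"],
    },
    "Led Zeppelin": {
        "1969 II": ["-Whole Lotta Love"],
        "1979 In Through the Outdoor": ["-In the Evening"],
    },
}


def song_counts(data):
    """Return {(artist, album): number_of_songs} for a nested artist/album dict."""
    return {
        (artist, album): len(songs)
        for artist, albums in data.items()
        for album, songs in albums.items()
    }


counts = song_counts(data)
for (artist, album), n in sorted(counts.items()):
    print(f"{artist} / {album}: {n} song(s)")
```
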