I can't figure out why my yaml.load is blowing up

Time: 2017-03-22 07:14:02

Tags: python yaml

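I have this Python script that downloads some text from the librivox.org website. I am trying to save the "description" of an audiobook as both YAML and JSON. My approach is to generate the YAML myself and then use Python to convert it to JSON. The problem I am having is that this line...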

    myyaml = yaml.load(yaml_version)

...fails with this traceback:

Traceback (most recent call last):
  File "./test-get-description.py", line 143, in <module>
    main(sys.argv[1:])
  File "./test-get-description.py", line 136, in main
    myyaml = yaml.load(yaml_version)
  File "/usr/lib64/python2.7/site-packages/yaml/__init__.py", line 71, in load
    return loader.get_single_data()
  File "/usr/lib64/python2.7/site-packages/yaml/constructor.py", line 37, in get_single_data
    node = self.get_single_node()
  File "/usr/lib64/python2.7/site-packages/yaml/composer.py", line 36, in get_single_node
    document = self.compose_document()
  File "/usr/lib64/python2.7/site-packages/yaml/composer.py", line 55, in compose_document
    node = self.compose_node(None, None)
  File "/usr/lib64/python2.7/site-packages/yaml/composer.py", line 84, in compose_node
    node = self.compose_mapping_node(anchor)
  File "/usr/lib64/python2.7/site-packages/yaml/composer.py", line 127, in compose_mapping_node
    while not self.check_event(MappingEndEvent):
  File "/usr/lib64/python2.7/site-packages/yaml/parser.py", line 98, in check_event
    self.current_event = self.state()
  File "/usr/lib64/python2.7/site-packages/yaml/parser.py", line 439, in parse_block_mapping_key
    "expected <block end>, but found %r" % token.id, token.start_mark)
yaml.parser.ParserError: while parsing a block mapping
  in "<unicode string>", line 2, column 1:
    amazon_app_id: 'junk'
    ^
expected <block end>, but found '<scalar>'
  in "<unicode string>", line 11, column 2:
     x
     ^

Here is the script:

#!/usr/bin/env python

import sys, getopt
import json
import yaml
import requests
import subprocess
import re

hiera_dir               = '/home/hiera/audiobooks'

from HTMLParser import HTMLParser

class MLStripper(HTMLParser):
    def __init__(self):
        self.reset()
        self.fed = []
    def handle_data(self, d):
        self.fed.append(d)
    def get_data(self):
        return ''.join(self.fed)

def strip_tags(html):
    s = MLStripper()
    s.feed(html)
    return s.get_data()


def usage(msg):
        print msg


def write_file( data, fn ):
        print "Writing output to %s\n" % (fn)
        with open(fn, "w") as fh:
                fh.write(data)

def main(argv):
        global top
        global version
        global package
        appname                 = 'unknown'
        librivox_id             = 'unknown'
        app_image_url           = 'unknown'
        email                   = 'unknown'
        acctpasswd              = 'unknown'
        password                = 'XXXXXXX'
        try:
              opts, args = getopt.getopt(argv,"hn:l:t:v:k:p:i:e:P:",["appname=", "id=","top=","version=","package=","password=","image_url=","email=","acctpasswd="])
        except getopt.GetoptError:
              print 'make_hiera_data_from_librivox_api.py -n <appname> -l <librvox id> -e <developer email> -P <developer passwd> [-t <top>] [-v <version>] [-p <password>]'
              sys.exit(2)
        for opt, arg in opts:
                if opt == '-h':
                        usage ( 'Help called' )
                        sys.exit(0)
                elif opt in ("-n", "--appname"):
                        appname = arg
                elif opt in ("-l", "--id"):
                        librivox_id = arg
                elif opt in ("-t", "--top"):
                        top = arg
                elif opt in ("-v", "--version"):
                        version = arg
                elif opt in ("-p", "--password"):
                        password = arg
                elif opt in ("-k", "--package"):
                        package = arg
                elif opt in ("-i", "--image_url"):
                        app_image_url = arg
                elif opt in ("-e", "--email"):
                        email = arg
                elif opt in ("-P", "--acctpasswd"):
                        acctpasswd = arg

        if ( appname == 'unknown' ):
                usage ("Please specify a appname")
                sys.exit (1)
        if ( librivox_id == 'unknown' ):
                usage ("Please specify a librivox api id")
                sys.exit (1)

# https://librivox.org/api/feed/audiobooks/id/9485/extended/1/format/json
        librivox_rest_url       = "https://librivox.org/api/feed/audiobooks/id/" + librivox_id + "/extended/1/format/json"
        try:
                parsed          = json.loads(requests.get(librivox_rest_url).text)
        except:
                e = sys.exc_info()[0]
                print "Error on %s Error [%s]" % ( librivox_rest_url, e )
                sys.exit(1)

        try:
                book_key        = parsed['books'].keys()[0]
        except:
                e = sys.exc_info()[0]
                print "Error on %s Error [%s]" % ( librivox_rest_url, e )
                sys.exit(1)
        apptitle        = parsed['books'][book_key]['title']
        app_zip_url     = parsed['books'][book_key]['url_zip_file']
        description     = parsed['books'][book_key]['description']
        description     = strip_tags(parsed['books'][book_key]['description'].encode('ascii', 'ignore').decode('ascii'))

        description     = re.sub("^"," ", description, flags=re.MULTILINE)
        description     = re.sub("^$"," X", description, flags=re.MULTILINE)
        description     = re.sub("^ $"," x", description, flags=re.MULTILINE)
        for d in description.split("\n"):
                print "d is [%s]\n" % (d)

        amazon_app_id   = 'junk'
        top             = 'junk'
        package         = 'junk'
        version         = 'junk'
        password        = 'junk'
        yaml_version = """---
amazon_app_id: '%s'
librivox_rest_url: '%s'
librivox_id: '%s'
top: '%s'
package: '%s'
version: '%s'
password: '%s'
description: |
 %s

""" % (
                                                        amazon_app_id
                                                ,       librivox_rest_url
                                                ,       librivox_id
                                                ,       top
                                                ,       package
                                                ,       version
                                                ,       password
                                                ,       description )
        print yaml_version
        write_file( yaml_version, hiera_dir + '/' + appname + '.yaml' );
        myyaml = yaml.load(yaml_version)
        json_version = json.dumps( yaml.load(yaml_version), sort_keys=True, indent=2)
        print json_version

        write_file( json_version, doc_root_audiobook_json + '/' + appname + '.json' );

if __name__ == "__main__":
   main(sys.argv[1:])

I run the script like this:

[red@localhost scripts]$ ./test-get-description.py -n 'junk' -l 3269

ID 3269 corresponds to this URL:

https://librivox.org/api/feed/audiobooks/id/3269/extended/1/format/json

The YAML file that I write looks like this:

---
amazon_app_id: 'junk'
librivox_rest_url: 'https://librivox.org/api/feed/audiobooks/id/3269/extended/1/format/json'
librivox_id: '3269'
top: 'junk'
package: 'junk'
version: 'junk'
password: 'junk'
description: |
  It is the end of the 19th century. Like thousands of others, the Rudkus family has emigrated from Lithuania to America in search of a better life. As they settle into the Packingtown neighborhood of Chicago, they find their dreams are unlikely to be realized. In fact, just the opposite is quite likely to occur. Jurgis, the main character of the novel, has brought his father Antanas, his fiance Ona, her stepmother Teta Elzbieta, Teta Elzbieta's brother Jonas and her six children, and Ona's cousin Marija Berczynskas along. The family, nave to the ways of Chicago, quickly falls prey to con men and makes a series of bad decisions that lead them into wretched poverty and terrible living conditions. All are forced to find jobs in dismal working conditions for their very survival. Jurgis, broken and discouraged, eventually finds solace in the American Socialist movement.
 x
 This novel was written during a period in American history when Trusts were formed by multiple corporations to establish monopolies that stifled competition and fixed prices. Unthinkable working conditions and unfair business practices were the norm. The Jungles author, Upton Sinclair, was an ardent Socialist of the time. Sinclair was commissioned by the Appeal To Reason, a Socialist journal of the period, to write a fictional expose on the working conditions of the immigrant laborers in the meat packing industry in Chicago. Going undercover, Sinclair spent seven weeks inside the meatpacking plants gathering details for his novel.
 x
 The Reader wishes to gratefully acknowledge the assistance, and patience, of Professor Giedrius Subacius (University of Illinois) and the folks at Lituanus for their invaluable support as I struggled with Lithuanian pronunciations. Truly, this audio book would have been far more difficult, and far less authentic, without their help.
 x
 And now, feel free to wander into The Jungle.
 x
 (Summary by Tom Weiss)

2 Answers:

Answer 0: (score: 1)

You have an indentation problem in your literal block: every line of the block must be indented at least as far as its first line.

  

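The YAML specification says: "In YAML block styles, structure is determined by indentation. In general, indentation is defined as a zero or more space characters at the start of a line."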

So this will work:

description: |
  It is the end of the 19th century. Like thousands of others, the Rudkus family has emigrated from Lithuania to America in search of a better life. As they settle into the Packingtown neighborhood of Chicago, they find their dreams are unlikely to be realized. In fact, just the opposite is quite likely to occur. Jurgis, the main character of the novel, has brought his father Antanas, his fiance Ona, her stepmother Teta Elzbieta, Teta Elzbieta's brother Jonas and her six children, and Ona's cousin Marija Berczynskas along. The family, nave to the ways of Chicago, quickly falls prey to con men and makes a series of bad decisions that lead them into wretched poverty and terrible living conditions. All are forced to find jobs in dismal working conditions for their very survival. Jurgis, broken and discouraged, eventually finds solace in the American Socialist movement.
  x
  This novel was written during a period in American history when Trusts were formed by multiple corporations to establish monopolies that stifled competition and fixed prices. Unthinkable working conditions and unfair business practices were the norm. The Jungles author, Upton Sinclair, was an ardent Socialist of the time. Sinclair was commissioned by the Appeal To Reason, a Socialist journal of the period, to write a fictional expose on the working conditions of the immigrant laborers in the meat packing industry in Chicago. Going undercover, Sinclair spent seven weeks inside the meatpacking plants gathering details for his novel.
  x
  The Reader wishes to gratefully acknowledge the assistance, and patience, of Professor Giedrius Subacius (University of Illinois) and the folks at Lituanus for their invaluable support as I struggled with Lithuanian pronunciations. Truly, this audio book would have been far more difficult, and far less authentic, without their help.
  x
  And now, feel free to wander into The Jungle.
  x
  (Summary by Tom Weiss)
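
One way to produce that consistent indentation from the script is sketched below. This is only an illustration under assumptions: `build_description_block` is a hypothetical helper that is not part of the original script, and it relies on `textwrap.indent`, which exists in Python 3.3 and later (the script in the question targets Python 2). It prefixes every non-blank line with the same two spaces instead of adding one space per line plus one extra in the template.

```python
import textwrap


def build_description_block(description, indent="  "):
    """Return a YAML `description: |` entry whose content lines share one indent.

    Hypothetical helper for illustration. textwrap.indent leaves blank
    lines untouched, which is acceptable inside a literal block scalar.
    """
    body = textwrap.indent(description.strip("\n"), indent)
    return "description: |\n" + body + "\n"


sample = "First paragraph.\n\nSecond paragraph."
block = build_description_block(sample)
print(block)
```

Because blank lines may stay empty inside a literal block, the ` x` placeholder lines should no longer be needed with this approach.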

Answer 1: (score: 1)

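The problem is with your literal scalar. Because you don't give the indentation explicitly, the indent is determined from the first non-empty line. In your case that is 2. Since some of the other lines are indented less than the first line, you have to specify the indentation explicitly: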

description: |1
  It is the end .....

With the indentation given explicitly, your lines no longer have to line up with the first one.
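
A small check of this behaviour, assuming PyYAML is installed; the three-line description is a shortened stand-in for the real text, not the original file. With the explicit indicator `|1`, one space is treated as indentation, so the extra space on the first line becomes part of the value.

```python
import yaml  # PyYAML

# Same shape as the failing file: the first content line has 2 spaces,
# the following lines have only 1.
uneven = "description: |\n  It is the end\n x\n This novel\n"

try:
    yaml.safe_load(uneven)
    auto_detect_failed = False
except yaml.YAMLError:
    # Indent is auto-detected as 2, so " x" ends the block and breaks parsing.
    auto_detect_failed = True

# Identical content, but with an explicit indentation indicator of 1.
explicit = "description: |1\n  It is the end\n x\n This novel\n"
parsed = yaml.safe_load(explicit)

print(auto_detect_failed)
print(repr(parsed["description"]))
```

Note that the first line of the loaded value keeps one leading space, which may or may not be acceptable for the description text.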

You should never use .load() unless you have 100% control over the source of the YAML, because it is unsafe. Use safe_load() instead.
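
A minimal sketch of the same YAML-to-JSON conversion using `safe_load()`, assuming PyYAML is installed. The document text is a shortened, hypothetical stand-in for the generated file, and the tagged input at the end is only there to show that the safe loader refuses Python-specific tags.

```python
import json

import yaml  # PyYAML

yaml_text = """\
---
amazon_app_id: 'junk'
librivox_id: '3269'
description: |
  First paragraph.

  Second paragraph.
"""

# safe_load only builds plain Python types (dict, list, str, ...).
data = yaml.safe_load(yaml_text)
json_version = json.dumps(data, sort_keys=True, indent=2)
print(json_version)

# A Python-specific tag is rejected by the safe loader instead of being resolved.
try:
    yaml.safe_load("!!python/name:os.system")
    rejected = False
except yaml.YAMLError:
    rejected = True
print(rejected)
```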