Cannot decode yml file... 'utf8' codec can't decode byte #xa0: invalid start byte

Posted: 2015-04-11 23:40:09

Tags: python pyyaml

I am trying to read a YAML file and convert it into a dictionary, but I run into a problem when loading the file into the dict variable.

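I searched for similar questions. One reply on Stack Overflow suggested replacing every '\\xa0' character with ' ', so I tried line = line.replace('\\xa0',' '). The program does not work with Python 2.7, but when I tried it with Python 3 it ran fine.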

import yaml
import sys

yaml_dir = "/root/tools/test_case/"

#file_name = "TC_CFD_SR.yml"
file_name = "TC_QB.yml"
tc_file_name = yaml_dir + file_name

def write(file,content):
    file = open(file,'a')
    file.write(content)
    file.close()

def verifyYmlFile(yml_file):
    data = {}
    with open(yml_file, 'r') as fin:
        for line in fin:
            line = line.replace('\\xa0',' ')
            write('anand-yaml.yml',line)

    with open('anand-yaml.yml','r') as fin:
        data = yaml.load(fin)
    return data

if __name__ == '__main__':
    data = {}
    print "verifying yaml"
    data= verifyYmlFile(tc_file_name)

Error:

[root@anand-harness test_case]# python verify_yaml.py 
verifying yaml
Traceback (most recent call last):
  File "verify_yaml.py", line 29, in <module>
    data= verifyYmlFile(tc_file_name)
  File "verify_yaml.py", line 23, in verifyYmlFile
    data = yaml.load(fin)
  File "/usr/lib64/python2.6/site-packages/yaml/__init__.py", line 71, in load
    return loader.get_single_data()
  File "/usr/lib64/python2.6/site-packages/yaml/constructor.py", line 37, in get_single_data
    node = self.get_single_node()
  File "/usr/lib64/python2.6/site-packages/yaml/composer.py", line 36, in get_single_node
    document = self.compose_document()
  File "/usr/lib64/python2.6/site-packages/yaml/composer.py", line 55, in compose_document
    node = self.compose_node(None, None)
  File "/usr/lib64/python2.6/site-packages/yaml/composer.py", line 82, in compose_node
    node = self.compose_sequence_node(anchor)
  File "/usr/lib64/python2.6/site-packages/yaml/composer.py", line 111, in compose_sequence_node
    node.value.append(self.compose_node(node, index))
  File "/usr/lib64/python2.6/site-packages/yaml/composer.py", line 84, in compose_node
    node = self.compose_mapping_node(anchor)
  File "/usr/lib64/python2.6/site-packages/yaml/composer.py", line 133, in compose_mapping_node
    item_value = self.compose_node(node, item_key)
  File "/usr/lib64/python2.6/site-packages/yaml/composer.py", line 64, in compose_node
    if self.check_event(AliasEvent):
  File "/usr/lib64/python2.6/site-packages/yaml/parser.py", line 98, in check_event
    self.current_event = self.state()
  File "/usr/lib64/python2.6/site-packages/yaml/parser.py", line 449, in parse_block_mapping_value
    if not self.check_token(KeyToken, ValueToken, BlockEndToken):
  File "/usr/lib64/python2.6/site-packages/yaml/scanner.py", line 116, in check_token
    self.fetch_more_tokens()
  File "/usr/lib64/python2.6/site-packages/yaml/scanner.py", line 244, in fetch_more_tokens
    return self.fetch_single()
  File "/usr/lib64/python2.6/site-packages/yaml/scanner.py", line 653, in fetch_single
    self.fetch_flow_scalar(style='\'')
  File "/usr/lib64/python2.6/site-packages/yaml/scanner.py", line 667, in fetch_flow_scalar
    self.tokens.append(self.scan_flow_scalar(style))
  File "/usr/lib64/python2.6/site-packages/yaml/scanner.py", line 1156, in scan_flow_scalar
    chunks.extend(self.scan_flow_scalar_non_spaces(double, start_mark))
  File "/usr/lib64/python2.6/site-packages/yaml/scanner.py", line 1196, in scan_flow_scalar_non_spaces
    while self.peek(length) not in u'\'\"\\\0 \t\r\n\x85\u2028\u2029':
  File "/usr/lib64/python2.6/site-packages/yaml/reader.py", line 91, in peek
    self.update(index+1)
  File "/usr/lib64/python2.6/site-packages/yaml/reader.py", line 165, in update
    exc.encoding, exc.reason)
yaml.reader.ReaderError: 'utf8' codec can't decode byte #xa0: invalid start byte
  in "anand-yaml.yml", position 3246

What am I missing?

1 answer

Answer (score: 0):

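The problem reported in the message is not the four-character sequence '\\xa0' (a backslash followed by x, a and 0); it is the single byte '\xa0' (note that its backslash is not escaped).
Your replacement line should therefore be: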

     line = line.replace('\xa0',' ')

to work around the problem.
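A minimal sketch of the difference, using b'' byte-string literals so it behaves the same on Python 2.6+ and Python 3 (in Python 2 a plain str read from a file is already a byte string). The helper name replace_nbsp_bytes is hypothetical. It also shows why this is only a patch: in valid UTF-8 the byte 0xA0 also occurs as a continuation byte (U+00A0 itself is encoded as C2 A0, and 'à' as C3 A0), so a blind byte-level replace can break characters that were encoded correctly.

```python
# b'' literals are byte strings on both Python 2.6+ and Python 3.
escaped = b'\\xa0'  # four bytes: backslash, 'x', 'a', '0'
raw_nbsp = b'\xa0'  # a single byte with value 0xA0

print(len(escaped))   # -> 4
print(len(raw_nbsp))  # -> 1

data = b'key: abc\xa0xyz'

# Searching for the escaped sequence finds nothing, so the bad byte survives.
print(data.replace(escaped, b' ') == data)  # -> True


def replace_nbsp_bytes(raw):
    """Hypothetical helper: turn every 0xA0 byte into an ASCII space, then decode.

    Only safe when 0xA0 never appears inside a valid multi-byte UTF-8
    sequence; otherwise this corrupts that sequence.
    """
    return raw.replace(raw_nbsp, b' ').decode('utf-8')


print(repr(replace_nbsp_bytes(data)))
```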

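If you know what encoding the file actually uses, you can do the proper conversion yourself, but that should not be necessary, and the patch above is not a structural solution. It would be best to generate the YAML file correctly in the first place: YAML defaults to UTF-8, so the file should contain valid UTF-8. It could also be UTF-16 without the appropriate BOM (which, IIRC, the yaml library interprets).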
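A minimal sketch of doing that conversion yourself, under stated assumptions: a UTF-16 BOM means UTF-16; otherwise the bytes are tried as UTF-8 (the YAML default, with an optional UTF-8 BOM stripped by the 'utf-8-sig' codec), and only if that fails is a legacy single-byte encoding used. The function name decode_yaml_bytes and the 'cp1252' fallback are assumptions for illustration. A stray 0xA0 is typical of a Latin-1/Windows-1252 no-break space, but the error message alone does not prove which encoding produced the file.

```python
import codecs


def decode_yaml_bytes(raw, fallback='cp1252'):
    """Hypothetical helper: decode raw YAML bytes into text.

    - UTF-16 BOM present -> decode as UTF-16 (the codec consumes the BOM)
    - otherwise try UTF-8, stripping an optional UTF-8 BOM
    - if that fails, fall back to an assumed single-byte encoding
    """
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return raw.decode('utf-16')
    try:
        return raw.decode('utf-8-sig')
    except UnicodeDecodeError:
        # Assumption: the file came from a Windows/Latin-1 editor.
        return raw.decode(fallback)
```

Read the file in binary mode (open(path, 'rb')), decode it with this helper and pass the resulting text to yaml.safe_load(). Because the text is already decoded at that point, replacing u'\xa0' with u' ' is safe: it is a single character, not a byte that might belong to a longer UTF-8 sequence.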

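For example (b'' literals are byte strings on both Python 2.6+ and Python 3):

s1 = b'abc\\xa0xyz'  # literal backslash, 'x', 'a', '0': plain ASCII
print(repr(s1))
u1 = s1.decode('utf-8')  # this works fine

s = b'abc\xa0xyz'  # a single 0xA0 byte
print(repr(s))
u = s.decode('utf-8')  # raises UnicodeDecodeError: 0xA0 is not a valid UTF-8 start byte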
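When the encoding is unknown, it helps to look at the bytes around the first one that fails to decode (the traceback reports position 3246 in anand-yaml.yml). A minimal sketch using only the standard UnicodeDecodeError.start attribute; the function name first_bad_utf8_byte and the 20-byte context window are arbitrary choices for illustration.

```python
def first_bad_utf8_byte(raw, context=20):
    """Hypothetical helper: return (offset, surrounding_bytes) for the first
    byte that is not valid UTF-8, or None if the whole buffer decodes."""
    try:
        raw.decode('utf-8')
    except UnicodeDecodeError as exc:
        start = exc.start  # byte offset where decoding failed
        return start, raw[max(0, start - context):start + context]
    return None


# Usage: read the file in binary mode so nothing is decoded implicitly.
# with open('anand-yaml.yml', 'rb') as fin:
#     print(first_bad_utf8_byte(fin.read()))
```

If the surrounding bytes look like ordinary text with single 0xA0 bytes between words, a Latin-1 or Windows-1252 source is a plausible guess; 0x00 bytes alternating with ASCII characters point to UTF-16 instead.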