Question

I need to read a file line by line, and I need to make sure the encoding is handled correctly.

I wrote the following code:

#!/bin/bash

import codecs

filename = "something.x10"

f = open(filename, 'r')
fEncoded = codecs.getreader("ISO-8859-15")(f)

totalLength = 0
for line in fEncoded:
  totalLength+=len(line)

print("Total Length is "+totalLength)

This code does not work for all files. For some files I get:

Traceback (most recent call last):
  File "test.py", line 11, in <module>
    for line in fEncoded:
  File "/usr/lib/python3.2/codecs.py", line 623, in __next__
    line = self.readline()
  File "/usr/lib/python3.2/codecs.py", line 536, in readline
    data = self.read(readsize, firstline=True)
  File "/usr/lib/python3.2/codecs.py", line 480, in read
    data = self.bytebuffer + newdata
TypeError: can't concat bytes to str

I am using Python 3.3, and the script must work with this Python version.

What am I doing wrong? I cannot work out which files work and which do not; even some plain ASCII files fail.

Answer 1

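You are opening the file in non-binary (text) mode. If you read from it, you get strings that have already been decoded according to your default encoding (http://docs.python.org/3/library/functions.html?highlight=open%20builtin#open).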

The StreamReader from codecs needs a byte stream (http://docs.python.org/3/library/codecs#codecs.StreamReader).

So this should work:

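import codecs

filename = "something.x10"

# Open in binary mode: the codecs StreamReader expects a byte stream.
f = open(filename, 'rb')
f_decoded = codecs.getreader("ISO-8859-15")(f)

total_length = 0
for line in f_decoded:
    total_length += len(line)

# total_length is an int, so convert it before concatenating.
print("Total Length is " + str(total_length))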
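To see why the mode matters, here is a minimal sketch that uses in-memory streams instead of real files. The use of io.BytesIO and io.StringIO as stand-ins for files opened with 'rb' and 'r', and the sample text, are assumptions made for illustration only.

```python
import codecs
import io

# Hypothetical sample data: two lines, including a euro sign,
# encoded as ISO-8859-15.
raw = "price: 5\u20ac\nsecond line\n".encode("ISO-8859-15")

# A byte stream (like open(filename, 'rb')) is what the StreamReader expects.
reader = codecs.getreader("ISO-8859-15")(io.BytesIO(raw))
total_length = sum(len(line) for line in reader)
print("Total Length is " + str(total_length))

# A text stream (like open(filename, 'r')) hands str objects to the reader,
# which then fails when it tries to combine them with its byte buffer.
text_mode_failed = False
try:
    bad_reader = codecs.getreader("ISO-8859-15")(io.StringIO("abc\n"))
    next(bad_reader)
except TypeError as exc:
    text_mode_failed = True
    print("Text-mode stream failed:", exc)
```

The second half reproduces the same kind of TypeError as in the question, because the reader receives already-decoded strings rather than bytes.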

Alternatively, you can use the encoding parameter of open:

 f_decoded = open(filename, mode='r', encoding='ISO-8859-15')
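As a rough, self-contained sketch of this second approach, the snippet below writes a small temporary file and reads it back line by line. The temporary file and its contents are hypothetical and exist only so the example can run on its own; in the original script, filename would simply point at the real file.

```python
import os
import tempfile

# Create a small sample file (hypothetical content) encoded as ISO-8859-15.
with tempfile.NamedTemporaryFile(delete=False, suffix=".x10") as tmp:
    tmp.write("caf\u00e9\n5\u20ac\n".encode("ISO-8859-15"))
    filename = tmp.name

total_length = 0
# open() decodes for us; the with block closes the file when done.
with open(filename, mode='r', encoding='ISO-8859-15') as f_decoded:
    for line in f_decoded:
        total_length += len(line)

print("Total Length is " + str(total_length))

# Clean up the temporary sample file.
os.remove(filename)
```

With this approach there is no need to import codecs at all, since open performs the decoding itself.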

The reader returns decoded data, so I changed the variable name accordingly. Also, consider PEP 8 as a guide for formatting and coding style.

Using codecs to read a file with the correct encoding: TypeError
