I need to read a file line by line, and I also need to make sure the encoding is handled correctly.
I wrote the following code:
#!/bin/bash
import codecs
filename = "something.x10"
f = open(filename, 'r')
fEncoded = codecs.getreader("ISO-8859-15")(f)
totalLength = 0
for line in fEncoded:
    totalLength+=len(line)
print("Total Length is "+totalLength)
This code does not work for all files. For some files I get:
Traceback (most recent call last):
  File "test.py", line 11, in <module>
    for line in fEncoded:
  File "/usr/lib/python3.2/codecs.py", line 623, in __next__
    line = self.readline()
  File "/usr/lib/python3.2/codecs.py", line 536, in readline
    data = self.read(readsize, firstline=True)
  File "/usr/lib/python3.2/codecs.py", line 480, in read
    data = self.bytebuffer + newdata
TypeError: can't concat bytes to str
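I am using Python 3.3, and the script must work with this Python version.
What am I doing wrong? I cannot work out which files work and which do not; even some plain ASCII files fail.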
Answer 0 (score: 2)
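You are opening the file in text (non-binary) mode. If you read from it, you get strings that have already been decoded with your default encoding (http://docs.python.org/3/library/functions.html?highlight=open%20builtin#open).
The StreamReader from codecs needs a byte stream (http://docs.python.org/3/library/codecs#codecs.StreamReader). It receives str instead and fails when it tries to append that to its internal bytes buffer, which is the TypeError in your traceback.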
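To make the difference between the two modes concrete, here is a minimal sketch. The temporary file and its contents are made up for illustration and are not taken from the question; it only shows that text mode yields `str`, binary mode yields `bytes`, and a StreamReader wrapped around a binary file does the decoding itself.

```python
import codecs
import os
import tempfile

# Hypothetical sample: two lines encoded as ISO-8859-15 (0xE9 is "é", 0xA4 is "€").
sample_bytes = "café €\nsecond line\n".encode("ISO-8859-15")

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(sample_bytes)
    path = tmp.name

try:
    # Text mode: read() returns str, already decoded by open().
    # The encoding is given explicitly here only to keep the sketch deterministic.
    with open(path, "r", encoding="ISO-8859-15") as text_file:
        text_data = text_file.read()

    # Binary mode: read() returns raw bytes, which is what a StreamReader expects.
    with open(path, "rb") as binary_file:
        raw_data = binary_file.read()

    # Wrapping the binary file lets the StreamReader do the decoding.
    with open(path, "rb") as binary_file:
        decoded_lines = list(codecs.getreader("ISO-8859-15")(binary_file))
finally:
    os.remove(path)

print(type(text_data).__name__, type(raw_data).__name__)
print(decoded_lines)
```

Passing the text-mode file object to `codecs.getreader(...)` instead would hand the reader `str` data, which is the situation described in the question.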
So this should work:
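import codecs
filename = "something.x10"
# Open in binary mode: the StreamReader expects bytes and decodes them itself.
f = open(filename, 'rb')
f_decoded = codecs.getreader("ISO-8859-15")(f)
total_length = 0
for line in f_decoded:
    total_length += len(line)
# len() returns an int, so convert it before concatenating with a str.
print("Total Length is " + str(total_length))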
Or you can use the encoding parameter of open:
f_decoded = open(filename, mode='r', encoding='ISO-8859-15')
The reader returns decoded data, so I changed the variable names. Also, consider PEP 8 as a guide for formatting and coding style.
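For completeness, a sketch of the whole script built on the encoding parameter of open. The helper name `total_length`, the `with` statement, the `newline=""` argument and the sample file are my own additions for illustration; `newline=""` is there on the assumption that line endings should be counted exactly as stored, the way the codecs reader counts them.

```python
import os
import tempfile


def total_length(path, encoding="ISO-8859-15"):
    """Return the number of decoded characters in the file, line endings included."""
    total = 0
    # newline="" keeps the original line endings instead of translating them to "\n".
    with open(path, mode="r", encoding=encoding, newline="") as f_decoded:
        for line in f_decoded:
            total += len(line)
    return total


# Hypothetical sample file: "café €" followed by CRLF, then "abc" followed by LF.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"caf\xe9 \xa4\r\nabc\n")
    sample_path = tmp.name

try:
    print("Total Length is " + str(total_length(sample_path)))
finally:
    os.remove(sample_path)
```

The `with` block closes the file even if decoding raises an error partway through. If translated line endings are acceptable, `newline=""` can simply be dropped.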