Question

with open(sourceFileName, 'rt') as sourceFile:
    sourceFileConents = sourceFile.read()
    sourceFileConentsLength = len(sourceFileConents)

    i = 0
    while i < sourceFileConentsLength:
        print(str(i) + ' ' + sourceFileConents[i])
        i += 1

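Please excuse the unPythonic for-i loop; this is just test code, and there are reasons for doing it this way in the real code.

Anyhoo, the real code seemed to be ending the loop sooner than expected, so I knocked up the dummy above, which strips out all of the real code's logic.

sourceFileConentsLength reports 13,690, but when I print the contents out char by char, there are still characters left in the file that never get printed.

What gives?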

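Should I be using something other than <fileHandle>.read() to get the file's entire contents into a single string?
Have I hit some maximum string length? If so, can I get around it?
Could it be line endings, if the file was edited in Windows and the script is run in Linux? (Sorry, I can't post the file; it is company confidential.)
What else?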

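[Update] I think we can strike two of those ideas.

For the maximum string length, see this question.

I did an ls -lAF on a temp directory. It came to only 6k+ characters, but the script handled it just fine. Should I be worrying about line endings? If so, what can I do about it? The source files tend to get edited under both Windows and Linux, but the script will only ever run under Linux.

[Update++] I changed the line endings in my input file to Linux style in Eclipse, but still got the same result.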

Answer 1

If you read a file in text mode, Python automatically converts line endings such as \r\n to \n.

Try using

with open(sourceFileName, newline='') as sourceFile:

instead; this turns off newline translation (\r\n will be returned as \r\n).
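
To see the effect on the reported length, here is a minimal sketch, assuming Python 3 and a small made-up sample (the byte string and the temporary file are only for illustration, not the asker's data). It reads the same CRLF file twice, once with the default text mode and once with newline='':

```python
import os
import tempfile

# Hypothetical sample data: three lines with Windows-style (CRLF) endings.
raw = b"one\r\ntwo\r\nthree\r\n"

# Write the raw bytes to a temporary file so nothing is translated on the way in.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    # Default text mode: each \r\n is translated to a single \n.
    with open(path, 'rt', encoding='ascii') as f:
        translated = f.read()

    # newline='' disables translation: \r\n comes back unchanged.
    with open(path, 'rt', encoding='ascii', newline='') as f:
        untranslated = f.read()
finally:
    os.remove(path)

print(len(raw), len(translated), len(untranslated))  # 17 14 17
```

The translated string is shorter by one character per line ending, so a length reported by read() can be smaller than the file size on disk without any data being lost.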

Answer 2

If your file is encoded as UTF-8, you should decode it before counting characters:

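# Read the raw bytes, then decode explicitly (assumes the file really is UTF-8).
with open(sourceFileName, 'rb') as sourceFile:
    sourceFileContents_utf8 = sourceFile.read()
sourceFileContents_unicode = sourceFileContents_utf8.decode('utf8')
print(len(sourceFileContents_unicode))

i = 0
source_file_contents_length = len(sourceFileContents_unicode)
while i < source_file_contents_length:
    # Index the decoded string, not the raw bytes.
    print('%s %s' % (str(i), sourceFileContents_unicode[i]))
    i += 1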
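
As a rough illustration of why the byte count and the character count can differ, here is a small sketch assuming Python 3; the sample string is made up, and whether this accounts for the discrepancy in the question depends on what the confidential file actually contains:

```python
# Hypothetical sample text with three non-ASCII characters
# (written with escapes so the code points are unambiguous).
text = "na\u00efve caf\u00e9 \u2615"

# Encoding to UTF-8 gives the bytes as they would be stored on disk.
encoded = text.encode('utf-8')

# len() of a str counts code points; len() of bytes counts bytes.
print(len(text), len(encoded))  # 12 16

# Decoding the bytes recovers the original characters.
assert encoded.decode('utf-8') == text
```

A tool such as ls reports the byte count, while len() on a decoded string reports characters, so the two numbers only agree for pure ASCII content.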

File contents not as long as expected
