Question

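I'm currently working through "Learn Python the Hard Way". However, when I call .read() on a .txt file, the text is printed in a very strange way, with extra spaces between the characters and a square at the beginning: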

Extra spaces and squares.

The console is Windows PowerShell.

My code looks like this:

from sys import argv #imports argv from sys

script, filename = argv #unpacks script and filename from argv

txt = open(filename) #declares the variable txt as the text in filename

print "Here's your file %r" % filename #prints the string and the filename
print txt.read() #prints a reading of txt
txt.close()

print "Type the filename again:" #prints the string
file_again = raw_input("> ") #declares the variable file_again as the raw input

txt_again = open(file_again) #declares the variable txt_again as the text in file_again

print txt_again.read() #prints a reading of txt_again
txt_again.close() #closes txt_again (txt was already closed above)

The file looks like this:

This is stuff I typed into a file.
It is really cool stuff.
Lots and lots of fun to have in here.

Please help!

Answer 1

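Your file appears to be saved in a 2-byte encoding, presumably UTF-16. Since Python can't guess that, it simply outputs the bytes as it reads them. For ASCII-only text, this means every other byte is a readable character and the bytes in between are NULs, which the console shows as spaces. The square at the beginning is most likely the byte order mark (BOM).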
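To make that concrete, here is a minimal sketch of what the raw bytes look like. It assumes little-endian UTF-16 with a BOM, which is an assumption about how the file was saved, not something the question confirms. The sample string is made up for illustration.

```python
import codecs

# Sample text, standing in for the contents of the asker's file.
text = u"This is stuff"

# Assumption: the file is little-endian UTF-16 with a leading BOM.
raw = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# The first two bytes are the BOM; after that, each ASCII character
# is followed by a NUL byte.
print(repr(raw[:10]))

ascii_bytes = raw[2::2]  # every second byte after the BOM: the letters
nul_bytes = raw[3::2]    # the bytes in between: all zero

# Decoding with "utf-16" consumes the BOM and restores the text.
decoded = raw.decode("utf-16")
print(decoded)
```

Printing `raw` directly to a console is what produces the "extra spaces" effect, because the console has to render the NUL bytes somehow.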

Answer 2

If you are using Python 2.7.x, take the byte string that read() returns and decode it:

text = txt.read().decode("utf-16")
print text

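That should print the file in a readable way. As pointed out earlier, the file appears to be encoded in UTF-16, so this should not be treated as the general way to read text files. If you use Notepad++, you can choose the file's encoding from the "Encoding" menu. Microsoft Notepad lets you choose the encoding in the "Save As..." dialog.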
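As an alternative to decoding after the fact, the file can be opened with an explicit encoding so that read() returns text directly. The sketch below uses io.open, which is available in Python 2.6+ and is the same function as the built-in open in Python 3. The helper name read_utf16 and the temporary file are illustrative only, and UTF-16 is still an assumption about the asker's file.

```python
import io
import os
import tempfile

def read_utf16(path):
    """Return the contents of a UTF-16 text file as a unicode string."""
    # The "utf-16" codec reads the BOM to pick the byte order and strips it.
    with io.open(path, encoding="utf-16") as f:
        return f.read()

# Demo: create a small UTF-16 file, then read it back.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    with io.open(path, "w", encoding="utf-16") as f:
        f.write(u"This is stuff I typed into a file.\n")
    content = read_utf16(path)
finally:
    os.remove(path)

print(content)
```

If the file is re-saved as plain ASCII or UTF-8 from the editor instead, the original script from the question should print it without any changes.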

Answer 3

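See https://docs.python.org/2/howto/unicode.html

Either your file is saved in a Unicode encoding, or PowerShell is doing something odd with the encoding. The link above explains how to open Unicode files in Python 2.x. The relevant part is here: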

import codecs
f = codecs.open('unicode.rst', encoding='utf-8')
for line in f:
  print repr(line)
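The example above hardcodes UTF-8, while the file in the question is probably UTF-16. One way to choose between them is to look at the byte order mark at the start of the file. This is a rough sketch, not a general encoding detector: guess_encoding_from_bom is a hypothetical helper, it only works when the file actually starts with a BOM, and it does not distinguish UTF-32.

```python
import codecs

def guess_encoding_from_bom(raw, default="utf-8"):
    """Pick a codec name from the BOM at the start of raw bytes.

    Hypothetical helper: falls back to `default` when no BOM is present.
    Limitation: a UTF-32 LE BOM also starts with the UTF-16 LE BOM and
    would be misreported as UTF-16 here.
    """
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # decodes UTF-8 and drops the BOM
    if raw.startswith(codecs.BOM_UTF16_LE) or raw.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"     # the codec uses the BOM to choose byte order
    return default

# Demo with in-memory bytes instead of a real file.
sample = codecs.BOM_UTF16_LE + u"It is really cool stuff.".encode("utf-16-le")
encoding = guess_encoding_from_bom(sample)
text = sample.decode(encoding)
print(encoding)
print(text)
```

For a real file, the same idea would be to read the first few bytes in binary mode, call the helper, and then pass the result as the encoding argument to codecs.open.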

Reading a text file in Python

3 answers: