Question

I'm trying to concatenate txt files. Almost everything works, but the output file has a space between every letter, like l o r e m i p s u m

Here is my code:

import glob

all = open("all.txt","a");

for f in glob.glob("*.txt"):
    print f
    t = open(f, "r")
    all.write(t.read())
    t.close()

all.close()

I'm using Windows 7 and Python 2.7.

EDIT
Maybe there is a better way to concatenate files?

EDIT2
Now I get this error:

Traceback (most recent call last):
  File "P:\bwiki\BWiki\MobileNotes\export\999.py", line 9, in <module>
    all.write( t.read())
  File "C:\Python27\lib\codecs.py", line 671, in read
    return self.reader.read(size)
  File "C:\Python27\lib\codecs.py", line 477, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xf3 in position 18: invalid continuation byte


import codecs
import glob

all =codecs.open("all.txt", "a", encoding="utf-8")

for f in glob.glob("*.txt"):
    print f
    t = codecs.open(f, "r", encoding="utf-8")
    all.write( t.read())

Answer 1

Your input files are probably UTF-16 encoded, but you are reading them as ASCII, which makes the spaces appear (they are really null bytes). Try:

import codecs

...

for f in glob.glob("*.txt"):
    print f
    t = codecs.open(f, "r", encoding="utf-16")
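
To see why a UTF-16 file read byte by byte appears to have a gap after every letter, here is a minimal sketch. It assumes Python 3 (the question uses 2.7) and little-endian UTF-16, which is a guess about the files and not something the question confirms.

```python
# Assumption: Python 3, little-endian UTF-16. "lorem" is the sample from the question.
text = "lorem"

# In UTF-16 LE each ASCII letter is stored as the letter byte followed by a zero byte.
raw = text.encode("utf-16-le")
print(raw)  # b'l\x00o\x00r\x00e\x00m\x00'

# Decoding with the matching codec recovers the text without the extra bytes.
decoded = raw.decode("utf-16-le")
print(decoded)  # lorem
```

Many editors and consoles show the zero bytes as blanks, which would explain the "l o r e m" effect.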

Answer 2

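Please run this program and edit the output into your question (we probably only need to see the first five lines or so of the output). It prints the first 16 bytes of each file in hex. That will help us figure out what is going on.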

import glob
import sys

def hexdump(s):
    return " ".join("{:02x}".format(ord(c)) for c in s)

l = 0
for f in glob.glob("*.txt"):
    l = max(l, len(f))

for f in glob.glob("*.txt"):
    with open(f, "rb") as fp:
        sys.stdout.write("{0:<{1}}  {2}\n".format(f, l, hexdump(fp.read(16))))
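
One way to read the hex output is to look for a byte order mark (BOM) at the start of each file. The sketch below assumes Python 3. `guess_encoding` and `BOMS` are hypothetical names introduced here, not part of the answer. A file without a BOM returns None, which says nothing about its real encoding.

```python
import codecs

# BOMs are checked with UTF-32 first, because the UTF-32 LE mark
# begins with the same two bytes as the UTF-16 LE mark.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

def guess_encoding(head):
    """Return a codec name if the bytes in `head` start with a known BOM, else None."""
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    return None

print(guess_encoding(b"\xff\xfel\x00o\x00"))  # utf-16
print(guess_encoding(b"lorem ipsum"))         # None
```

A line of hex output starting with `ff fe` or `fe ff` would point at UTF-16, and `ef bb bf` at UTF-8 with a BOM.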

Answer 3

The "spaces" between letters may indicate that at least some of the files use the utf-16 encoding.

If all the files use the same character encoding, you can copy them as bytes, as in the Python 3 code example of the cat(1) command. Here is a PowerShell cat command that corresponds to your Python code:

PS C:\> Get-Content *.txt | Add-Content all.txt

Unlike cat *.txt >> all.txt, it should not corrupt the character encoding.

Your code should work if you use binary file mode:

from glob import glob
from shutil import copyfileobj

with open('all.txt', 'ab') as output_file:
    for filename in glob("*.txt"):
        with open(filename, 'rb') as file:
            copyfileobj(file, output_file)

Again, all the files should have the same character encoding, otherwise you may get garbage (mixed content) in the output.
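
If the files turn out to have different encodings, one possible approach is to decode each file with its own encoding and write everything out as UTF-8. This is only a sketch under assumptions: it targets Python 3, `concat_as_utf8` and the `encoding_for` callback are hypothetical names, and the per-file encodings in the demo (utf-16 and cp1252) are made up for illustration.

```python
import os
import tempfile

def concat_as_utf8(paths, out_path, encoding_for):
    """Decode each file with its own encoding and append it to out_path as UTF-8."""
    # newline="" keeps the original line endings untouched.
    with open(out_path, "a", encoding="utf-8", newline="") as out:
        for path in paths:
            with open(path, "r", encoding=encoding_for(path), newline="") as src:
                out.write(src.read())

# Demo with two temporary files in different (assumed) encodings.
demo_dir = tempfile.mkdtemp()
a = os.path.join(demo_dir, "a.txt")
b = os.path.join(demo_dir, "b.txt")
with open(a, "w", encoding="utf-16", newline="") as f:
    f.write("lorem\n")
with open(b, "w", encoding="cp1252", newline="") as f:
    f.write("ipsum \u00f3\n")

encodings = {a: "utf-16", b: "cp1252"}
out_path = os.path.join(demo_dir, "all.out")
concat_as_utf8([a, b], out_path, encodings.get)

with open(out_path, encoding="utf-8") as f:
    result = f.read()
print(repr(result))
```

The output name in the demo does not end in .txt, so a `*.txt` pattern run in the same directory would not pick up the output file itself.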

Concatenating text files creates a file with spaces between every letter
