How do I read a single character at a time from a file in Python?

Time: 2010-06-07 09:11:31

Tags: python file-io character

Can anyone tell me how to do this?

14 answers:

Answer 0 (score: 76)

with open(filename) as f:
    while True:
        c = f.read(1)
        if not c:  # read(1) returns an empty string at end of file
            print("End of file")
            break
        print("Read a character: " + c)

Answer 1 (score: 32)

First, open the file:

with open("filename") as fileobj:
    for line in fileobj:
        for ch in line:
            print(ch)

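Answer 2 (score: 14)

I like the accepted answer: it is straightforward and gets the job done. I would also like to offer an alternative implementation: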

def chunks(filename, buffer_size=4096):
    """Reads `filename` in chunks of `buffer_size` bytes and yields each chunk
    until no more characters can be read; the last chunk will most likely have
    less than `buffer_size` bytes.

    :param str filename: Path to the file
    :param int buffer_size: Buffer size, in bytes (default is 4096)
    :return: Yields chunks of `buffer_size` size until exhausting the file
    :rtype: str

    """
    with open(filename, "rb") as fp:
        chunk = fp.read(buffer_size)
        while chunk:
            yield chunk
            chunk = fp.read(buffer_size)

def chars(filename, buffersize=4096):
    """Yields the contents of file `filename` character-by-character. Warning:
    will only work for encodings where one character is encoded as one byte.
    Written for Python 2; in Python 3, iterating over a bytes chunk yields
    integers rather than one-character strings.

    :param str filename: Path to the file
    :param int buffersize: Buffer size for the underlying chunks,
    in bytes (default is 4096)
    :return: Yields the contents of `filename` character-by-character.
    :rtype: str

    """
    for chunk in chunks(filename, buffersize):
        for char in chunk:
            yield char

def main(buffersize, filenames):
    """Reads several files character by character and redirects their contents
    to `/dev/null`.

    """
    for filename in filenames:
        with open("/dev/null", "wb") as fp:
            for char in chars(filename, buffersize):
                fp.write(char)

if __name__ == "__main__":
    # Try reading several files varying the buffer size
    import sys
    buffersize = int(sys.argv[1])
    filenames  = sys.argv[2:]
    sys.exit(main(buffersize, filenames))

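The code I suggest is essentially the same idea as the accepted answer: read a given number of bytes from the file. The difference is that it first reads a good-sized chunk of data (4096 is a good default for x86, but you may want to try 1024 or 8192; any multiple of your page size), and then yields the characters in that chunk one by one.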
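For readers on Python 3, the same chunk-then-yield idea can be sketched in text mode, so that each item yielded is a decoded character rather than a byte. This is only an illustrative sketch, not the code that was timed in this answer: the name `chars_text` and the `encoding` parameter are assumptions introduced here, and in text mode `buffer_size` counts characters, not bytes.

```python
import os
import tempfile


def chars_text(filename, buffer_size=4096, encoding="utf-8"):
    """Yield decoded characters from `filename`, reading `buffer_size` characters at a time."""
    # Hypothetical helper: in text mode, read(n) returns up to n characters,
    # so multi-byte encodings such as UTF-8 are handled by the decoder.
    with open(filename, encoding=encoding) as fp:
        while True:
            chunk = fp.read(buffer_size)
            if not chunk:  # an empty string means end of file
                break
            for char in chunk:
                yield char


if __name__ == "__main__":
    # Write a small sample file containing a non-ASCII character, then read it back.
    with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt", delete=False) as tmp:
        tmp.write("héllo")
    try:
        print(list(chars_text(tmp.name, buffer_size=2)))
    finally:
        os.remove(tmp.name)
```

Because the decoding is done by the file object, this variant does not share the one-byte-per-character restriction mentioned in the `chars` docstring.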

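The code I present may be faster for larger files. Take, for example, the entire text of War and Peace, by Tolstoy. These are my timing results (MacBook Pro running OS X 10.7.4; so.py is the name I gave to the code I pasted):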

$ time python so.py 1 2600.txt.utf-8
python so.py 1 2600.txt.utf-8  3.79s user 0.01s system 99% cpu 3.808 total
$ time python so.py 4096 2600.txt.utf-8
python so.py 4096 2600.txt.utf-8  1.31s user 0.01s system 99% cpu 1.318 total

Now: do not take a buffer size of 4096 as a universal truth; look at the results I get for different sizes (buffer size in bytes vs. wall time in seconds):

   2 2.726 
   4 1.948 
   8 1.693 
  16 1.534 
  32 1.525 
  64 1.398 
 128 1.432 
 256 1.377 
 512 1.347 
1024 1.442 
2048 1.316 
4096 1.318 

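As you can see, the gains show up early (and my timings are likely quite inaccurate); the buffer size is a trade-off between performance and memory. The default of 4096 is just a reasonable choice but, as always, measure first.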
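One way to do that measuring on your own machine is sketched below for Python 3. It is not the script used for the numbers above: `read_all` and `time_buffer_sizes` are hypothetical helpers, the sample data is a temporary file of zero bytes, and only the chunked reads are timed (not the per-character loop), so absolute numbers will differ.

```python
import os
import tempfile
import time


def read_all(path, buffer_size):
    """Read `path` in chunks of `buffer_size` bytes and return the total byte count."""
    total = 0
    with open(path, "rb") as fp:
        while True:
            chunk = fp.read(buffer_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def time_buffer_sizes(path, sizes):
    """Return a list of (buffer_size, seconds) pairs, one per size in `sizes`."""
    results = []
    for size in sizes:
        start = time.perf_counter()
        read_all(path, size)
        results.append((size, time.perf_counter() - start))
    return results


if __name__ == "__main__":
    # Hypothetical sample data: 1 MiB of zero bytes in a temporary file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\0" * (1 << 20))
    try:
        for size, seconds in time_buffer_sizes(tmp.name, [1, 64, 4096]):
            print(size, round(seconds, 4))
    finally:
        os.remove(tmp.name)
```

A single run per size is noisy; repeating each measurement and taking the minimum usually gives a steadier picture.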

Answer 3 (score: 8)

Python itself can help you with this, in interactive mode:

>>> help(file.read)
Help on method_descriptor:

read(...)
    read([size]) -> read at most size bytes, returned as a string.

    If the size argument is negative or omitted, read until EOF is reached.
    Notice that when in non-blocking mode, less data than what was requested
    may be returned, even if no size parameter was given.
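The help text above comes from Python 2, where `file` is a built-in type. In Python 3 that name no longer exists, and `read(size)` counts characters in text mode but bytes in binary mode. A minimal sketch of the difference, using in-memory buffers in place of a real file (the sample string is an assumption chosen for illustration):

```python
import io

data = "é!".encode("utf-8")  # 'é' occupies two bytes in UTF-8

# Binary mode: read(1) returns exactly one byte, i.e. only half of 'é'.
binary = io.BytesIO(data)
first_byte = binary.read(1)

# Text mode: read(1) returns one decoded character, however many bytes it took.
text = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8")
first_char = text.read(1)

print(first_byte)  # b'\xc3'
print(first_char)  # é
```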

Answer 4 (score: 5)

Just:

myfile = open(filename)
onecharacter = myfile.read(1)

Answer 5 (score: 4)

Today I learned a new idiom for this while watching Raymond Hettinger's Transforming Code into Beautiful, Idiomatic Python:

import functools

with open(filename) as f:
    f_read_ch = functools.partial(f.read, 1)
    for ch in iter(f_read_ch, ''):
        print('Read a character: ' + repr(ch))
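The two-argument form `iter(callable, sentinel)` calls `callable` repeatedly and stops as soon as it returns a value equal to `sentinel`; here the sentinel is the empty string that `read(1)` returns at end of file. A small sketch against an in-memory `io.StringIO`, used as a stand-in for a real file purely for illustration:

```python
import functools
import io

f = io.StringIO("abc")

# read_one() is equivalent to f.read(1); iter() stops when it returns ''.
read_one = functools.partial(f.read, 1)
chars = list(iter(read_one, ""))

print(chars)  # ['a', 'b', 'c']
```

For a file opened in binary mode under Python 3, the sentinel would need to be `b""` instead, since `read(1)` returns bytes there.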

Answer 6 (score: 2)

You should try f.read(1), which is definitely correct and the right thing to do.

Answer 7 (score: 2)

Just read one character:

f.read(1)

Answer 8 (score: 0)

f = open('hi.txt', 'w')
f.write('0123456789abcdef')
f.close()
f = open('hi.txt', 'r')
f.seek(12)
print(f.read(1))  # This will read just "c"
f.close()

Answer 9 (score: 0)

This also works:

with open("filename") as fileObj:
    for line in fileObj:
        for ch in line:
            print(ch)

It iterates over every line in the file and over every character in each line.

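Answer 10 (score: 0)

To add to this: if you are reading a file that contains a very, very long line, which could exhaust your memory, you might consider reading it into a buffer and then yielding each character.

def read_char(inputfile, buffersize=10240):
    with open(inputfile, 'r') as f:
        while True:
            buf = f.read(buffersize)
            if not buf:
                break
            for char in buf:
                yield char
        yield ''  # final empty string; also covers the case where the file is empty

if __name__ == "__main__":
    for char in read_char('./very_large_file.txt'):
        process(char)  # process() stands for whatever you do with each character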

Answer 11 (score: 0)

#reading out the file at once in a list and then printing one-by-one
f=open('file.txt')
for i in list(f.read()):
    print(i)

Answer 12 (score: 0)

next

Answer 13 (score: 0)

Best answer for Python 3.8+:

with open(path, encoding="utf-8") as f:
    while c := f.read(1):
        do_my_thing(c)

You may want to specify utf-8 and avoid the platform encoding. I have chosen to do that here.

Function – Python 3.8+:

def stream_file_chars(path: str):
    with open(path, encoding="utf-8") as f:
        while c := f.read(1):
            yield c

Function – Python <= 3.7:

def stream_file_chars(path: str):
    with open(path, encoding="utf-8") as f:
        while True:
            c = f.read(1)
            if c == "":
                break
            yield c

Function – pathlib + documentation:

from pathlib import Path
from typing import Union, Generator

def stream_file_chars(path: Union[str, Path]) -> Generator[str, None, None]:
    """Streams characters from a file."""
    with Path(path).open(encoding="utf-8") as f:
        while (c := f.read(1)) != "":
            yield c
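As a usage sketch, a character generator like the ones above can be passed straight to anything that consumes an iterable. The example below tallies character frequencies; `count_chars` is a hypothetical helper introduced here, and the generator is restated without the walrus operator so that the block is self-contained.

```python
import collections


def stream_file_chars(path, encoding="utf-8"):
    """Yield the characters of a text file one at a time."""
    with open(path, encoding=encoding) as f:
        while True:
            c = f.read(1)
            if c == "":  # empty string means end of file
                break
            yield c


def count_chars(path):
    """Hypothetical helper: tally how often each character occurs in the file."""
    return collections.Counter(stream_file_chars(path))
```

Since the generator yields one character at a time, only the counts are held in memory, not the whole file.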