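Check what files are open in Python

Time: 2010-01-07 20:57:44

Tags: python debugging exception file

I'm hitting an error in a program that is supposed to run for a long time: too many files are open. Is there any way to keep track of which files are open, so I can print that list out occasionally and see where the problem is?

11 answers:

Answer 0 (score: 37)

I ended up wrapping the built-in file object at the entry point of my program. I found out that I wasn't closing my loggers.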

import io
import sys
import builtins
import traceback
from functools import wraps


def opener(old_open):
    @wraps(old_open)
    def tracking_open(*args, **kw):
        file = old_open(*args, **kw)

        old_close = file.close
        @wraps(old_close)
        def close():
            old_close()
            # discard() instead of remove(): close() may be called more than once
            open_files.discard(file)
        file.close = close
        file.stack = traceback.extract_stack()

        open_files.add(file)
        return file
    return tracking_open


def print_open_files():
    print(f'### {len(open_files)} OPEN FILES: [{", ".join(f.name for f in open_files)}]', file=sys.stderr)
    for file in open_files:
        print(f'Open file {file.name}:\n{"".join(traceback.format_list(file.stack))}', file=sys.stderr)


open_files = set()
io.open = opener(io.open)
builtins.open = opener(builtins.open)
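
For a quick check around one suspect region of code, the same wrapping idea can be scoped with a context manager so that the real open() is always restored. This is only a minimal sketch and not part of the answer above: the names track_open_files and leaked are hypothetical, and it only sees files opened through builtins.open (not sockets, pipes, or files opened by C extensions). Note that it keeps a reference to every file it sees, so files are not closed by garbage collection while tracking is active.

```python
import builtins
import contextlib
import os


@contextlib.contextmanager
def track_open_files():
    """Temporarily wrap builtins.open and yield the list of files it opened."""
    opened = []
    original_open = builtins.open

    def tracking_open(*args, **kwargs):
        fh = original_open(*args, **kwargs)
        opened.append(fh)  # remember every file object handed out
        return fh

    builtins.open = tracking_open
    try:
        yield opened
    finally:
        builtins.open = original_open  # always restore the real open()


def leaked(opened):
    """Return the names of tracked files that are still open."""
    return [fh.name for fh in opened if not fh.closed]


if __name__ == "__main__":
    with track_open_files() as opened:
        leak = open(os.devnull)         # deliberately left open
        with open(os.devnull) as fine:  # closed by the with-block
            pass
    print(leaked(opened))
    leak.close()
```

Calling leaked(opened) periodically, or right after the block, gives the kind of occasional printout the question asks for.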

Answer 1 (score: 34)

To list all open files in a cross-platform manner, I would recommend psutil.

#!/usr/bin/env python
import psutil

for proc in psutil.process_iter():
    print(proc.open_files())

The original question implicitly restricts the operation to the currently running process, which can be accessed through psutil's Process class.

proc = psutil.Process()
print(proc.open_files())

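Lastly, you'll want to run the code under an account with the appropriate permissions to access this information, or you may see AccessDenied errors.

Answer 2 (score: 23)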

On Linux, you can look at the contents of /proc/self/fd:

$ ls -l /proc/self/fd/
total 0
lrwx------ 1 foo users 64 Jan  7 15:15 0 -> /dev/pts/3
lrwx------ 1 foo users 64 Jan  7 15:15 1 -> /dev/pts/3
lrwx------ 1 foo users 64 Jan  7 15:15 2 -> /dev/pts/3
lr-x------ 1 foo users 64 Jan  7 15:15 3 -> /proc/9527/fd

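Answer 3 (score: 14)

Although the solutions above that wrap open are useful for one's own code, I was debugging my client to a third-party library that includes some C extension code, so I needed a more direct way. The following routine works under darwin, and (I hope) other Unix-like environments: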

def get_open_fds():
    '''
    return the number of open file descriptors for current process

    .. warning: will only work on UNIX-like os-es.
    '''
    import subprocess
    import os

    pid = os.getpid()
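    # check_output returns bytes on Python 3, so decode before splitting
    procs = subprocess.check_output(
        ["lsof", '-w', '-Ff', "-p", str(pid)]).decode()

    # With -Ff, lsof prints one "f<fd>" line per open file;
    # keep only the numeric descriptors (skips entries such as cwd or txt)
    nprocs = len(
        [s for s in procs.split('\n')
         if s and s[0] == 'f' and s[1:].isdigit()])
    return nprocs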

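If anyone can extend this to be portable to Windows, I'd be grateful.

Answer 4 (score: 9)

On Linux, you can use lsof to show all files opened by a process.

Answer 5 (score: 5)

On Windows, you can use Process Explorer to show all file handles owned by a process.

Answer 6 (score: 3)

The accepted response has some limitations, in that it does not seem to count pipes. I had a Python script that opened many subprocesses and failed to properly close the standard input, output, and error pipes used for communication. If I use the accepted response, it fails to count these open pipes as open files, but (at least on Linux) they are open files and count toward the open file limit. The lsof -p solution suggested by sumid and shunc works in this situation, because it also shows the open pipes.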
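
To see that pipes really do occupy descriptors, one option on Unix-like systems is to probe descriptor numbers with os.fstat instead of tracking open() calls. The sketch below is an illustration under stated assumptions, not something from the answers: open_fds is a hypothetical helper, it relies on the Unix-only resource module, and the cap of 4096 probed descriptors is an arbitrary choice to keep the scan cheap.

```python
import os
import resource
import stat


def open_fds(limit=None):
    """Return {fd: kind} for descriptors currently open in this process."""
    if limit is None:
        soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        # Cap the scan: the soft limit can be huge or unlimited
        limit = 4096 if soft == resource.RLIM_INFINITY else min(soft, 4096)

    kinds = {}
    for fd in range(limit):
        try:
            mode = os.fstat(fd).st_mode
        except OSError:
            continue  # this descriptor number is not open
        if stat.S_ISFIFO(mode):
            kinds[fd] = 'pipe'
        elif stat.S_ISSOCK(mode):
            kinds[fd] = 'socket'
        elif stat.S_ISREG(mode):
            kinds[fd] = 'file'
        else:
            kinds[fd] = 'other'
    return kinds


if __name__ == "__main__":
    before = open_fds()
    r, w = os.pipe()  # a pipe consumes two descriptors
    after = open_fds()
    print(len(after) - len(before), after[r], after[w])
    os.close(r)
    os.close(w)
```

Unlike wrapping open(), this also counts descriptors created by subprocess pipes or C extensions, but it cannot say where in the code they were opened.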

Answer 7 (score: 3)

As said earlier, you can list fds on Linux in /proc/self/fd. Here is a simple way to list them programmatically:

import os
import sys
import errno

def list_fds():
    """List process currently open FDs and their target """
    # 'linux2' on Python 2, 'linux' on Python 3.3+
    if not sys.platform.startswith('linux'):
        raise NotImplementedError('Unsupported platform: %s' % sys.platform)

    ret = {}
    base = '/proc/self/fd'
    for num in os.listdir(base):
        path = None
        try:
            path = os.readlink(os.path.join(base, num))
        except OSError as err:
            # Last FD is always the "listdir" one (which may be closed)
            if err.errno != errno.ENOENT:
                raise
        ret[int(num)] = path

    return ret

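Answer 8 (score: 2)

I'd guess that you are leaking file descriptors. You probably want to look through your code to make sure that you are closing all of the files that you open.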
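
As a rough illustration of what "closing everything" can look like in practice, the sketch below shows two common patterns: a with-block for ordinary files, and explicitly closing logging handlers, which hold their file open until told otherwise. The function names write_report and close_logger_handlers are hypothetical and chosen only for this example.

```python
import logging
import os
import tempfile


def write_report(path, lines):
    """Write lines to path; the with-block closes the file even on error."""
    with open(path, 'w') as fh:
        for line in lines:
            fh.write(line + '\n')
    return fh.closed  # True once the with-block has exited


def close_logger_handlers(logger):
    """Close and detach every handler so its underlying file is released."""
    for handler in list(logger.handlers):
        handler.close()
        logger.removeHandler(handler)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    print(write_report(os.path.join(workdir, 'report.txt'), ['a', 'b']))

    logger = logging.getLogger('fd_leak_demo')
    logger.addHandler(logging.FileHandler(os.path.join(workdir, 'app.log')))
    close_logger_handlers(logger)
    print(logger.handlers)
```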

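Answer 9 (score: 2)

Get a list of all open files. handle.exe is part of the Microsoft Sysinternals Suite. An alternative is the psutil Python module, but I find that 'handle' will print out more files in use.

Here is what I made. Kludgy code warning.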

#!/bin/python3
# coding: utf-8
"""Build set of files that are in-use by processes.
   Requires 'handle.exe' from Microsoft SysInternals Suite.
   This seems to give a more complete list than using the psutil module.
"""

from collections import OrderedDict
import os
import re
import subprocess

# Path to handle executable
handle = "E:/Installers and ZIPs/Utility/Sysinternalssuite/handle.exe"

# Get output string from 'handle'
handle_str = subprocess.check_output([handle]).decode(encoding='ASCII')

""" Build list of lists.
    1. Split string output, using '-' * 78 as section breaks.
    2. Ignore first section, because it is executable version info.
    3. Turn list of strings into a list of lists, ignoring first item (it's empty).
"""
work_list = [x.splitlines()[1:] for x in handle_str.split(sep='-' * 78)[1:]]

""" Build OrderedDict of pid information.
    pid_dict['pid_num'] = ['pid_name','open_file_1','open_file_2', ...]
"""
pid_dict = OrderedDict()
# Raw strings keep "\." and "\s" from being read as string escapes
re1 = re.compile(r"(.*?\.exe) pid: ([0-9]+)")  # pid name, pid number
re2 = re.compile(r".*File.*\s\s\s(.*)")  # File name
for x_list in work_list:
    key = ''
    file_values = []
    m1 = re1.match(x_list[0])
    if m1:
        key = m1.group(2)
#        file_values.append(m1.group(1))  # pid name first item in list

    for y_strings in x_list:
        m2 = re2.match(y_strings)
        if m2:
            file_values.append(m2.group(1))
    pid_dict[key] = file_values

# Make a set of all the open files
values = []
for v in pid_dict.values():
    values.extend(v)
files_open = sorted(set(values))

txt_file = os.path.join(os.getenv('TEMP'), 'lsof_handle_files')

with open(txt_file, 'w') as fd:
    for a in sorted(files_open):
        fd.write(a + '\n')
subprocess.call(['notepad', txt_file])
os.remove(txt_file)

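Answer 10 (score: 1)

You can use the following script. It builds on Claudiu's answer. It resolves some of the issues and adds some additional features:

  • Prints the stack trace of where the file was opened
  • Prints on program exit
  • Keyword argument support

Below is the code, along with a link to the gist, which may be more up to date.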

"""
Collect stacktraces of where files are opened, and prints them out before the
program exits.

Example
========

monitor.py
----------
from filemonitor import FileMonitor
FileMonitor().patch()
f = open('/bin/ls')
# end of monitor.py

$ python monitor.py
  ----------------------------------------------------------------------------
  path = /bin/ls
  >   File "monitor.py", line 3, in <module>
  >     f = open('/bin/ls')
  ----------------------------------------------------------------------------

Solution modified from:
https://stackoverflow.com/questions/2023608/check-what-files-are-open-in-python
"""
from __future__ import print_function
import __builtin__
import traceback
import atexit
import textwrap


class FileMonitor(object):

    def __init__(self, print_only_open=True):
        self.openfiles = []
        self.oldfile = __builtin__.file
        self.oldopen = __builtin__.open

        self.do_print_only_open = print_only_open
        self.in_use = False

        class File(self.oldfile):

            def __init__(this, *args, **kwargs):
                path = args[0]

                self.oldfile.__init__(this, *args, **kwargs)
                if self.in_use:
                    return
                self.in_use = True
                self.openfiles.append((this, path, this._stack_trace()))
                self.in_use = False

            def close(this):
                self.oldfile.close(this)

            def _stack_trace(this):
                try:
                    raise RuntimeError()
                except RuntimeError as e:
                    stack = traceback.extract_stack()[:-2]
                    return traceback.format_list(stack)

        self.File = File

    def patch(self):
        __builtin__.file = self.File
        __builtin__.open = self.File

        atexit.register(self.exit_handler)

    def unpatch(self):
        __builtin__.file = self.oldfile
        __builtin__.open = self.oldopen

    def exit_handler(self):
        indent = '  > '
        terminal_width = 80
        for file, path, trace in self.openfiles:
            if file.closed and self.do_print_only_open:
                continue
            print("-" * terminal_width)
            print("  {} = {}".format('path', path))
            lines = ''.join(trace).splitlines()
            _updated_lines = []
            for l in lines:
                ul = textwrap.fill(l,
                                   initial_indent=indent,
                                   subsequent_indent=indent,
                                   width=terminal_width)
                _updated_lines.append(ul)
            lines = _updated_lines
            print('\n'.join(lines))
            print("-" * terminal_width)
            print()