Question

I'm trying to erase a password string from memory, like it is suggested in here.

I wrote this little snippet:

import ctypes, sys

def zerome(string):
    location = id(string) + 20
    size     = sys.getsizeof(string) - 20
    #memset =  ctypes.cdll.msvcrt.memset
    # For Linux, use the following. Change the 6 to whatever it is on your computer.
    print ctypes.string_at(location, size)
    memset =  ctypes.CDLL("libc.so.6").memset
    memset(location, 0, size)
    print "Clearing 0x%08x size %i bytes" % (location, size)
    print ctypes.string_at(location, size)

a = "asdasd"

zerome(a)

Oddly, this code works in IPython,

[7] oz123@yenitiny:~ $ ipython a.py 
Clearing 0x02275b84 size 23 bytes

but crashes in plain Python:

[8] oz123@yenitiny:~ $ python a.py 
Segmentation fault
[9] oz123@yenitiny:~ $

Any ideas why?

I tested this with Python 2.7.3 on Debian Wheezy.

Small update...

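The code works on CentOS 6.2 with Python 2.6.6. It crashes on Debian with Python 2.6.8. I tried to figure out why it runs on CentOS but not on Debian. The only difference I can see is that my Debian is multiarch, while the CentOS runs on my old laptop with an i686 CPU.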

So I rebooted my CentOS laptop and loaded Debian Wheezy on it. The code works on that Debian Wheezy, which is not multiarch. So I suspect something is wrong with my Debian configuration...

Answer 1

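ctypes already has a memset function, so you don't have to create a function pointer for the libc/msvcrt function. Also, 20 bytes is the header size on common 32-bit platforms; on 64-bit systems it is probably 36 bytes. Here is the layout of a PyStringObject: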

typedef struct {
    Py_ssize_t ob_refcnt;         // 4|8 bytes
    struct _typeobject *ob_type;  // 4|8 bytes
    Py_ssize_t ob_size;           // 4|8 bytes
    long ob_shash;                // 4|8 bytes (4 on 64-bit Windows)
    int ob_sstate;                // 4 bytes
    char ob_sval[1];
} PyStringObject;

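So the header is probably 5 * 4 = 20 bytes on a 32-bit system, 8 * 4 + 4 = 36 bytes on 64-bit Linux, or 8 * 3 + 4 * 2 = 32 bytes on 64-bit Windows. Since strings aren't tracked with a garbage collection header, you can use sys.getsizeof here. In general, if you don't want the GC header size included (in memory it actually sits before the object's base address that you get from id), use the object's __sizeof__ method. At least that's the general rule in my experience.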
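To check this arithmetic on a given machine, here is a minimal sketch that mirrors the Python 2 header above as a ctypes.Structure (the class name StringLayout is hypothetical) and reads the offset of ob_sval, which is where the character buffer starts. It only inspects the C layout and never reads a live object, so it also runs on Python 3. The last two lines check the GC header point: str is not GC-tracked, a list is.

```python
import ctypes
import sys

class StringLayout(ctypes.Structure):
    # Hypothetical mirror of the Python 2 PyStringObject header.
    # Only the C layout is inspected; no live object is read.
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),
        ("ob_type", ctypes.c_void_p),
        ("ob_size", ctypes.c_ssize_t),
        ("ob_shash", ctypes.c_long),
        ("ob_sstate", ctypes.c_int),
        ("ob_sval", ctypes.c_char * 1),  # first byte of the character buffer
    ]

# Where the character buffer starts: typically 20 on 32-bit,
# 36 on 64-bit Linux, 32 on 64-bit Windows.
header_size = StringLayout.ob_sval.offset

# Plain sum of the header field sizes; on the platforms above no
# padding is needed before ob_sval, so it equals the offset.
field_sum = (2 * ctypes.sizeof(ctypes.c_ssize_t)
             + ctypes.sizeof(ctypes.c_void_p)
             + ctypes.sizeof(ctypes.c_long)
             + ctypes.sizeof(ctypes.c_int))

print("header size: %d bytes (field sum %d)" % (header_size, field_sum))

# str is not GC-tracked, so getsizeof adds no GC header ...
print(sys.getsizeof("abc") == "abc".__sizeof__())  # True
# ... while a list is, and getsizeof includes the header stored before id().
print(sys.getsizeof([]) > [].__sizeof__())  # True
```

Reading the field offset rather than sizeof(StringLayout) matters: on 64-bit Linux, sizeof() rounds the structure up to 40 bytes for alignment, which is 4 bytes past the start of the buffer.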

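What you want is simply to subtract the buffer size from the object size. Strings in CPython are null-terminated, so just add 1 to the length to get the buffer size. For example: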

>>> a = 'abcdef'
>>> bufsize = len(a) + 1
>>> offset = sys.getsizeof(a) - bufsize
>>> ctypes.memset(id(a) + offset, 0, bufsize)
3074822964L
>>> a
'\x00\x00\x00\x00\x00\x00'
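The difference between ctypes.memset and a bare libc function pointer can be illustrated on a scratch buffer instead of a str. ctypes.memset is already prototyped as (c_void_p, c_int, c_size_t), so a full 64-bit address passes through unchanged. A function fetched from ctypes.CDLL has no argtypes until you set them, and ctypes then passes plain Python ints as a C int, which is too narrow for a 64-bit address. This sketch, written for Python 3 and assuming a Unix-like system where ctypes.util.find_library("c") locates the C library, zeroes one buffer each way.

```python
import ctypes
import ctypes.util

# ctypes.memset is prototyped (c_void_p, c_int, c_size_t) -> c_void_p,
# so the buffer address is passed as a full pointer.
scratch = ctypes.create_string_buffer(b"asdasd")  # 7 bytes incl. trailing NUL
ctypes.memset(ctypes.addressof(scratch), 0, ctypes.sizeof(scratch))
print(scratch.raw)  # b'\x00\x00\x00\x00\x00\x00\x00'

# Calling libc's memset directly needs the same prototype declared by hand.
# Assumption: a Unix-like system where find_library("c") finds the C library.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc_memset = libc.memset
libc_memset.argtypes = [ctypes.c_void_p, ctypes.c_int, ctypes.c_size_t]
libc_memset.restype = ctypes.c_void_p

scratch2 = ctypes.create_string_buffer(b"asdasd")
libc_memset(ctypes.addressof(scratch2), 0, ctypes.sizeof(scratch2))
print(scratch2.raw == b"\x00" * 7)  # True
```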

Edit

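A better option is to define the PyStringObject structure. That makes it easy to check ob_sstate. If it's greater than 0, the string is interned, and the sane thing to do is raise an exception. Interned strings include single-character strings, string constants in code objects that consist only of ASCII letters, digits, and underscores, and strings the interpreter uses internally for names (variable names, attributes).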
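Why interning makes in-place zeroing dangerous can be seen with a short CPython-specific sketch, written for Python 3, where the function is sys.intern (in Python 2 it is the builtin intern()). Equal strings built at runtime are separate objects, but interning maps them to one shared object, so overwriting that object would change the value seen through every other reference to it.

```python
import sys

# Strings built at runtime are separate objects, even with equal contents.
a = "".join(["pass", "word"])
b = "".join(["pass", "word"])
print(a == b, a is b)  # True False

# Interning returns one shared object for equal strings, so zeroing it
# in place would silently change it for every other reference too.
ia = sys.intern(a)
ib = sys.intern(b)
print(ia is ib)  # True
```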

from ctypes import *

class PyStringObject(Structure):
    _fields_ = [
        ('ob_refcnt', c_ssize_t),
        ('ob_type', py_object),
        ('ob_size', c_ssize_t),
        ('ob_shash', c_long),
        ('ob_sstate', c_int),
        # ob_sval varies in size; declare one char only to get its offset
        # (zeroing the whole buffer with memset is simpler)
        ('ob_sval', c_char * 1),
    ]

def zerostr(s):
    """zero a non-interned string"""
    if not isinstance(s, str):
        raise TypeError(
            "expected str object, not %s" % type(s).__name__)

    s_obj = PyStringObject.from_address(id(s))
    if s_obj.ob_sstate > 0:
        raise RuntimeError("cannot zero interned string")

    s_obj.ob_shash = -1  # not hashed yet
    # use the field offset, not sizeof(PyStringObject): sizeof() includes
    # trailing padding (40 instead of 36 bytes on 64-bit Linux)
    offset = PyStringObject.ob_sval.offset
    memset(id(s) + offset, 0, len(s))

For example:

>>> s = 'abcd'  # interned by code object
>>> zerostr(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 10, in zerostr
RuntimeError: cannot zero interned string
>>> s = raw_input()  # not interned
abcd
>>> zerostr(s)
>>> s
'\x00\x00\x00\x00'
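A less fragile variation, which is not part of the answer above and is offered only as a sketch: keep the secret in a mutable bytearray instead of a str and overwrite it in place when done. The helper name wipe is hypothetical, and the code is written for Python 3. This avoids interning and header offsets entirely, but it cannot clear copies the program has already made (for example, a str created by decoding the bytes).

```python
import ctypes

def wipe(buf):
    """Overwrite a bytearray in place with NUL bytes (hypothetical helper)."""
    n = len(buf)
    if n == 0:
        return
    # Borrow the bytearray's own storage as a ctypes char array (no copy).
    view = (ctypes.c_char * n).from_buffer(buf)
    ctypes.memset(ctypes.addressof(view), 0, n)
    del view  # release the buffer export so the bytearray can be resized again

secret = bytearray(b"asdasd")
wipe(secret)
print(secret)  # bytearray(b'\x00\x00\x00\x00\x00\x00')
```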
