Question

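I have been trying to write a custom data formatter in Xcode for a custom string type. The following code gets the address of the first character in the string: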

def MyStringSummary(valobj, internal_dict):
    data_pointer = valobj.GetChildMemberWithName('AllocatorInstance').GetChildMemberWithName('Data')
    print data_pointer.GetValue()

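This prints the pointer address. When I look at the memory at that address, I can see the wide characters used to store the data, so I guess what I need to do is cast this pointer to wchar_t, which would give me the first character. One of my first attempts was: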

if data_pointer.TypeIsPointerType():
    mychar = data_pointer.Dereference()
    print mychar.GetValue()
else:
    print "data_pointer is not a pointer!"

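This confirms that data_pointer is a pointer, but the Dereference() call doesn't seem to resolve to anything: mychar.GetValue() just returns None. A second question: could I write a loop that advances the address in data_pointer by a fixed amount each time, dereferences it to get the next character, and appends that character to an output string? If so, how would I do that?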

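Edit:

To help clarify the problem, I'll post some information about the string's underlying data structure. The definition is too long to post here (it also inherits most of its behavior from a generic array base class), but I'll give some more details.

Looking at the memory that the StringVar.AllocatorInstance.Data pointer refers to, I can see that each character takes 16 bits. Every character in the string I'm looking at fits in 8 bits, so each one is followed by another 8 bits of zero. This is what happens when I try it in the debugger: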

(lldb) p (char*)(StringVar.AllocatorInstance.Data)
(char *) $4 = 0x10653360 "P"
(lldb) p (char*)(StringVar.AllocatorInstance.Data)+1
(char *) $6 = 0x10653361 ""
(lldb) p (char*)(StringVar.AllocatorInstance.Data)+2
(char *) $7 = 0x10653362 "a"

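So I assume it shows only one character at a time because it treats the zero byte after each 8-bit character as a null terminator. However, when I cast to unsigned short, I get this: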

(lldb) p (unsigned short*)(StringVar.AllocatorInstance.Data)
(unsigned short *) $9 = 0x10653360
(lldb) p *(unsigned short*)(StringVar.AllocatorInstance.Data)
(wchar_t) $10 = 80
(lldb) p (char*)(unsigned short*)(StringVar.AllocatorInstance.Data)
(char *) $11 = 0x10653360 "P"
(lldb) p (char*)((unsigned short*)(StringVar.AllocatorInstance.Data)+1)
(char *) $14 = 0x10653362 "a"
(lldb) p (char*)((unsigned short*)(StringVar.AllocatorInstance.Data)+2)
(char *) $18 = 0x10653364 "r"

...so it looks like casting to unsigned short works, as long as each integer is then converted to a char. Any idea how I could do this in a Python data formatter?

Answer 1

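Your Data looks like it may be UTF-16. I wrote a quick C program that resembles your description and played with it in the interactive Python interpreter. I think this may be enough to point you in the right direction for writing your own formatter.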

int main ()
{
    struct String *mystr = AllocateString();
    mystr->AllocatorInstance.len = 10;
    mystr->AllocatorInstance.Data = (void *) malloc (10);
    memset (mystr->AllocatorInstance.Data, 0, 10);
    ((char *)mystr->AllocatorInstance.Data)[0] = 'h';
    ((char *)mystr->AllocatorInstance.Data)[2] = 'e';
    ((char *)mystr->AllocatorInstance.Data)[4] = 'l';
    ((char *)mystr->AllocatorInstance.Data)[6] = 'l';
    ((char *)mystr->AllocatorInstance.Data)[8] = 'o';

    FreeString (mystr);
}

Using the lldb.frame and lldb.process shortcuts (which are only valid in the interactive script interpreter), we can easily read Data into a Python string buffer:

>>> valobj = lldb.frame.FindVariable("mystr")
>>> address = valobj.GetChildMemberWithName('AllocatorInstance').GetChildMemberWithName('Data').GetValueAsUnsigned()
>>> size = valobj.GetChildMemberWithName('AllocatorInstance').GetChildMemberWithName('len').GetValueAsUnsigned()
>>> print address
4296016096
>>> print size
10
>>> err = lldb.SBError()
>>> print err
error: <NULL>
>>> membuf = lldb.process.ReadMemory (address, size, err)
>>> print err
success
>>> membuf
'h\x00e\x00l\x00l\x00o\x00'
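Inside a formatter the lldb.frame and lldb.process shortcuts are not available, but the process can be reached from the value itself with valobj.GetProcess(). The following is only a minimal sketch of the same read done from a summary function. It assumes the real string type exposes a byte-count member named len, as the test program does; that member name and the helper read_data_bytes are assumptions for illustration, not part of the asker's type.

```python
def read_data_bytes(valobj, new_error):
    """Return the raw bytes behind AllocatorInstance.Data, or None on failure.

    new_error is a callable that returns a fresh error object
    (lldb.SBError when running inside LLDB).
    """
    alloc = valobj.GetChildMemberWithName('AllocatorInstance')
    address = alloc.GetChildMemberWithName('Data').GetValueAsUnsigned()
    # 'len' is an assumed member name, taken from the test program.
    size = alloc.GetChildMemberWithName('len').GetValueAsUnsigned()
    if address == 0 or size == 0:
        return None
    err = new_error()
    membuf = valobj.GetProcess().ReadMemory(address, size, err)
    if not err.Success():
        return None
    return membuf


def MyStringSummary(valobj, internal_dict):
    # Imported here because the lldb module only exists inside
    # LLDB's embedded interpreter.
    import lldb
    membuf = read_data_bytes(valobj, lldb.SBError)
    return repr(membuf) if membuf is not None else '<unreadable>'
```

A summary function should return the string to display rather than print it, which is why MyStringSummary returns the repr of the buffer here.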

From this point, you can do any of the usual Python array things:

>>> for b in membuf:
...   print ord(b)
... 
104
0
101
0
108
0
108
0
111
0

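I'm not sure how you would tell Python that this is UTF-16 and should be interpreted as wide characters; that is more of a Python question than an lldb question. But I think your best bet is not to use the SBValue methods (because your Data pointer has an uninformative type like void *, as in my test program) and to use the SBProcess memory-read methods instead.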
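On the Python side, one possible way to turn such a buffer into text is the built-in utf-16-le codec. This is a sketch under two assumptions: the data is little-endian UTF-16 (consistent with the dumps above, where the character byte comes before the zero byte), and the string ends at the first zero code unit. The helper name decode_utf16_buffer is made up for illustration.

```python
def decode_utf16_buffer(membuf):
    """Decode little-endian UTF-16 bytes, stopping at the first NUL code unit."""
    # Ignore a trailing odd byte so the codec only sees whole 16-bit units.
    usable = len(membuf) - (len(membuf) % 2)
    text = bytes(membuf[:usable]).decode('utf-16-le', 'replace')
    # Treat the first NUL character as the end of the string.
    return text.split(u'\x00', 1)[0]
```

For the buffer read in the session above, decode_utf16_buffer(membuf) should give the text hello.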

Answer 2

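Without any source code to refer to, this question is harder to answer than it should be.

That said, my first bet is that your Char * type is an "opaque" reference, so when you try to dereference it, LLDB knows nothing about the pointee type and cannot resolve it. Or perhaps the pointee type is not a basic type (int, char, float, ...) and therefore has no value: a value is essentially a property of scalars, whereas structs, classes, and unions have members rather than values.

Could you post the definition of your string type?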

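From there, there are several ways to extract a chunk of data from a memory location. Is your string ASCII or UTF-8 encoded? If so, you can use SBProcess.ReadCStringFromMemory, giving it the value of your pointer. It reads until it finds the first 0 terminator or until it reaches a maximum length (which you want, to avoid reading an unbounded amount of data from garbage memory).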
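As a rough sketch of that call, assuming process is the SBProcess (for example from valobj.GetProcess()) and new_error is lldb.SBError; the wrapper name read_c_string is hypothetical:

```python
def read_c_string(process, address, max_len, new_error):
    """Read a NUL-terminated 8-bit string, or return None on failure."""
    if address == 0:
        return None
    err = new_error()
    # Stops at the first 0 byte or after max_len bytes, whichever comes first.
    text = process.ReadCStringFromMemory(address, max_len, err)
    return text if err.Success() else None
```

For the 16-bit data shown in the question, this would stop after the first character, because every second byte is zero.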

If that is not the case, there are other ways.
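One of those other ways, sketched here for the 16-bit case in the question, is to do the pointer arithmetic on the raw address and read one code unit at a time with SBProcess.ReadUnsignedFromMemory. The assumptions are that the string is little-endian UTF-16 terminated by a zero code unit and that the caller passes a sensible upper bound; the function name read_utf16_string and its parameters are illustrative only.

```python
import struct


def read_utf16_string(process, address, max_units, new_error):
    """Walk memory 2 bytes at a time until a zero unit or max_units is hit."""
    units = []
    for i in range(max_units):
        err = new_error()
        # Pointer arithmetic: each UTF-16 code unit is 2 bytes wide.
        unit = process.ReadUnsignedFromMemory(address + 2 * i, 2, err)
        if not err.Success() or unit == 0:
            break
        units.append(unit)
    # Repack the code units and let the codec handle surrogate pairs.
    raw = struct.pack('<%dH' % len(units), *units)
    return raw.decode('utf-16-le', 'replace')
```

Inside a formatter, process would come from valobj.GetProcess(), address from the Data pointer's GetValueAsUnsigned(), and new_error would be lldb.SBError.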

Again, the more information you can provide about the internals of your data structure, the easier it is to write a formatter for it.

Pointer arithmetic in LLDB Python scripts
